The Daily Berlin

Berlin's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

City agencies, startups, and cultural institutions are sitting on millions of redundant image files, and a quiet data-cleanup push is exposing just how bad the problem has become.

By Berlin News Desk · Published 4 July 2026, 9:16 pm

Photo: Naro K on Pexels
Berlin's public sector holds an estimated 40 million digital image files across its various administrative databases — and a significant share of them are exact or near-exact duplicates. That is the working figure being circulated inside the Berlin Senate Department for Digital Transformation as officials prepare a city-wide audit, scheduled to begin in the third quarter of 2026, targeting redundant data stored across 78 separate municipal IT systems.

The push matters now because storage is no longer a cheap abstraction. Berlin's Senatsverwaltung für Digitales signed a framework contract in early 2025 for expanded cloud infrastructure, and internal budget documents reviewed by procurement watchers place ongoing annual storage costs for municipal data above €14 million. When analysts inside the department began tagging image repositories last autumn, they found duplication rates in some archives running as high as 34 percent, meaning roughly one in three image files was already stored elsewhere in the system under a different filename or timestamp.
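For readers who want to see how files that differ only in filename or timestamp can be matched, the following is a minimal sketch that groups files by a hash of their contents. The file names, the sample bytes and the way the duplication rate is defined (redundant copies divided by total files) are assumptions for illustration; the article's sources do not say how the department computed its figure.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names by the SHA-256 digest of their bytes.

    `files` maps a (hypothetical) file name to its raw content.
    Returns only the groups that contain more than one name.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def duplication_rate(files):
    """Share of files that are redundant copies of another file.

    Assumed definition: (total files - distinct contents) / total files.
    """
    if not files:
        return 0.0
    unique = {hashlib.sha256(content).digest() for content in files.values()}
    return (len(files) - len(unique)) / len(files)


# Illustrative in-memory "archive"; names and bytes are made up.
archive = {
    "scan_0001.jpg": b"\xff\xd8image-A",
    "IMG_2021_copy.jpg": b"\xff\xd8image-A",  # same bytes, different name
    "bridge.jpg": b"\xff\xd8image-B",
}

print(group_exact_duplicates(archive))
print(f"{duplication_rate(archive):.0%}")
```

Content hashing of this kind only catches byte-identical copies. A file that has been resized or re-compressed produces a different digest and would be missed.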

What Duplication Actually Costs

Duplicate images are not a vanity problem. Each redundant file consumes server capacity, slows search retrieval and, in the case of citizen-facing portals, inflates page-load times and degrades accessibility scores. The Landesarchiv Berlin, located on Eichborndamm in the Reinickendorf district, digitised roughly 1.2 million historical photographic prints between 2018 and 2024 as part of its ongoing preservation mandate. Staff there have found that a subset of those scans, particularly images from the post-war Wiederaufbau period, were ingested multiple times by different project teams working independently, producing duplicate clusters that now require manual review.

The problem is not unique to legacy institutions. Technologiestiftung Berlin, the nonprofit that tracks the city's digital infrastructure development, published a sector review in March 2026 noting that Berlin's growing pool of civic-tech and GovTech startups — many of them clustered around the Factory Berlin campus on Rheinsberger Straße in Mitte — frequently integrate with city data APIs and inadvertently mirror image assets locally, compounding the redundancy at the city end. The foundation estimated that unmanaged duplicate data across Berlin's public-facing digital services added the equivalent of several hundred terabytes of unnecessary overhead annually, though it cautioned the figure was a modelled range rather than a direct audit result.

Automated deduplication tools have existed for years, but adoption inside Berlin's bureaucracy has been patchy. The city's IT service provider, ITDZ Berlin, which operates the central government network from its data centre in Tempelhof, introduced a deduplication layer on its primary object storage system in 2023. But that layer covers only systems directly hosted by ITDZ — not the dozens of departmental servers still running in distributed configurations across borough offices from Spandau to Lichtenberg.

What the Audit Is Expected to Find

The upcoming audit, being coordinated by the Berlin Senate's Chief Digital Officer directorate, will use perceptual hashing — a technique that identifies visually similar images even when file names and metadata differ — across a combined dataset drawn from the Senatsverwaltung für Stadtentwicklung, the BVG's public communication archive, and the city's official media library at berlin.de. A pilot run on roughly 800,000 files from the BVG press archive alone returned a duplication rate above 28 percent, according to the project outline circulated to stakeholder departments in May 2026.
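To illustrate the general idea behind perceptual hashing, here is a minimal sketch of one common variant, a difference hash (dHash). It is not the audit's actual method, which the project outline as described here does not specify. The sketch assumes each image has already been decoded and downscaled to a small grayscale grid (8 rows by 9 columns), a step normally done with an image library; the sample grids and the similarity threshold of 5 bits are illustrative assumptions.

```python
def dhash(pixels):
    """Difference hash: one bit per horizontally adjacent pixel pair.

    `pixels` is a grid of grayscale values (rows of equal length), assumed
    to be already downscaled, e.g. 8 rows x 9 columns for a 64-bit hash.
    A bit is 1 when the left pixel is brighter than its right neighbour.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def looks_similar(a, b, max_distance=5):
    """Treat two grids as near-duplicates if their hashes differ in few bits.

    The threshold of 5 is an assumed value, not one taken from the audit.
    """
    return hamming_distance(dhash(a), dhash(b)) <= max_distance


# Made-up 8x9 grayscale grids standing in for downscaled images.
original = [[(r * 37 + c * 53) % 200 for c in range(9)] for r in range(8)]
brighter = [[v + 10 for v in row] for row in original]   # same picture, lighter
inverted = [[255 - v for v in row] for row in original]  # tonally reversed

print(looks_similar(original, brighter))
print(looks_similar(original, inverted))
```

Because the hash records only whether brightness rises or falls between neighbouring pixels, a uniformly lightened copy yields the same bits as the original, while a visually different image yields many differing bits. File names and metadata play no part in the comparison.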

For organisations managing their own image libraries outside the municipal umbrella — cultural venues along Karl-Marx-Allee, community media projects in Neukölln, or the dozens of co-working hubs feeding Berlin's startup economy — the city audit serves as a practical prompt. Deduplication software licences for mid-sized organisations typically run between €200 and €1,500 annually depending on archive size, and open-source alternatives including dupeGuru and digiKam are available at no cost. The audit results, expected to be published in summary form by the Senatsverwaltung in late 2026, should give Berlin's digital managers the clearest picture yet of what it actually costs to let redundant data accumulate unchecked.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.