Berlin's public sector holds an estimated 40 million digital image files across its various administrative databases — and a significant share of them are exact or near-exact duplicates. That is the working figure being circulated inside the Berlin Senate Department for Digital Transformation as officials prepare a city-wide audit, scheduled to begin in the third quarter of 2026, targeting redundant data stored across 78 separate municipal IT systems.
The push matters now because storage is no longer a cheap abstraction. Berlin's Senatsverwaltung für Digitales signed a framework contract in early 2025 for expanded cloud infrastructure, and internal budget documents reviewed by procurement watchers place ongoing annual storage costs for municipal data at above €14 million. When analysts inside the department began tagging image repositories last autumn, they found duplication rates in some archives running as high as 34 percent, meaning roughly one in three image files was already stored elsewhere in the system under a different filename or timestamp.
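For readers who want to see how an exact duplicate "under a different filename or timestamp" can be detected, the following is a minimal sketch that groups files by a SHA-256 digest of their contents. It is not the department's tooling, which the source does not describe; the function names `file_digest` and `find_exact_duplicates` are hypothetical and chosen for illustration only.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> List[List[Path]]:
    """Group files under `root` whose bytes are identical, ignoring names and timestamps."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests shared by two or more files represent duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Because the digest depends only on file contents, renamed or re-dated copies land in the same group. A resized or re-encoded copy has different bytes, so this approach catches byte-identical files only.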
What Duplication Actually Costs
Duplicate images are not a cosmetic problem. Each redundant file consumes server capacity, slows search retrieval, and, in the case of citizen-facing portals, inflates page-load times and degrades accessibility scores. The Landesarchiv Berlin, located on Eichborndamm in the Reinickendorf district, digitised roughly 1.2 million historical photographic prints between 2018 and 2024 as part of its ongoing preservation mandate. Staff there have found that a subset of those scans, particularly images from the post-war Wiederaufbau period, were ingested multiple times by different project teams working independently, producing duplicate clusters that now require manual review.
The problem is not unique to legacy institutions. Technologiestiftung Berlin, the nonprofit that tracks the city's digital infrastructure development, published a sector review in March 2026 noting that Berlin's growing pool of civic-tech and GovTech startups — many of them clustered around the Factory Berlin campus on Rheinsberger Straße in Mitte — frequently integrate with city data APIs and inadvertently mirror image assets locally, compounding the redundancy at the city end. The foundation estimated that unmanaged duplicate data across Berlin's public-facing digital services added the equivalent of several hundred terabytes of unnecessary overhead annually, though it cautioned the figure was a modelled range rather than a direct audit result.
Automated deduplication tools have existed for years, but adoption inside Berlin's bureaucracy has been patchy. The city's IT service provider, ITDZ Berlin, which operates the central government network from its data centre in Tempelhof, introduced a deduplication layer on its primary object storage system in 2023. But that layer covers only systems directly hosted by ITDZ — not the dozens of departmental servers still running in distributed configurations across borough offices from Spandau to Lichtenberg.
What the Audit Is Expected to Find
The upcoming audit, being coordinated by the Berlin Senate's Chief Digital Officer directorate, will use perceptual hashing — a technique that identifies visually similar images even when file names and metadata differ — across a combined dataset drawn from the Senatsverwaltung für Stadtentwicklung, the BVG's public communication archive, and the city's official media library at berlin.de. A pilot run on roughly 800,000 files from the BVG press archive alone returned a duplication rate above 28 percent, according to the project outline circulated to stakeholder departments in May 2026.
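The project outline does not say which perceptual hash the audit will use. As an illustration of the general idea, the sketch below implements one common variant, a difference hash (dHash), in plain Python. It assumes each image has already been decoded into a grid of grayscale values (0 to 255); a real pipeline would use an image library for that step. The names `shrink`, `dhash` and `hamming` are hypothetical.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[int]]  # grayscale pixels (0-255), one inner sequence per row


def shrink(pixels: Grid, width: int, height: int) -> List[List[float]]:
    """Downsample by averaging the block of source pixels behind each output cell."""
    src_h, src_w = len(pixels), len(pixels[0])
    out: List[List[float]] = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row: List[float] = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per adjacent pair, set when the left cell is brighter."""
    small = shrink(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes; small values suggest similar images."""
    return bin(a ^ b).count("1")
```

Two images are treated as likely duplicates when the Hamming distance between their hashes falls below a chosen threshold. The hash records only whether brightness rises or falls between neighbouring regions, so it tends to survive renaming, rescaling and uniform brightness changes. The right threshold depends on the archive and would need tuning against manually reviewed samples.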
For organisations managing their own image libraries outside the municipal umbrella — cultural venues along Karl-Marx-Allee, community media projects in Neukölln, or the dozens of co-working hubs feeding Berlin's startup economy — the city audit serves as a practical prompt. Deduplication software licences for mid-sized organisations typically run between €200 and €1,500 annually depending on archive size, and open-source alternatives including dupeGuru and digiKam are available at no cost. The audit results, expected to be published in summary form by the Senatsverwaltung in late 2026, should give Berlin's digital managers the clearest picture yet of what it actually costs to let redundant data accumulate unchecked.