Berlin's Digital Archive Problem: The Hidden Cost of Duplicate Images Clogging City Databases
New figures reveal how redundant image files are draining storage budgets and slowing down Berlin's ambitious public digitisation push.

Berlin's public sector is sitting on millions of duplicate image files, and the bill for storing them is measurable. An internal audit of the Senatsverwaltung für Inneres und Digitales, completed in the first quarter of 2026, found that redundant image assets account for an estimated 34 percent of total storage consumption across municipal digital archives, according to figures reviewed by The Daily Berlin. That waste is costing the city real money at a moment when every euro of the digital budget is being scrutinised.
The timing matters. Berlin's Senate approved a €47 million digitisation programme for 2025–2027: the Digital Service Berlin initiative, managed through the Technologiestiftung Berlin on Gürtelstraße in Friedrichshain. Part of that programme is meant to migrate decades of analogue and early-digital records into unified, searchable repositories. Duplicate image bloat (the same scan uploaded four times under four different file names, the same JPEG reproduced across six departmental shared drives) is quietly undermining the project's efficiency before it reaches full speed.
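The simplest case described here, the same scan saved under several file names, can be caught by comparing file contents rather than names. The following is a minimal sketch of that idea, not a description of any tool Berlin uses; the function name `group_exact_duplicates` and the sample file names are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its raw bytes. Returns a list of groups,
    each a sorted list of names that share the same SHA-256 digest.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        by_digest[digest].append(name)
    # Only digests seen more than once indicate duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical sample: one scan stored under two names, plus a unique file.
    sample = {
        "scan_0001.jpg": b"same image bytes",
        "Kopie_von_scan.jpg": b"same image bytes",
        "other.jpg": b"different image bytes",
    }
    print(group_exact_duplicates(sample))
```

A content hash only matches files that are identical down to the last byte, so a copy that has been resized or re-saved at a different quality would not be grouped by this approach.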
Storage is not cheap at scale. Enterprise-grade cloud storage for public-sector clients in Germany typically runs between €0.018 and €0.025 per gigabyte per month under framework contracts negotiated through the Zentrale Beschaffungsstelle. Multiply that by hundreds of thousands of redundant gigabytes across Bezirksämter from Spandau to Lichtenberg, and the annual overspend reaches into six figures: at the upper rate, every 100,000 redundant gigabytes costs €30,000 a year. The Senatsverwaltung has not published a precise euro figure for the waste, but internal reviewers, applying the 34 percent redundancy rate to a storage estate of Berlin's scale, placed the annual excess cost conservatively north of €200,000.
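The arithmetic behind these figures can be checked directly. In the sketch below, only the two per-gigabyte rates and the €200,000 estimate come from the reporting; the function names are illustrative, and the calculation assumes a flat monthly rate with no tiering or discounts.

```python
# Figures quoted in the article (EUR per gigabyte per month, EUR per year).
RATE_LOW_EUR = 0.018
RATE_HIGH_EUR = 0.025
ANNUAL_EXCESS_EUR = 200_000


def annual_cost_eur(gigabytes, rate_per_gb_month):
    """Yearly cost of storing `gigabytes` at a flat monthly rate."""
    return gigabytes * rate_per_gb_month * 12


def gigabytes_for_cost(annual_eur, rate_per_gb_month):
    """Volume in gigabytes that produces `annual_eur` at a flat monthly rate."""
    return annual_eur / (rate_per_gb_month * 12)


if __name__ == "__main__":
    # Redundant volume implied by the EUR 200,000 estimate at each rate.
    for rate in (RATE_LOW_EUR, RATE_HIGH_EUR):
        gb = gigabytes_for_cost(ANNUAL_EXCESS_EUR, rate)
        print(f"At EUR {rate}/GB/month: about {gb:,.0f} GB redundant")
```

Under these assumptions, an annual excess of €200,000 corresponds to somewhere between roughly 670,000 and 930,000 redundant gigabytes, depending on which end of the rate range applies.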
The duplication problem is not unique to government. The Stadtbibliothek Berlin, whose central branch sits on Breite Straße in Mitte, has been running its own deduplication effort since January 2026 after a 2025 review found that roughly 1 in 5 digitised image records in its historical photograph collection existed in two or more identical or near-identical copies. Librarians there worked through approximately 180,000 image files in the first phase of cleanup, collapsing the effective collection by around 22 percent without losing a single unique image.
The Technologiestiftung's own research unit flagged in a March 2026 briefing note that automated duplicate-detection tools — which use perceptual hashing algorithms to catch near-identical images even when file sizes differ — can cut manual review time by up to 80 percent compared with human-only audits. Berlin's startup ecosystem in Kreuzberg and around the Factory Berlin campus on Rheinsberger Straße in Mitte includes at least three companies offering exactly this kind of tooling, though municipal procurement rules mean any contract above €25,000 requires a formal tender process under UVgO guidelines.
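To illustrate how perceptual hashing differs from comparing files byte for byte, here is a minimal sketch of one well-known variant, the difference hash (dHash). It is an illustration only and not the method of any vendor or of the Technologiestiftung. It assumes each image has already been converted to a small grayscale grid by an image library (that step is omitted), and the function names and the threshold of 2 differing bits are hypothetical choices.

```python
def dhash(pixels):
    """Difference hash of a small grayscale grid.

    `pixels` is a list of rows of brightness values. Each bit records
    whether a pixel is brighter than its right-hand neighbour, so the
    hash reflects the image's structure, not its exact bytes.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(pixels_a, pixels_b, max_distance=2):
    """Treat two grids as near-duplicates if their hashes differ in few bits."""
    return hamming_distance(dhash(pixels_a), dhash(pixels_b)) <= max_distance


if __name__ == "__main__":
    # Hypothetical 3x4 grids standing in for downscaled scans.
    original = [[10, 20, 30, 25], [40, 35, 50, 60], [5, 5, 90, 80]]
    # The same picture, uniformly brightened (as after re-scanning or re-saving).
    brightened = [[value + 10 for value in row] for row in original]
    # A different picture: each row mirrored left to right.
    mirrored = [list(reversed(row)) for row in original]

    print(is_near_duplicate(original, brightened))
    print(is_near_duplicate(original, mirrored))
```

Because the hash depends on relative brightness between neighbouring pixels, a copy that has been re-saved or brightened can still match its original even though the two files differ in size and content.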
The Senatsverwaltung is expected to issue procurement guidance for deduplication software by the end of the third quarter of 2026. Bezirksämter will then have until December 31, 2026 to submit storage audits as part of their Digital Masterplan compliance reporting. Departments that cannot demonstrate active deduplication workflows by that deadline face potential budget reductions in the 2027 digital-services allocation.
For Berlin's estimated 24,000 civil servants who regularly handle digital documents — a figure from the 2025 Senate workforce report — the practical change will likely be invisible at first. Backend systems will quietly strip redundant files. What they will notice is faster search results and fewer complaints about full inboxes and sluggish shared drives.
The broader lesson from the Stadtbibliothek's experience on Breite Straße is instructive: deduplication is not a one-time fix. New duplicates accumulate continuously as files move between departments, get re-scanned, or arrive from external partners. Berlin's planners are now designing the Digital Service programme to include quarterly automated sweeps rather than a single cleanup, treating image redundancy as an ongoing hygiene issue rather than a crisis to be solved once and forgotten.
Published by The Daily Berlin