Berlin's Duplicate Image Problem: The Numbers Behind a City-Wide Digital Headache
From Senate databases to startup servers in Mitte, redundant image files are costing Berlin institutions millions of euros and measurable storage capacity every year.

Berlin's public sector is sitting on a quiet data crisis. Across the city's administrative network — spanning 12 borough IT departments, the Senate Chancellery on Berliner Freiheit, and dozens of affiliated agencies — duplicate image files have accumulated to the point where independent audits conducted in 2025 flagged redundancy rates of between 30 and 45 percent in unstructured digital asset libraries. The numbers are not abstract: they translate directly into procurement costs, energy consumption, and staff hours wasted locating the canonical version of a file.
The timing matters. The SPD-led coalition that preceded Governing Mayor Kai Wegner's administration committed to full digitisation of public services by 2027 under the Digitale Verwaltung Berlin programme. That deadline is now 18 months away. Bloated image archives are not a peripheral concern: they slow migration pipelines, raise cloud storage invoices, and introduce version-control errors that have already delayed at least two major portal launches, according to internal communications reviewed by IT trade publication Behörden Spiegel.
The scale becomes clearer when you look at individual institutions. The Berliner Stadtbibliothek, which manages digitised historical collections across its central branch on Breite Straße in Mitte and seven district libraries, reported in its 2024 annual digitisation report that roughly 18 terabytes of its 60-terabyte image archive consisted of files with identical or near-identical pixel content stored under different filenames. That is a 30 percent redundancy rate in a single cultural institution. Storage costs for that tier of archival-grade server infrastructure run approximately 4,000 euros per terabyte annually when licensing, maintenance, and energy overhead are factored in — meaning the library is effectively spending around 72,000 euros a year to store files it already has.
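The library's figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and parameters are hypothetical, and the inputs are the numbers quoted in the paragraph above.

```python
def redundancy_cost(total_tb, duplicate_tb, eur_per_tb_year):
    """Return (redundancy rate, yearly cost in euros of storing the duplicates)."""
    rate = duplicate_tb / total_tb
    yearly_cost = duplicate_tb * eur_per_tb_year
    return rate, yearly_cost


# Figures as reported for the Berliner Stadtbibliothek archive.
rate, cost = redundancy_cost(total_tb=60, duplicate_tb=18, eur_per_tb_year=4000)
print(f"{rate:.0%} redundant, about {cost:,.0f} euros per year")
# prints: 30% redundant, about 72,000 euros per year
```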
The startup ecosystem concentrated in the Mitte and Prenzlauer Berg tech corridors faces a version of the same problem at commercial scale. Platforms built on rapid content iteration — e-commerce, media, PropTech firms dealing with the city's perpetually strained housing market — routinely generate duplicate product or property images through automated upload pipelines. A 2025 survey by the Berlin-Brandenburg digital industry association Bitkom Landesverband found that mid-sized Berlin tech companies spent an average of 11,200 euros annually on excess cloud storage attributable to unmanaged image duplication, before deduplication tools were introduced.
The technical solutions are well established. Perceptual hashing algorithms, which derive a compact fingerprint from an image's visual content rather than from its filename or exact bytes, can identify near-duplicate photos even when they have been resized, slightly cropped, or re-compressed. Such tools, whether OpenCV-based pipelines or commercial platforms, have been in use at BVG, the city's public transport operator, since early 2024, when the agency consolidated its infrastructure photography archive ahead of a major website overhaul tied to documentation of the U-Bahn line U5 extension.
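To make the idea concrete, here is a minimal sketch of one simple perceptual hash, the so-called average hash, written in plain Python. It is not the method used by BVG or by any particular product; the function names, the 8x8 grid, and the synthetic test images are assumptions chosen for illustration. Images are represented as lists of rows of grayscale values (0 to 255), so no imaging library is needed.

```python
def average_hash(pixels, hash_size=8):
    """Compute an average-hash fingerprint of a grayscale image.

    `pixels` is a list of rows of 0-255 brightness values; the image must be
    at least hash_size pixels wide and tall. The image is reduced to a
    hash_size x hash_size grid by averaging blocks, then each cell becomes
    one bit: 1 if it is brighter than the mean of all cells, else 0.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        top, bottom = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            left, right = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


# Synthetic 64x64 test images (stand-ins for real photos).
original = [[(x * 255) // 63 for x in range(64)] for y in range(64)]   # left-to-right gradient
resized = [row[::2] for row in original[::2]]                           # same picture at 32x32
different = [[(y * 255) // 63 for x in range(64)] for y in range(64)]  # top-to-bottom gradient

same_picture = hamming_distance(average_hash(original), average_hash(resized))
other_picture = hamming_distance(average_hash(original), average_hash(different))
print(same_picture, other_picture)
```

The resized copy yields a fingerprint at or near distance zero from the original, while the unrelated image differs in many bits. In practice a pipeline would flag pairs whose distance falls below a chosen threshold; that threshold is a tuning decision and is not specified in the article.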
The Senate Department for Digital Transformation included deduplication standards in draft guidelines circulated to borough administrations in March 2026. The guidelines recommend that all agencies processing more than 500 gigabytes of image data annually implement automated hash-checking at the point of ingest, so that duplicates are flagged before they are written to long-term storage rather than discovered years later during expensive migration projects.
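What hash-checking at the point of ingest might look like can be sketched in a few lines. This is an assumption-laden illustration, not the procedure the draft guidelines prescribe: the class name `IngestIndex`, its `check` method, and the in-memory dictionary are all hypothetical, and the example uses an exact SHA-256 content digest, which catches only byte-identical files.

```python
import hashlib


class IngestIndex:
    """Tracks content digests of files already accepted into storage."""

    def __init__(self):
        self._seen = {}  # SHA-256 hex digest -> name of the first file stored

    def check(self, name, data):
        """Return the name of an existing identical file, or None if `data` is new.

        New content is registered so that later uploads are compared against it.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is None:
            self._seen[digest] = name
        return existing


index = IngestIndex()
first = index.check("station_photo.jpg", b"example image bytes")
second = index.check("station_photo_copy.jpg", b"example image bytes")
third = index.check("other_photo.jpg", b"different image bytes")
print(first, second, third)
# prints: None station_photo.jpg None
```

An upload for which `check` returns a filename would be flagged instead of written to long-term storage. Catching resized or re-compressed copies would require a perceptual fingerprint in place of the exact digest, and a production system would keep the index in a database rather than in memory.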
For Berlin institutions still operating manual workflows, the practical path forward is straightforward but requires upfront investment. Procurement frameworks already approved under the Digitale Verwaltung Berlin programme allow agencies to source deduplication software through the city's centralised ZIT framework contract, which caps per-seat licensing at 180 euros annually. Borough IT departments that apply before the September 2026 budget cycle can draw on a 2.3 million euro fund earmarked specifically for data-quality tooling.
The numbers suggest that acting sooner rather than later is the rational calculation. A redundancy rate of even 25 percent across Berlin's combined public digital asset estate, estimated at over 400 terabytes by the Senate's own 2025 infrastructure census, means more than 100 terabytes of redundant files. At the roughly 4,000 euros per terabyte cited above for archival-grade storage, that is a liability of at least 400,000 euros a year, or about 2 million euros over a five-year horizon, before migration delays and staff time are counted. That is money the coalition can ill afford to leave on the table while it simultaneously debates rent caps and BVG expansion budgets.
Published by The Daily Berlin