Berlin's public administration is sitting on a problem it has largely refused to measure — until now. Across municipal databases, cultural archive systems and the city's sprawling property registry, duplicate image files are consuming server capacity, distorting search results and costing the Senatsverwaltung für Stadtentwicklung, Bauen und Wohnen measurable sums each budget cycle. New internal assessments circulated among Berlin's IT procurement offices this spring put the scale of the redundancy problem into sharper focus: some departmental image repositories contain duplication rates above 30 percent.
The timing matters. Berlin's CDU-led coalition government with the SPD is under pressure to justify every line of technology spending amid an ongoing housing shortage that has pushed rents in Prenzlauer Berg and Mitte to record highs. When public-facing platforms like the Wohnungsmarktbericht portal or the BerlinOnline civic services hub serve users redundant or mismatched property images, the downstream effect is not merely aesthetic: it erodes trust in digital governance at the precise moment the city is trying to make that governance work.
What the Numbers Actually Show
The issue runs across multiple institutions. The Stadtmuseum Berlin, which manages photographic collections across sites including the Märkisches Museum am Köllnischen Park, has publicly acknowledged holdings of more than 1.2 million digitised objects. Deduplication audits in comparable European municipal museum systems, including initiatives benchmarked by the Deutsche Digitale Bibliothek (the German Digital Library) based in Frankfurt, have found average redundancy rates between 18 and 35 percent in collections that grew organically without centralised metadata standards. Apply even the lower end of that range to Berlin's holdings and the redundant file count runs into six figures: 18 percent of 1.2 million is roughly 216,000 files.
On the commercial and administrative side, the numbers carry direct financial weight. Server storage in Berlin's primary municipal data centre, operated through the IT service provider ITDZ Berlin in Adlershof, is priced internally at rates comparable to enterprise cloud pricing, typically between 0.02 and 0.05 euros per gigabyte per month for managed storage at scale. At those rates, the redundant 30 percent of a 10-terabyte repository costs roughly 720 to 1,800 euros a year to store. Avoidable storage costs reach the tens of thousands of euros annually only once holdings run into the hundreds of terabytes, and that is before staff time is factored in.
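The arithmetic behind estimates of this kind is easy to check. The sketch below uses the price range and the 30 percent redundancy rate quoted above; the repository sizes are hypothetical round numbers chosen for illustration, not figures from any Berlin department, and a terabyte is assumed to be 1,000 gigabytes.

```python
def annual_redundant_storage_cost(total_tb, redundancy_rate, eur_per_gb_month):
    """Yearly cost in euros of storing the redundant share of a repository."""
    # Assumption: decimal units, 1 TB = 1,000 GB.
    redundant_gb = total_tb * 1000 * redundancy_rate
    return redundant_gb * eur_per_gb_month * 12


# Hypothetical repository sizes; the rates are the ones quoted in the article.
for total_tb in (5, 50, 100):
    low = annual_redundant_storage_cost(total_tb, 0.30, 0.02)
    high = annual_redundant_storage_cost(total_tb, 0.30, 0.05)
    print(f"{total_tb} TB: {low:,.0f} to {high:,.0f} EUR per year")
```

Under these assumptions a 5-terabyte repository wastes roughly 360 to 900 euros a year on redundant storage, and a 100-terabyte one roughly 7,200 to 18,000 euros. The result scales linearly, so the total depends almost entirely on how much image data a department actually holds.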
The BVG, Berlin's public transport operator, faced a related issue when it expanded its digital asset library for the 2023–2025 network map redesign project. Multiple image versions of U-Bahn line graphics, station photography from Alexanderplatz and Potsdamer Platz, and accessibility signage assets proliferated across internal design workflows. The deduplication and metadata standardisation phase of that project was budgeted as a discrete line item — an acknowledgment that the problem carries real cost rather than being a background nuisance.
Why Deduplication Is Harder Than It Sounds
The technical challenge is not finding identical files. Automated tools have handled exact-match deduplication reliably for over a decade. The harder problem is perceptual duplication: images that are visually near-identical but differ in resolution, compression, watermark, or crop. A single photograph of the Rotes Rathaus might exist as a 4K master, a web-optimised JPEG, a thumbnail, and a watermarked press copy, each carrying a different file hash but representing the same creative asset. Hash-only deduplication scripts treat the four as unrelated files and never flag them as candidates for consolidation.
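To see why hash-only matching falls short, consider a minimal sketch of exact-match deduplication using SHA-256 from Python's standard library. The file names and byte strings are hypothetical stand-ins for real image files; the point is only that any change to the bytes, such as recompression or resizing, produces an unrelated digest.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names by the SHA-256 digest of their bytes."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by more than one file mark exact duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical contents: two byte-identical masters and two derived versions.
files = {
    "rathaus_master.tif": b"full-resolution pixel data",
    "rathaus_master_copy.tif": b"full-resolution pixel data",
    "rathaus_web.jpg": b"recompressed pixel data",
    "rathaus_thumb.jpg": b"downscaled pixel data",
}
duplicates = group_exact_duplicates(files)
print(duplicates)
```

Only the two byte-identical masters are grouped. The web and thumbnail versions show the same picture but pass through unflagged.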
Berlin's Zentralbibliothek am Blücherplatz, part of the Stadtbibliothek network, began piloting perceptual hashing tools — specifically pHash-based comparison algorithms — in late 2024 as part of a broader digital collection review. Early findings from that pilot, presented at a library digitisation working group in March 2025, suggested that roughly one in five image assets in the tested subset was a near-duplicate of another file already present in the system.
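The pilot's actual tooling is not described beyond being pHash-based, so the following is only an illustration of the general idea, not the library's method. It implements an average hash, a simpler relative of pHash (which works on a frequency transform of the image), in plain Python on grayscale pixel grids. The synthetic images, the function names and the 8-by-8 hash size are assumptions made for the example; a real deployment would decode actual image files with an imaging library.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale grid to size x size by averaging blocks.

    Assumes the grid is at least `size` pixels in each dimension.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic test images (hypothetical): a diagonal gradient "master",
# a half-resolution, slightly brightened copy, and an unrelated striped image.
master = [[(x + y) * 2 for x in range(64)] for y in range(64)]
web_copy = [[master[2 * y][2 * x] + 3 for x in range(32)] for y in range(32)]
stripes = [[255 if (y // 8) % 2 == 0 else 0 for x in range(64)] for y in range(64)]

master_hash = average_hash(master)
print("master vs web copy:", hamming(master_hash, average_hash(web_copy)))
print("master vs stripes: ", hamming(master_hash, average_hash(stripes)))
```

The resized and brightened copy should land within a few bits of the master, while the unrelated image should differ in a large share of its 64 bits. Where to draw the line between "near-duplicate" and "different" is a tuning decision that each collection would have to make against its own material.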
For institutions and city departments looking at this problem practically, the path forward involves three sequential steps: a full metadata audit to establish baseline file counts, deployment of perceptual comparison tooling rather than hash-only matching, and a governance decision about which version of a duplicated asset becomes the canonical record. The last step is the most politically fraught — different departments have different ownership claims over the same images, and resolving those claims requires cross-departmental coordination that Berlin's siloed IT infrastructure has historically struggled to deliver. The Senatsverwaltung für Digitalisierung has flagged unified asset management as a 2026 priority in its published digital strategy roadmap, but funding allocation for that initiative has not yet been confirmed in the current budget round.
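Once departments agree on a rule, the third step is simple to encode; the difficulty is the agreement, not the code. The sketch below assumes a hypothetical policy (prefer unwatermarked versions, then the highest pixel count) and a hypothetical `Asset` record. Neither is drawn from any Berlin system, and a real policy would likely also weigh ownership and licensing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    path: str
    width: int
    height: int
    watermarked: bool


def pick_canonical(group):
    """Return the preferred asset from a group of near-duplicates."""
    # Hypothetical policy: unwatermarked beats watermarked, then more pixels wins.
    return max(group, key=lambda a: (not a.watermarked, a.width * a.height))


# Hypothetical near-duplicate group for a single photograph.
group = [
    Asset("press/rathaus_watermarked.jpg", 3840, 2160, True),
    Asset("masters/rathaus.tif", 3840, 2160, False),
    Asset("web/rathaus.jpg", 1920, 1080, False),
    Asset("thumbs/rathaus.jpg", 320, 180, False),
]
canonical = pick_canonical(group)
print(canonical.path)
```

Here the unwatermarked 4K master is selected and the other three versions become candidates for consolidation or for linking back to the canonical record.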