The Daily Berlin

Berlin news, every day

Berlin's Duplicate Image Problem: The Numbers Driving a City-Wide Digital Clean-Up

From Mitte to Marzahn, Berlin's public institutions are sitting on millions of redundant digital files — and the bill for storing them keeps climbing.

By Berlin News Desk · Published 4 July 2026, 8:45 pm

Berlin's network of public archives, municipal portals and cultural databases contains an estimated 40 million digital image files, and a significant share of them are exact or near-exact duplicates. That figure, drawn from an internal benchmarking exercise completed in late 2025 by the Senatsverwaltung für Digitalisierung und Verwaltungsmodernisierung, has prompted a city-wide push to audit, deduplicate and replace redundant assets across government-linked digital infrastructure. The cleanup is already underway at several institutions, and the price tag for inaction is no longer abstract.

The urgency is partly fiscal. Berlin pays for cloud and on-premises storage capacity across dozens of agencies and cultural bodies, and duplicate image files (the same photograph uploaded three, five, sometimes a dozen times under different filenames) consume real server space with zero informational return. Storage costs for city-administered IT systems across Berlin's twelve Bezirke have risen steadily since 2022 as digitisation drives pushed more analogue material online. The problem is not unique to Berlin, but the city's unusually fragmented administrative structure, in which each district maintains partially independent digital systems, has made duplication worse here than in comparable German cities such as Hamburg or Munich, where more centralised data governance caught the issue earlier.

Where the Numbers Are Worst

The Stadtbibliothek Berlin, which operates its central hub at the Amerika-Gedenkbibliothek on Blücherstraße in Kreuzberg, flagged the issue internally in the spring of 2025 after a routine systems review found that its digitised periodical collection contained more than 180,000 image files that were byte-for-byte identical to at least one other file in the same database. The library's digital team has since reduced that redundancy count by roughly 60 percent using automated deduplication software, but the process surfaced a secondary problem: many duplicate files had been manually tagged with different metadata, meaning a simple delete-and-replace operation risked stripping valid catalogue information from the surviving copy.
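The metadata risk described above can be illustrated with a minimal sketch. It assumes each catalogue record is a dictionary with the hypothetical keys `path`, `sha256` and `tags`; none of these names come from the library's actual system. Instead of deleting duplicates outright, it merges the tags of all byte-identical copies onto the surviving file and flags any fields where the copies disagree, so that a person can review them.

```python
from collections import defaultdict


def merge_duplicate_metadata(records):
    """Group records by content hash and merge their tags.

    Each record is a dict with hypothetical keys "path", "sha256"
    and "tags" (a dict mapping a field name to a value).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["sha256"]].append(rec)

    merged = []
    for recs in groups.values():
        combined = {}   # first value seen for each field
        conflicts = {}  # fields where copies carry different values
        for rec in recs:
            for field, value in rec["tags"].items():
                if field not in combined:
                    combined[field] = value
                elif combined[field] != value:
                    conflicts.setdefault(field, {combined[field]}).add(value)
        merged.append({
            "keep": recs[0]["path"],                 # first copy survives
            "drop": [r["path"] for r in recs[1:]],   # candidates for removal
            "tags": combined,
            "conflicts": {f: sorted(v) for f, v in conflicts.items()},
        })
    return merged
```

In this sketch a conflicting field keeps the first value seen and is listed under `conflicts`; which value should win is a cataloguing decision the code does not attempt to make.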

The Berlin State Museums (Staatliche Museen zu Berlin, SMB), which administer collections at sites from the Pergamonmuseum on Museumsinsel to the Gemäldegalerie in Tiergarten, face a comparable challenge at larger scale. The SMB runs one of Germany's largest cultural image databases, used by researchers, publishers and the public. A 2024 digital asset audit, referenced in the SMB's publicly available annual report for that year, identified duplicate image replacement as a priority for its 2025–2027 digital strategy cycle. The report noted that standardising image file naming conventions alone would reduce unnecessary duplication going forward, though it did not publish a specific redundancy count.

The Cost of Doing Nothing

Storage is cheap per gigabyte, but the cost adds up fast at institutional scale. Commercial cloud storage for large organisations in Germany runs roughly €20 to €35 per terabyte per month depending on the provider and contract tier, figures consistent with publicly available pricing from providers operating in the German market. For an institution holding several hundred terabytes of image data, even a 15 percent duplication rate translates to tens of thousands of euros in avoidable annual costs. Multiplied across Berlin's dozens of digitally active public bodies, the aggregate waste is meaningful in a budget environment where the CDU-SPD Senate coalition has faced repeated pressure to find efficiency savings without cutting front-line services.
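The arithmetic behind that estimate can be checked with a short calculation. The 500-terabyte holding used here is an assumed example of "several hundred terabytes", not a figure reported for any Berlin institution; the price range and the 15 percent rate are the ones quoted above.

```python
def annual_duplicate_cost(total_tb, duplicate_share, eur_per_tb_month):
    """Yearly storage spend attributable to duplicate files, in euros."""
    return total_tb * duplicate_share * eur_per_tb_month * 12


# Assumed holding of 500 TB with a 15 percent duplicate share,
# priced at both ends of the quoted 20 to 35 euro range.
low = annual_duplicate_cost(500, 0.15, 20)
high = annual_duplicate_cost(500, 0.15, 35)
print(f"{low:,.0f} to {high:,.0f} EUR per year")
# prints "18,000 to 31,500 EUR per year"
```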

The BVG, Berlin's public transport operator, encountered its own version of the problem when it consolidated its internal communications image library in 2023 ahead of a rebrand campaign. The operator found that years of decentralised file sharing among communications staff at depots across the city — from Lichtenberg to Spandau — had produced a library bloated with near-identical variants of the same vehicle and infrastructure photographs. The consolidation took three months and required manual review for several thousand flagged files where automated tools returned false positives.
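Why near-identical variants trip up automated tools can be seen in a simplified average-hash sketch. It is not a description of the software BVG used. It assumes each image has already been decoded and shrunk to a small grayscale grid (for example 8x8) by an image library, and the `max_distance` threshold is an illustrative value: set it too high and unrelated pictures are flagged as duplicates, set it too low and genuine variants slip through.

```python
def average_hash(pixels):
    """Return an integer hash for a small grayscale grid.

    `pixels` is a list of rows of 0-255 values, assumed to be already
    downscaled (for example to 8x8) by an image library.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # One bit per pixel: 1 if brighter than the grid's mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(pixels_a, pixels_b, max_distance=5):
    # max_distance is an illustrative threshold, not a recommended value.
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance
```

A uniformly brightened copy of a photograph produces the same hash under this scheme, while a cropped or recomposed shot of the same tram may not, which is one reason a manual review pass remains useful for borderline cases.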

For Berlin's institutions still working through their own audits, digital archivists recommend a phased approach: first run hash-based deduplication tools to catch exact copies, then apply perceptual hashing algorithms to find near-duplicates, and record all associated metadata in a separate log before any file is permanently deleted. The Senatsverwaltung für Digitalisierung is expected to publish updated guidelines for Bezirk-level institutions before the end of the third quarter of 2026. Institutions that have not yet begun an audit may be better off waiting for those guidelines than adopting ad-hoc solutions that may not align with the coming city-wide data governance framework.
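The first and last steps of that phased approach can be sketched using only Python's standard library. The function names, the shape of the `metadata` mapping (file path to tag dictionary) and the JSON log layout are assumptions made for illustration, not part of any published city guideline. The sketch finds byte-identical files by SHA-256 digest and writes a log of what would be removed, along with the associated metadata, without deleting anything.

```python
import hashlib
import json
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Map each digest to the files sharing it, keeping groups of 2+."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


def write_deletion_log(duplicates, metadata, log_path):
    """Record planned removals and their metadata; nothing is deleted."""
    entries = []
    for digest, paths in duplicates.items():
        keep, *drop = paths  # first path in sorted order is kept
        for path in drop:
            entries.append({
                "sha256": digest,
                "kept": str(keep),
                "removed": str(path),
                "metadata": metadata.get(str(path), {}),
            })
    Path(log_path).write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```

Keeping the log outside the scanned directory avoids hashing it on a later run, and leaving the deletion itself to a separate, reviewed step means the log exists before any file is touched.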

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
