Berlin's public institutions are sitting on vast numbers of duplicate images clogging their digital archives, and the people responsible for managing those systems say the problem has reached a breaking point. City archivists, software specialists contracted by the Senatsverwaltung für Kultur und Gesellschaftlichen Zusammenhalt, and representatives from Berlin's growing civic-tech sector gathered last week for a closed-door working session in Mitte to agree on a framework for what they are calling a systematic duplicate-image replacement programme.
The urgency is real. Over the past three years, Berlin's institutions have invested heavily in digitisation, driven partly by the federal Digitalisierungsstrategie and partly by post-pandemic pressure to make public records accessible online. The unintended consequence: repositories have ballooned with redundant, low-resolution, or watermarked copies of images that already exist as higher-quality originals. Storage costs climb, search results degrade, and staff lose hours manually resolving conflicts between versions.
What the Institutions Are Actually Saying
The Zentral- und Landesbibliothek Berlin (ZLB), whose collections include digitised material from its Amerika-Gedenkbibliothek branch on Blücherplatz in Kreuzberg, has flagged the issue internally for at least eighteen months. In internal documents circulated to the Senatsverwaltung, administrators there have described the problem as a structural flaw in the original digitisation pipeline, one that prioritised speed of upload over quality control. The ZLB's digitisation unit began a pilot audit of roughly 40,000 image records in January 2026, and early results reportedly showed duplication rates well above what staff had expected.
At the Stadtmuseum Berlin, whose main administrative operations are centred near Klosterstraße in the old city core, curators have pushed for automated deduplication tools since at least 2024. The institution manages image assets from multiple historic collections — including material from the Ephraim-Palais and the Märkisches Museum — and the overhead of maintaining parallel records has become unsustainable at current staffing levels. Specialists there have argued that off-the-shelf perceptual hashing tools, which can identify near-identical images even when file names or metadata differ, should be deployed citywide rather than institution by institution.
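To make the perceptual-hashing idea concrete, here is a minimal sketch of one widely used variant, the difference hash (dHash), in plain Python. It assumes images have already been decoded into grayscale pixel grids (in practice an imaging library would handle decoding), and the function names and the 10-bit match threshold are illustrative choices, not part of any tool the Berlin specialists have named.

```python
from typing import List

# A grayscale image as rows of brightness values (0 = black).
Grid = List[List[float]]


def downscale(pixels: Grid, width: int, height: int) -> Grid:
    """Shrink an image to width x height by averaging each block of source pixels."""
    src_h, src_w = len(pixels), len(pixels[0])
    small: Grid = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per pair of horizontally adjacent cells."""
    small = downscale(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            # 1 if brightness falls from left to right, else 0.
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Illustrative threshold (out of 64 bits); not taken from any Berlin specification.
MATCH_THRESHOLD = 10


def near_duplicates(a: Grid, b: Grid) -> bool:
    return hamming(dhash(a), dhash(b)) <= MATCH_THRESHOLD


# Synthetic test images: a 72x64 "bowl", a half-resolution copy of it,
# and an unrelated left-to-right gradient.
original = [[(x - 36) ** 2 + (y - 32) ** 2 for x in range(72)] for y in range(64)]
half_res = [row[::2] for row in original[::2]]
gradient = [list(range(72)) for _ in range(64)]

print(near_duplicates(original, half_res))  # True
print(near_duplicates(original, gradient))  # False
```

Because each bit records only whether brightness rises or falls between neighbouring regions, the half-resolution copy hashes identically to the original even though its dimensions differ. The trade-off is that images with few strong gradients, such as near-blank pages, can collide, which is one reason a hash match is better treated as a flag for review than as grounds for automatic deletion.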
The civic-tech community around Tempodrom and the co-working spaces clustered along Oranienburger Straße has not stayed quiet either. Several open-data advocates affiliated with the OK Lab Berlin chapter have publicly pressed the Senatsverwaltung to release deduplication standards as open specifications, arguing that taxpayer-funded solutions should not end up locked inside proprietary vendor contracts. Their position, laid out in a public letter submitted to the Senate in May 2026, is that any framework adopted by the city's roughly 200 public cultural institutions should be interoperable and auditable.
The Numbers Behind the Problem
Precise city-level figures are not yet public, but the scale can be inferred from what is known. The Deutsche Digitale Bibliothek, which aggregates content from German cultural institutions including dozens of Berlin-based contributors, reported in its 2025 annual statistics that its total holdings had crossed 47 million objects — a figure that has roughly doubled in five years. Industry benchmarks for digitisation projects routinely cite duplication rates of between eight and fifteen percent in large unmanaged repositories. Applied conservatively to Berlin's own federated collections, that would imply hundreds of thousands of redundant records.
Storage is not cheap. Commercial cloud archiving for high-resolution image files runs in the range of €20 to €40 per terabyte per month at institutional scale, depending on redundancy and access-tier requirements. Multiply that across archives that have not been cleaned in years, and the fiscal argument for deduplication becomes straightforward even before the question of usability is raised.
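The arithmetic behind these figures can be made explicit. The sketch below applies the eight and fifteen percent duplication rates cited above to a collection, converts the duplicates into storage using an average file size, and prices the result at the €20 and €40 per terabyte monthly rates. The collection size of two million records and the 100 MB average file size are hypothetical placeholders, not reported figures for Berlin.

```python
from dataclasses import dataclass


@dataclass
class DuplicateEstimate:
    duplicates: int
    terabytes: float
    monthly_eur_low: float
    monthly_eur_high: float


def estimate(total_records: int, rate_percent: int, avg_file_mb: float,
             eur_per_tb_low: float = 20.0, eur_per_tb_high: float = 40.0) -> DuplicateEstimate:
    """Estimate redundant records, their storage footprint, and monthly cost."""
    duplicates = total_records * rate_percent // 100
    terabytes = duplicates * avg_file_mb / 1_000_000  # decimal units: 1 TB = 10**6 MB
    return DuplicateEstimate(duplicates, terabytes,
                             terabytes * eur_per_tb_low, terabytes * eur_per_tb_high)


# Hypothetical inputs for illustration only; not published Berlin figures.
HYPOTHETICAL_RECORDS = 2_000_000
HYPOTHETICAL_AVG_MB = 100.0

for rate in (8, 15):
    e = estimate(HYPOTHETICAL_RECORDS, rate, HYPOTHETICAL_AVG_MB)
    print(f"{rate}%: {e.duplicates:,} duplicates, {e.terabytes:.1f} TB, "
          f"€{e.monthly_eur_low:,.0f} to €{e.monthly_eur_high:,.0f} per month")
```

Under these placeholder inputs the script reports 160,000 duplicates (16.0 TB, €320 to €640 a month) at eight percent and 300,000 duplicates (30.0 TB, €600 to €1,200 a month) at fifteen percent. Every term scales linearly, so doubling the assumed file size doubles the bill; the calculation covers storage only, not the staff time or degraded search results described earlier.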
The working group that met in Mitte last week is expected to present a draft protocol to the Senatsverwaltung by September 2026. If approved, it would set common technical standards for how Berlin's public institutions identify, flag, and replace duplicate images — including rules for which version of an image is retained when duplicates are found. Institutions are being advised in the meantime to document their current pipelines and halt any new large-scale uploads that lack metadata quality checks. For researchers and members of the public who rely on the city's digital collections, that means the cleanup is coming — but for now, patience is still the only tool on offer.
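The retention rules have not been published, but the kind of decision they must encode can be sketched. The example below picks one copy to keep from a group of images already identified as duplicates, using a hypothetical priority order drawn from the problems described earlier in this piece: an unwatermarked copy beats a watermarked one, then the copy with more pixels wins, and the record ID breaks any remaining tie so that repeated runs make the same choice. The `ImageRecord` fields, the record IDs, and the priority order are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical minimal description of one stored image file."""
    record_id: str
    width: int
    height: int
    watermarked: bool


def retention_key(rec: ImageRecord) -> Tuple[bool, int, str]:
    # Hypothetical priority, smallest key wins: unwatermarked (False sorts
    # before True), then most pixels (negated), then lowest record_id.
    return (rec.watermarked, -(rec.width * rec.height), rec.record_id)


def choose_master(group: List[ImageRecord]) -> Tuple[ImageRecord, List[ImageRecord]]:
    """Return the copy to keep and the copies to flag for replacement."""
    if not group:
        raise ValueError("duplicate group is empty")
    master = min(group, key=retention_key)
    flagged = [rec for rec in group if rec is not master]
    return master, flagged


group = [
    ImageRecord("img-0001", 1200, 800, watermarked=True),
    ImageRecord("img-0002", 6000, 4000, watermarked=False),
    ImageRecord("img-0003", 600, 400, watermarked=False),
]
master, flagged = choose_master(group)
print(master.record_id)                    # img-0002
print([rec.record_id for rec in flagged])  # ['img-0001', 'img-0003']
```

Under this ordering a small unwatermarked file would beat a large watermarked scan; whether that is the right call is precisely the kind of question the working group's draft protocol is expected to settle.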