Berlin's municipal digital archives are facing a quiet crisis. Across institutions from the Stadtmuseum Berlin to the Senatsverwaltung für Kultur, archivists have identified thousands of duplicate image files that have accumulated over more than two decades of digitisation work — redundant scans, misattributed photographs, and conflicting metadata that now make it harder, not easier, to locate reliable visual records of the city's history.
The problem has sharpened this summer because a cross-agency working group, convened in early 2026 under the Senate's digital infrastructure programme, is expected to deliver its deduplication framework by September 30. What that framework looks like will determine how the city manages roughly 4.2 million catalogued image assets — a figure cited in internal communications circulated among participating agencies — and whether the public retains full access during the clean-up.
What the Backlog Actually Looks Like
The scale is substantial. The Zentral- und Landesbibliothek Berlin, which holds one of the largest publicly accessible digital photograph collections in the German capital, began a systematic audit of its image holdings in January 2026. Archivists working out of the Blücherplatz branch in Kreuzberg have flagged categories in which the same photograph appears under three or four separate catalogue entries, sometimes with contradictory captions and different credited photographers. A single image of Potsdamer Platz taken during reunification-era construction, for example, reportedly exists in at least six discrete file variants across two separate database systems.
The immediate technical question is deduplication methodology. Software tools can identify pixel-identical copies automatically, but near-duplicates (images that differ only in cropping, colour correction, or compression) require human review, and that review is expensive. The working group's interim report, circulated in May, estimated that full manual review of flagged near-duplicates across participating institutions would require the equivalent of 11 full-time archivist positions working for 18 months, or 198 person-months of labour.
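To illustrate the distinction, here is a minimal Python sketch of the two detection passes. It is not the working group's tooling, which the interim report does not describe. The function names, the use of SHA-256 over raw file bytes, the simple average hash over small grayscale thumbnails, and the `max_distance` threshold are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names whose raw bytes share the same SHA-256 digest.

    `files` maps a file name to its bytes (hypothetical input shape).
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests seen more than once represent duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the mean.

    `pixels` is a small grayscale thumbnail given as a list of rows.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(a, b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def flag_near_duplicates(thumbnails, max_distance=2):
    """Return name pairs whose hashes differ in at most `max_distance` bits.

    These pairs are candidates for human review, not confirmed duplicates.
    """
    names = sorted(thumbnails)
    hashes = {name: average_hash(thumbnails[name]) for name in names}
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if hamming(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs
```

The first pass can run unattended because matching digests mean matching files. The second pass only narrows the field: a uniformly brightened copy produces the same average hash as its source, but an unrelated image with a similar brightness layout could as well, which is why an archivist still has to confirm each flagged pair.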
Berlin's CDU-led coalition government has signalled willingness to fund digital infrastructure as part of its broader smart-city agenda, but the 2026 city budget was already squeezed by housing subsidies and BVG public transport investment. The Senatsverwaltung für Kultur has not yet confirmed what allocation, if any, will accompany the September framework rollout.
The Decisions That Cannot Wait
Three choices now sit on the desks of senior administrators, and each carries real consequences for researchers, journalists, and the general public who rely on these archives.
First: deletion versus archival quarantine. Permanently removing duplicate files is the cheapest path, but archivists at the Landesarchiv Berlin, housed on Eichborndamm in Reinickendorf, have argued internally that quarantine — keeping duplicates in a non-public holding layer — is safer because metadata embedded in apparently redundant files sometimes contains provenance information not found in the designated primary copy. Delete the wrong file and that context is gone permanently.
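The quarantine argument can be made concrete with a short Python sketch. Everything here is hypothetical: the record layout, the field names, and the two functions are illustrative assumptions, not a description of any Berlin agency's catalogue system.

```python
def unique_provenance(primary_meta, duplicate_meta):
    """Return non-empty duplicate fields that the primary lacks or contradicts."""
    return {
        key: value
        for key, value in duplicate_meta.items()
        if value and primary_meta.get(key) != value
    }


def quarantine_duplicate(record_id, primary_id, catalogue, quarantine):
    """Move a duplicate from the public catalogue into a holding layer.

    Nothing is deleted: the full record is kept, along with a note of
    which metadata fields exist only in the duplicate.
    """
    if record_id == primary_id:
        raise ValueError("a record cannot be quarantined as its own duplicate")
    record = catalogue.pop(record_id)  # remove from the public layer
    extra = unique_provenance(
        catalogue[primary_id]["metadata"], record["metadata"]
    )
    quarantine[record_id] = {
        "record": record,
        "primary_id": primary_id,
        "unique_fields": extra,
    }
    return extra
```

Under these assumptions, a duplicate that carries a field missing from the primary copy, such as a donor note, surfaces that field at the moment it is quarantined. Outright deletion would discard the same information without anyone seeing it.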
Second: who carries editorial authority. The working group includes representatives from at least six separate Berlin agencies. Reaching consensus on which image variant is the canonical version, and which metadata standard governs the final catalogue, has stalled progress repeatedly since February. A lead agency needs to be designated, and that political decision has not been made.
Third: public access continuity. The ZLB's digital portal, Digitale Landesbibliothek, currently serves researchers across Berlin and internationally. Any migration or database restructuring risks weeks of partial outages. The working group's September deadline effectively means any technical work begins in October, putting disruption squarely in the autumn academic term — a timing that has drawn quiet frustration from university departments at Humboldt-Universität and the Technische Universität Berlin, both of which use the archive regularly.
The September 30 deadline leaves little room for further delay. If the working group delivers a credible framework, Berlin could emerge with a unified, deduplicated visual record, a meaningful civic asset. If the political decisions on budget, lead agency, and deletion policy are deferred again, the archive will continue to grow messier, and the cost of fixing it will rise. Administrators have roughly 12 weeks to decide which outcome they want.