Berlin's public institutions are sitting on a problem nobody wanted to name out loud. Across municipal databases, from the Landesarchiv Berlin on Eichborndamm to the sprawling digital repositories maintained by Kulturprojekte Berlin, redundant and duplicate image files have accumulated over years of digitisation drives, costing storage budget and, more critically, corroding the integrity of public records. Now, with a city-wide digital infrastructure review underway, officials and technical experts are speaking more openly about the scale of the issue than they ever have before.
The timing matters. Berlin's Senate Department for Digital Transformation and Administrative Modernisation is midway through a reform agenda that runs through 2027, with roughly €280 million earmarked across departments for digital infrastructure. When a significant share of the storage that money pays for is consumed by duplicated assets, with the same photograph appearing dozens of times under different file names across disconnected systems, the inefficiency has a direct fiscal consequence. It also has a legal one: German archival law, the Bundesarchivgesetz at federal level and Berlin's own state archive act (Archivgesetz des Landes Berlin) for the city's institutions, requires that public records be maintained in a form that ensures authenticity. A database riddled with duplicate and mislabelled images complicates that obligation considerably.
What Experts Are Saying on the Ground
Technologists working with Berlin's public sector have been raising the alarm for at least two years. The problem is particularly acute in institutions that digitised in a hurry during 2020 and 2021, when pandemic closures pushed archives to scan and upload material quickly without standardised metadata protocols. The Zentral- und Landesbibliothek Berlin, which holds one of the largest publicly accessible digital image collections in the city, has acknowledged in its annual reporting that deduplication remains an ongoing technical challenge, though the library has not yet published a figure for the share of its holdings that is affected.
Specialists in digital preservation point to a structural issue: Berlin's public institutions often run proprietary or legacy content management systems that do not communicate with each other. A photograph taken at a Neukölln community event in 2018, for instance, might exist in the BVG's press archive, the district's own Bezirksamt image bank, and a Kulturprojekte Berlin campaign folder: three separate copies, none flagged as related to the others. Bulk deletion is not a safe fix, because files that look like duplicates can differ in ways that matter. One may be the full-resolution master, another a crop or colour-corrected version, a third the only copy carrying correct licensing information. Removing duplicates without auditing each file therefore risks deleting the only remaining copy of a unique record, which is why automated deduplication tools, however sophisticated, still require human review, and at scale.
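To make the cross-system scenario concrete, here is a minimal sketch in Python of how byte-identical copies could be surfaced for review without deleting anything. It is an illustration under stated assumptions, not a description of any tool Berlin's institutions actually use: the `ArchiveFile` record, the `find_duplicate_groups` function, the system labels and the file names are all hypothetical. It relies on plain SHA-256 content hashing, which only catches exact copies.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveFile:
    system: str  # hypothetical label for the source system
    name: str    # file name as stored in that system
    data: bytes  # raw file contents


def find_duplicate_groups(files):
    """Group byte-identical files by SHA-256 digest.

    Only groups with more than one member are returned. Nothing is
    deleted: the result is a review list for a human archivist.
    """
    by_digest = defaultdict(list)
    for f in files:
        digest = hashlib.sha256(f.data).hexdigest()
        by_digest[digest].append(f)
    return {d: group for d, group in by_digest.items() if len(group) > 1}


# Illustrative data only: three systems hold the same bytes under
# different names, plus a cropped variant with different bytes.
original = b"placeholder-bytes-for-event-photo"
cropped = b"placeholder-bytes-for-event-photo-cropped"

files = [
    ArchiveFile("bvg_press", "IMG_0412.jpg", original),
    ArchiveFile("bezirksamt", "fest_2018_final.jpg", original),
    ArchiveFile("kulturprojekte", "campaign_07.jpg", original),
    ArchiveFile("kulturprojekte", "campaign_07_crop.jpg", cropped),
]

groups = find_duplicate_groups(files)
for digest, group in groups.items():
    # Print a short digest prefix and where each identical copy lives.
    print(digest[:12], [(f.system, f.name) for f in group])
```

The three identical copies land in one group even though their file names differ, while the cropped variant is left out because its bytes, and therefore its digest, are different. That gap is the limit of this approach: edited versions of the same photograph would need a different technique, such as perceptual hashing, and in either case the output is a list of candidates for an archivist to check rather than a deletion list.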
The conversation has moved into Berlin's startup world too. Several companies operating out of Factory Berlin in Mitte, which houses technology firms focused on media management and AI image processing, have been in early-stage discussions with Senate departments about piloting deduplication software in public archive environments. No contracts have been announced, and procurement processes under Berlin's Vergabegesetz are lengthy. But the interest itself signals a shift: city government is, however cautiously, looking at private-sector tools rather than trying to solve the problem purely with in-house IT resources.
What Comes Next for Institutions and Users
For citizens and researchers who rely on Berlin's public image archives — journalists, urban planners, historians working out of institutions like the Humboldt-Universität zu Berlin — the practical consequences of the duplicate problem are already felt. Search results return the same image multiple times. Licensing information is inconsistent across copies. Version control, particularly for photographs that have been cropped or colour-corrected, is close to nonexistent in some systems.
The Senate's digital reform office has indicated that a formal working group on archival data quality, including the duplicate image question, is expected to convene in the third quarter of 2026. Institutions will likely be asked to submit inventories of their digital holdings before the end of the year. Whether that leads to a centralised deduplication mandate, or leaves each institution to resolve the problem on its own schedule, is the decision that experts say will determine whether Berlin's archival infrastructure is genuinely modernised or simply patched over once more.
For now, the Landesarchiv on Eichborndamm keeps scanning. The files keep piling up. And the hard conversation about what to delete, what to keep, and who decides has, at least, finally started.