Berlin's public digital infrastructure has a clutter problem. Across city-managed databases, municipal archives, and cultural repositories, duplicate image files have accumulated for years, clogging storage systems, inflating licensing costs, and creating confusion about which versions of official photographs and historical documents are authoritative. Now, pressure from the CDU-led coalition's digital reform agenda is forcing the issue into the open.
The problem is not trivial. Storage and data management contracts for Berlin's public sector institutions run into the tens of millions of euros annually, and archive specialists have flagged that redundant image files — the same photograph filed under multiple catalogue entries, or scanned twice by separate departments — account for a meaningful share of that overhead. The exact figure remains contested, but the discussion has gained urgency as the Senate Chancellery's digitalisation office pushes forward with its Berlin Digital Strategy, a programme that is supposed to streamline how the city's roughly 120 administrative bodies handle data by the end of 2027.
What the Institutions Are Saying
The Stadtmuseum Berlin, which manages collections across several sites including the Märkisches Museum at Köllnischer Park, has been among the more candid voices. Staff there have described a years-long backlog in catalogue reconciliation, a consequence of multiple digitisation drives that were never fully coordinated. The Zentral- und Landesbibliothek Berlin, based on Breite Straße in Mitte, faces a parallel challenge: its digital reading room holds hundreds of thousands of image assets, and internal reviews have identified significant duplication between collections absorbed from predecessor institutions after reunification.
Experts working in the field of digital asset management are clear about the stakes. Sebastian Möller, a researcher at the Technische Universität Berlin's Quality and Usability Lab in Charlottenburg, has written extensively on metadata inconsistency in public-sector digital projects. He has argued in published work that without standardised deduplication protocols applied at the point of ingestion — not retrospectively — European cities will keep paying to store data they already hold. His lab has participated in advisory work linked to the EU's Interoperable Europe Act, which came into force in April 2024 and sets expectations for member states on exactly this kind of cross-agency data coherence.
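To make the idea of deduplication "at the point of ingestion" concrete, here is a minimal sketch in Python. It is not drawn from Möller's work or from any Berlin system. The class name `IngestIndex`, the in-memory dictionary, and the catalogue IDs are all hypothetical, and the sketch only catches byte-for-byte identical files by comparing SHA-256 digests.

```python
import hashlib
from typing import Dict, Optional


class IngestIndex:
    """Hypothetical in-memory index from content digest to catalogue ID."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}

    def ingest(self, data: bytes, catalogue_id: str) -> Optional[str]:
        """Register a file at ingestion time.

        Returns the catalogue ID of an already-stored identical file,
        or None if the content is new and has now been recorded.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing  # duplicate: caller can link instead of storing again
        self._by_digest[digest] = catalogue_id
        return None


# Usage with made-up catalogue IDs: the second upload is flagged as a copy.
index = IngestIndex()
first = index.ingest(b"raw bytes of a scan", "museum-0001")
second = index.ingest(b"raw bytes of a scan", "library-0042")
```

A check like this runs before a file is written to storage, which is the difference between ingestion-time and retrospective clean-up. It would not catch the same photograph scanned twice by separate departments, since two scans rarely produce identical bytes.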
Ralf Kleindiek, Berlin's State Secretary for Digitalisation, has publicly endorsed accelerating the city's data governance overhaul, though his office has not specified a budget line for deduplication work. The Senate Department for Internal Affairs, which oversees the digitalisation agenda, confirmed in its 2025 annual report that interoperability and data quality are listed among four priority areas for the current legislative term — but the document stops short of quantifying the duplicate-image problem specifically.
What Comes Next — and What Institutions Should Do Now
Technologists advising Berlin's cultural sector say the practical path forward involves deploying perceptual hashing tools — software that can identify visually identical or near-identical images regardless of filename or metadata — across unified storage platforms. The Berliner Senatsverwaltung für Kultur has been in contact with Wikimedia Deutschland, whose office is on Tempelhofer Ufer in Kreuzberg, about best practices drawn from the Wikimedia Commons repository, which handles deduplication at scale across millions of freely licensed files.
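For readers unfamiliar with perceptual hashing, the following Python sketch illustrates one simple variant, an average hash. It is an illustration only, not the tooling used by any institution named here or by Wikimedia Commons. It assumes images have already been decoded into grids of grayscale values from 0 to 255, and the function names and the distance threshold of 5 bits are hypothetical choices.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size cells by averaging each block of pixels."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Build a size*size-bit fingerprint: one bit per cell, set if above the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Synthetic example: a left-to-right gradient, a slightly brighter copy
# (standing in for a second scan), and an unrelated top-to-bottom gradient.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[v + 10 for v in row] for row in original]
unrelated = [[y * 16 for _ in range(16)] for y in range(16)]
```

Because the fingerprint depends on the coarse light-and-dark layout rather than on exact bytes, `likely_duplicates(original, brighter)` holds while `likely_duplicates(original, unrelated)` does not. Filenames and metadata play no part in the comparison. Production tools use more robust hashes and tuned thresholds, so this should be read as a sketch of the principle only.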
For smaller institutions — neighbourhood archives in Prenzlauer Berg, community libraries in Wedding, local history collections in Spandau — the challenge is resources. Deduplication software licences for enterprise-grade tools start at roughly 8,000 euros per year for mid-sized repositories, a figure that puts them out of reach for bodies operating on district-level cultural budgets. Advocates are pushing for a shared-services model, under which the Senate provides centralised tooling that smaller institutions can access without separate procurement.
A working group under the Berlin Open Data coordination office is expected to present recommendations to the Senate by September 2026. Whether that timeline holds — given competing pressures on the digital reform agenda — is something archive directors across the city are watching closely. What is clear is that the conversation has moved from technical backrooms into policy discussions, and the institutions doing the talking want concrete commitments before the next digitisation funding round opens in early 2027.