Berlin's public institutions are sitting on millions of duplicate digital images, and the people responsible for managing them say the problem has quietly become expensive. Archivists at the Zentral- und Landesbibliothek Berlin on Breite Straße in Mitte estimate the institution's digital holdings have grown by roughly 40 percent since 2022, with a significant share of that expansion driven not by new acquisitions but by repeated uploads of the same files across departments. The numbers, shared internally at a working-group session in May 2026, have pushed the deduplication question onto the agenda of the Senatsverwaltung für Kultur und Gesellschaftlichen Zusammenhalt.
The timing matters. Berlin's coalition government of CDU and SPD committed in its 2025 budget framework to digitising 1.2 million archival items by the end of 2027. That target is already strained by infrastructure costs, and duplicate image storage is one of the line items drawing scrutiny. Cloud storage contracts negotiated by the Berliner Senat run to several hundred thousand euros annually, and administrators say redundant files are inflating those bills without adding public value.
What the Specialists Are Saying
Deduplication, the automated process of identifying identical or near-identical image files and then removing or consolidating them, is not new technology. What is new, argue specialists at the Fraunhofer-Institut für Offene Kommunikationssysteme in Charlottenburg, is the scale at which Berlin's public sector now needs to apply it. The institute has advised German federal and state bodies on digital infrastructure, and its researchers have pointed in published work to the growing gap between digitisation ambitions and the data-hygiene practices needed to support them.
At the Stadtmuseum Berlin, which manages collections across several sites including the Ephraim-Palais in the Nikolaiviertel, curators describe a workflow problem: image files are frequently generated at multiple resolutions during scanning, then saved in full across shared drives without a standardised naming or tagging convention. The result is that a single historical photograph of, say, Potsdamer Platz can exist in six or seven versions in the same system, indistinguishable to a basic search. Staff time spent manually resolving those duplicates is time not spent on cataloguing new material.
Advocates for faster reform point to the experience of the Stiftung Preußischer Kulturbesitz, which administers collections including those at the Kulturforum near the Tiergarten. The foundation began a structured deduplication programme for its image databases in late 2024, using hash-matching software to flag identical files before human review. According to the foundation's annual digitisation report, published in early 2025, the first phase of the programme found that redundant copies accounted for approximately 18 percent of total image storage in the collection segments tested.
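For readers unfamiliar with hash matching, the general idea can be sketched in a few lines of Python. This is an illustration only, not the foundation's software: the function names and the choice of SHA-256 are assumptions made for the example. Each file is reduced to a fingerprint of its bytes, and files sharing a fingerprint are grouped for a person to review. Nothing is deleted.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def flag_identical_files(root: Path) -> list[list[Path]]:
    """Group files under `root` whose bytes are identical.

    Hypothetical helper: it only flags candidates for human review
    and never removes or modifies a file.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only fingerprints shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A cryptographic hash of this kind matches only byte-for-byte copies. The same photograph saved at a different resolution produces a different fingerprint, so near-identical versions need a separate approach.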
Policy Pressure and Practical Next Steps
The Senatsverwaltung has not yet published a formal directive on duplicate-image management, but officials have signalled that guidelines are being drafted for circulation to publicly funded cultural institutions by the fourth quarter of 2026. The draft framework, according to background briefings from the culture administration, is expected to mandate minimum metadata standards and require institutions receiving digitisation grants under the Berlin Digital Culture Fund to demonstrate deduplication protocols before disbursement.
For smaller institutions — neighbourhood archives in Neukölln or Prenzlauer Berg, community libraries that lack dedicated IT staff — the challenge is less about will than capacity. Advocates within the Berliner Bibliotheksverbund, the network linking the city's public library branches, have called for a centralised deduplication service that smaller bodies could plug into rather than building their own. That proposal is under review but has not been formally funded.
The practical advice from archivists currently managing the problem: start with naming conventions before reaching for software. Institutions that first standardised their file-naming structures (date, subject, resolution, source department) found that automated deduplication tools were significantly more accurate when they were eventually deployed. The Zentral- und Landesbibliothek's internal documentation from its 2025 workflow audit supports that sequencing. The technology is available. The governance framework is catching up.
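What such a convention might look like in practice can be sketched briefly. The pattern below is hypothetical: the archivists name the four elements (date, subject, resolution, source department), but the separator, the field order, the `dpi` suffix and the example file names are invented for illustration. Once names follow a fixed pattern, versions of the same item at different resolutions can be grouped without opening a single file.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional

# Hypothetical pattern: <date>_<subject>_<resolution>dpi_<department>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<subject>[a-z0-9-]+)_"
    r"(?P<dpi>\d+)dpi_"
    r"(?P<department>[a-z0-9-]+)"
    r"\.(?P<ext>[a-z0-9]+)$"
)


class ImageName(NamedTuple):
    date: str
    subject: str
    dpi: int
    department: str
    ext: str


def parse_name(filename: str) -> Optional[ImageName]:
    """Split a file name into its fields, or return None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return ImageName(
        match["date"], match["subject"], int(match["dpi"]),
        match["department"], match["ext"],
    )


def group_variants(filenames: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group conforming names that share a date and subject.

    Groups with more than one member are likely versions of the same
    item at different resolutions or from different departments.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for name in filenames:
        parsed = parse_name(name)
        if parsed is not None:
            groups[(parsed.date, parsed.subject)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}
```

Names that do not match the pattern return `None` from `parse_name`, which gives staff a simple list of files still waiting to be renamed.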