Berlin's public digital archives contain hundreds of thousands of duplicate image files — identical or near-identical photographs, scans and artwork reproductions stored multiple times across separate databases — and the institutions responsible for managing them are now under pressure to act. Archivists, city officials and technology specialists have spent much of 2026 debating who owns the problem and, more pressingly, who pays to solve it.
The issue matters now because Berlin's SPD-led Senate has pushed digitisation to the centre of its cultural policy agenda, committing in the 2025–2026 coalition agreement to accelerate public access to the city's historical collections. That political momentum has brought long-standing data hygiene problems into sharp relief. Duplicate records inflate storage costs, confuse researchers cross-referencing collections and, in some cases, lead different institutions to license the same image independently, paying rights fees more than once for the same asset.
Where the Problem Shows Up
The Stadtmuseum Berlin, which manages collections across sites including the Märkisches Museum on Köllnischer Park and the Ephraim-Palais in the Nikolaiviertel, has been working since early 2025 to reconcile its digital image catalogue after a series of collection mergers left thousands of records with overlapping metadata. The Landesarchiv Berlin on Eichborndamm in Reinickendorf faces a related challenge: digitisation drives conducted across different departments at different times have produced parallel image sets with inconsistent file naming conventions, making automated deduplication difficult without significant manual review.
Specialists in digital preservation distinguish between exact duplicates — byte-for-byte identical files — and near-duplicates, which include photographs taken seconds apart, scans of the same document at different resolutions, or images that have been cropped or colour-corrected after the fact. The second category is far harder to catch with standard software and accounts for the bulk of the problem in large institutional collections. Researchers working with Berlin's image databases have noted that a single historical photograph of Potsdamer Platz can appear under several different accession numbers, with incompatible dates and attribution details attached to each version.
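The distinction between the two categories can be illustrated with a minimal Python sketch. It is not drawn from any Berlin institution's tooling. It assumes each image has already been decoded and downscaled to a small greyscale grid (a step normally handled by an imaging library), and the function names and the distance threshold of 5 are hypothetical choices for illustration. A cryptographic digest catches only byte-for-byte copies, while a simple "average hash" can still match a version that has been uniformly brightened.

```python
import hashlib


def exact_digest(data: bytes) -> str:
    """SHA-256 of the raw bytes: equal only for byte-for-byte identical files."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Perceptual 'average hash' of a small greyscale grid (values 0-255).

    Each pixel contributes one bit: 1 if it is brighter than the grid mean.
    Assumes the image was already downscaled, e.g. to 8x8 (an assumption).
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_distance: int = 5) -> bool:
    """Treat two grids as near-duplicates if their hashes differ in few bits.

    The threshold is a hypothetical value; real collections need tuning
    and human review of borderline cases.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Illustrative data: a gradient, a brightened copy of it, and a checkerboard.
original = [[(r * 8 + c) * 3 for c in range(8)] for r in range(8)]
brightened = [[p + 20 for p in row] for row in original]
unrelated = [[255 * ((r + c) % 2) for c in range(8)] for r in range(8)]


def to_bytes(grid) -> bytes:
    return bytes(p for row in grid for p in row)
```

In this sketch the brightened copy has a different SHA-256 digest from the original, so an exact-match check would miss it, yet `is_near_duplicate(original, brightened)` still reports a match. Cropping or rescaling would call for more robust perceptual hashes than this simple one.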
Dirk Moldt, a digital infrastructure consultant who has worked with several German state-level cultural bodies, has described the core difficulty in public presentations as a governance question as much as a technical one. Institutions that built their databases independently over decades did not standardise how files were named, tagged or ingested, which makes retroactive deduplication labour-intensive. No specific budget figure for Berlin's remediation work has been confirmed in public Senate documents reviewed for this article.
What Comes Next
Kulturprojekte Berlin GmbH, the publicly owned company that coordinates cultural programming and digitisation initiatives across the city, has been in discussion with the Senate Department for Culture about piloting a shared image registry: a centralised index that would allow participating institutions to flag potential duplicates before new files are uploaded. A working group involving the Staatsbibliothek zu Berlin on Potsdamer Straße is understood to be examining similar models adopted in the Netherlands and at the British Library, though no formal proposal has been published.
For smaller institutions — independent archives, neighbourhood history projects in districts like Wedding and Lichtenberg, community museums run largely by volunteers — the practical advice from digital preservation specialists is consistent: adopt file hashing at the point of ingest, apply consistent metadata standards from day one, and document provenance clearly enough that a near-duplicate can be identified by a human reviewer even when automated tools disagree. The cost of building good habits at the start of a digitisation project is a fraction of the cost of cleaning up a database years later.
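What hashing at the point of ingest might look like for a small archive can be sketched in a few lines of Python. This is an illustrative outline rather than a description of any system in use in Berlin: the `IngestRegistry` class, its field names and the in-memory dictionary are all assumptions, and a real project would persist the index in a database and follow its own metadata standard.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class IngestRegistry:
    """Hypothetical in-memory index of ingested files, keyed by SHA-256."""

    def __init__(self) -> None:
        self._by_digest: dict[str, dict] = {}

    def ingest(self, path: str, accession_number: str, provenance: str):
        """Register a file, or flag it if identical bytes were seen before.

        Returns (is_new, record). When is_new is False, record is the
        earlier entry, so a reviewer can compare accession numbers and
        provenance notes before anything is stored twice.
        """
        digest = sha256_of_file(path)
        existing = self._by_digest.get(digest)
        if existing is not None:
            return False, existing
        record = {
            "digest": digest,
            "path": path,
            "accession_number": accession_number,
            "provenance": provenance,
        }
        self._by_digest[digest] = record
        return True, record
```

A check of this kind catches only exact copies. Near-duplicates still depend on the provenance notes and a human reviewer, which is why the record keeps that field alongside the digest.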
The Senate's broader digitisation push has a soft deadline tied to the 2027 Berlin cultural budget cycle, when institutions will be asked to report progress against the coalition's access targets. That creates a real window — roughly 18 months — for the city's major cultural bodies to get their image catalogues in order before political scrutiny intensifies. Archivists say the technology is not the obstacle. The will to coordinate across institutional lines is.