The Daily Berlin

Berlin news, every day


Berlin's Digital Archives Are Full of Duplicate Images — and Officials, Experts and Archivists Are Finally Talking About Fixing It

From the Stadtmuseum to the Landesarchiv, Berlin's cultural institutions are confronting a quiet data crisis that is costing money, slowing research and burying history.

By Berlin News Desk · Published 4 July 2026, 9:00 pm


Photo: Wolf Art on Pexels

Berlin's public digital archives contain hundreds of thousands of duplicate image files — identical or near-identical photographs, scans and artwork reproductions stored multiple times across separate databases — and the institutions responsible for managing them are now under pressure to act. Archivists, city officials and technology specialists have spent much of 2026 debating who owns the problem and, more pressingly, who pays to solve it.

The issue matters now because Berlin's SPD-led Senate has pushed digitisation to the centre of its cultural policy agenda, committing in the 2025–2026 coalition agreement to accelerate public access to the city's historical collections. That political momentum has brought long-standing data hygiene problems into sharp relief. Duplicate records inflate storage costs, confuse researchers cross-referencing collections and, in some cases, lead different institutions to license the same image independently, paying rights fees more than once for the same asset.

Where the Problem Shows Up

The Stadtmuseum Berlin, which manages collections across sites including the Märkisches Museum on Köllnischer Park and the Ephraim-Palais in the Nikolaiviertel, has been working since early 2025 to reconcile its digital image catalogue after a series of collection mergers left thousands of records with overlapping metadata. The Landesarchiv Berlin on Eichborndamm in Reinickendorf faces a related challenge: digitisation drives conducted across different departments at different times have produced parallel image sets with inconsistent file naming conventions, making automated deduplication difficult without significant manual review.

Specialists in digital preservation distinguish between exact duplicates — byte-for-byte identical files — and near-duplicates, which include photographs taken seconds apart, scans of the same document at different resolutions, or images that have been cropped or colour-corrected after the fact. The second category is far harder to catch with standard software and accounts for the bulk of the problem in large institutional collections. Researchers working with Berlin's image databases have noted that a single historical photograph of Potsdamer Platz can appear under several different accession numbers, with incompatible dates and attribution details attached to each version.
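The distinction can be made concrete with a minimal Python sketch. It is an illustration only, not a description of any Berlin institution's tooling. It assumes each image has already been decoded and downscaled to a small grayscale grid (rows of values from 0 to 255); the function names and the 2-bit tolerance are hypothetical choices. A cryptographic digest catches only byte-for-byte copies, while a simple "average hash" compared by Hamming distance can tolerate a brightness change.

```python
import hashlib


def exact_digest(data: bytes) -> str:
    """SHA-256 digest: equal only when the bytes are identical."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(p1, p2, max_bits: int = 2) -> bool:
    """Assumed rule of thumb: hashes within max_bits count as near-duplicates."""
    return hamming_distance(average_hash(p1), average_hash(p2)) <= max_bits


# Hypothetical 4x4 grayscale grids standing in for downscaled scans.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
# The same picture after a uniform brightness correction.
brightened = [[min(255, value + 20) for value in row] for row in original]
# A different picture altogether.
unrelated = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]


def to_bytes(pixels: list[list[int]]) -> bytes:
    """Flatten a grid into raw bytes so it can be digested."""
    return bytes(value for row in pixels for value in row)
```

In this toy data the brightened grid has a different SHA-256 digest from the original but the same average hash, which is the situation the specialists describe: exact matching misses it, and a perceptual comparison flags it for human review. Real collections would need a proper image library and a tuned threshold.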

Dirk Moldt, a digital infrastructure consultant who has worked with several German state-level cultural bodies, has said in public presentations that the core difficulty is as much a governance question as a technical one: institutions that built their databases independently over decades did not standardise the way files were named, tagged or ingested, which makes retroactive deduplication labour-intensive. No specific budget figure for Berlin's remediation work has been confirmed in public Senate documents reviewed for this article.

What Comes Next

Kulturprojekte Berlin GmbH, the publicly owned company that coordinates cultural programming and digitisation initiatives across the city, has been in discussion with the Senate Department for Culture about piloting a shared image registry: a centralised index that would allow participating institutions to flag potential duplicates before new files are uploaded. A working group involving the Staatsbibliothek zu Berlin on Potsdamer Straße is understood to be examining similar models adopted in the Netherlands and at the British Library, though no formal proposal has been published.

For smaller institutions — independent archives, neighbourhood history projects in districts like Wedding and Lichtenberg, community museums run largely by volunteers — the practical advice from digital preservation specialists is consistent: adopt file hashing at the point of ingest, apply consistent metadata standards from day one, and document provenance clearly enough that a near-duplicate can be identified by a human reviewer even when automated tools disagree. The cost of building good habits at the start of a digitisation project is a fraction of the cost of cleaning up a database years later.
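As a rough illustration of what hashing at the point of ingest can look like, the following Python sketch uses only the standard library. The function names, the in-memory `registry` dictionary and the file paths are assumptions made for the example; a real archive would keep the index in a database and record fuller provenance metadata alongside each entry.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(path: Path, registry: dict[str, list[str]]) -> bool:
    """Record a file under its digest.

    Returns True if the content is new, False if a byte-identical
    file has already been registered (a candidate for review).
    """
    key = sha256_of_file(path)
    is_new = key not in registry
    # Keep every path seen for this digest so reviewers can trace copies.
    registry.setdefault(key, []).append(str(path))
    return is_new
```

A volunteer-run project could call `ingest(Path("scans/0001.tif"), registry)` for each new file and hold back any upload for which it returns False. This catches exact copies only; near-duplicates still depend on consistent metadata and human review.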

The Senate's broader digitisation push has a soft deadline tied to the 2027 Berlin cultural budget cycle, when institutions will be asked to report progress against the coalition's access targets. That creates a real window — roughly 18 months — for the city's major cultural bodies to get their image catalogues in order before political scrutiny intensifies. Archivists say the technology is not the obstacle. The will to coordinate across institutional lines is.


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
