The Daily Berlin

Berlin news, every day

Berlin's Digital Archives Face a Reckoning Over Duplicate Images: The Key Decisions Ahead

From Mitte to Marzahn, city institutions are grappling with how to handle vast backlogs of duplicate digital images — and the choices made this summer will shape public access to Berlin's visual history for decades.

By Berlin News Desk · Published 4 July 2026, 8:40 pm

Berlin's cultural institutions are sitting on a problem they have spent years quietly ignoring. Across the city's major archives and digital repositories, duplicate image files — identical or near-identical photographs stored multiple times under different catalogue numbers — are consuming server space, distorting search results and undermining the reliability of public-facing databases. The question of what to do about them has now become urgent, as the Senate Department for Culture and Social Cohesion pushes institutions toward a unified digital infrastructure ahead of a 2027 consolidation deadline.

The timing matters. Berlin has committed significant public funding to its Digitale Berliner Stadtbibliothek project, which aims to bring the holdings of the Zentral- und Landesbibliothek Berlin on Breite Straße and the Stadtmuseum Berlin network under one searchable portal. Duplicate images don't just waste storage — they create legal exposure around rights management, confuse provenance records and force curators to spend hours on manual checks that automated systems should handle. With the consolidation portal scheduled for a phased public launch starting in March 2027, decisions about deduplication policy cannot wait until autumn.

What the Backlog Actually Looks Like

The scale of the problem is difficult to pin down precisely because each institution tracks it differently. The Landesarchiv Berlin on Eichborndamm in Reinickendorf, which holds over 1.3 million photographic items, has acknowledged in its annual reports that digitisation drives from 2018 to 2023 produced significant duplication when batches were uploaded across incompatible cataloguing systems. The Stadtmuseum's photo collections, spread across sites including the Märkisches Museum at Am Köllnischen Park, face a similar structural issue: images digitised during the pandemic years for emergency remote access were often uploaded without being cross-referenced against existing digital holdings.

Deduplication is not simply a matter of running a script. A photograph of Alexanderplatz taken in 1965 might exist in three versions — a raw scan, a contrast-adjusted copy made for a 2019 exhibition catalogue and a lower-resolution thumbnail generated for a mobile app. All three may carry different metadata, different rights annotations and different curatorial notes. Deleting the wrong version means losing information. Keeping all three without a clear hierarchy means the problem reproduces itself.

Software solutions exist. The Fraunhofer-Gesellschaft, whose research facilities include the Heinrich-Hertz-Institut on Einsteinufer in Charlottenburg, has published work on perceptual hashing and image fingerprinting, techniques that can cluster near-duplicate images for human review rather than automated deletion. Several German municipal archives have piloted such tools, though none at the scale Berlin now requires. Enterprise-grade deduplication software with archival compliance features typically costs from €40,000 to well over €200,000 depending on collection size, a real budget line for institutions already stretched by energy cost increases driven by the Energiewende.
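For readers curious how perceptual hashing surfaces candidates, the following is a minimal sketch of one common variant, the average hash, and not the method of any institution or vendor named in this article. It assumes each image has already been reduced to a small grid of grayscale values; the catalogue IDs, the sample grids and the `max_distance` threshold are hypothetical.

```python
from itertools import combinations


def average_hash(pixels):
    """Fingerprint a grayscale grid: one bit per pixel, set if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def cluster_candidates(hashes, max_distance=1):
    """Group catalogue IDs whose fingerprints differ by at most max_distance bits."""
    parent = {cid: cid for cid in hashes}

    def find(cid):
        # Follow parent links to the group representative, shortening the path.
        while parent[cid] != cid:
            parent[cid] = parent[parent[cid]]
            cid = parent[cid]
        return cid

    for a, b in combinations(hashes, 2):
        if hamming_distance(hashes[a], hashes[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for cid in hashes:
        groups.setdefault(find(cid), []).append(cid)
    # Only groups with more than one member are duplicate candidates.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical 4x4 grids: a raw scan, a contrast-adjusted copy, an unrelated image.
raw_scan = [[10, 20, 200, 210], [15, 25, 205, 215]] * 2
contrast_copy = [[0, 10, 240, 255], [5, 15, 245, 255]] * 2
unrelated = [[200, 210, 10, 20], [205, 215, 15, 25]] * 2

hashes = {
    "A-001": average_hash(raw_scan),
    "B-417": average_hash(contrast_copy),
    "C-093": average_hash(unrelated),
}
candidates = cluster_candidates(hashes)
print(candidates)  # [['A-001', 'B-417']]
```

The output is a list of candidate groups for an archivist to inspect. Nothing is deleted, which matches the review-first approach described above. Comparing every pair of images is only practical for small batches; collections of a million items or more would need an indexing step that this sketch leaves out.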

The Decisions That Will Define the Outcome

Three choices dominate the coming months. First, institutions must agree on a master-copy standard: which version of a duplicate image survives as the canonical record and what metadata schema it carries. Without Senate-level coordination, each institution will default to its own convention, making future interoperability harder, not easier.
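What a master-copy rule might look like in practice can be sketched in a few lines. The rule below is an illustrative assumption, not a standard any Berlin institution has adopted: it prefers the unedited scan, breaks ties by resolution, and folds the metadata of the other versions into the surviving record. The `ImageVersion` fields, the catalogue IDs and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ImageVersion:
    catalogue_id: str
    width: int
    height: int
    is_derivative: bool  # True for edited copies and thumbnails
    metadata: dict = field(default_factory=dict)


def pick_canonical(versions):
    """Assumed rule: prefer unedited scans, then the largest pixel count."""
    return max(versions, key=lambda v: (not v.is_derivative, v.width * v.height))


def merge_record(versions):
    """Build one record that keeps the canonical ID and every version's metadata."""
    canonical = pick_canonical(versions)
    merged = dict(canonical.metadata)
    for version in versions:
        if version is canonical:
            continue
        for key, value in version.metadata.items():
            # The canonical record wins on conflicts; other versions fill gaps.
            merged.setdefault(key, value)
    return {
        "canonical_id": canonical.catalogue_id,
        "superseded_ids": sorted(
            v.catalogue_id for v in versions if v is not canonical
        ),
        "metadata": merged,
    }


versions = [
    ImageVersion("LAB-0002", 6000, 4000, True, {"rights": "exhibition 2019", "year": 1965}),
    ImageVersion("LAB-0001", 6000, 4000, False, {"year": 1965}),
    ImageVersion("LAB-0003", 300, 200, True, {"note": "mobile app thumbnail"}),
]
record = merge_record(versions)
print(record["canonical_id"], record["superseded_ids"])
```

Letting the canonical record win on conflicts is the simplest choice, but it would silently drop a differing rights annotation. A real schema would more likely flag such conflicts for an archivist.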

Second, there is the question of who reviews flagged duplicates. Automated tools can surface candidates, but a human archivist must confirm each deletion. Hiring or retraining staff for that role takes months, and the funding would need to appear in the 2027 Senate budget draft, for which submissions close in September 2026.

Third, and most politically charged, is public access during the transition. If collections are taken offline for deduplication work, researchers at institutions like the Humboldt-Universität zu Berlin and journalists using the Landesarchiv for historical reporting face disruption. A phased approach — working through the collections neighbourhood by neighbourhood, starting with the most-queried holdings — is the option most archivists privately favour, though it extends the timeline past 2028.

The Senate's culture department has indicated it will publish draft guidelines for the consolidation project in September. That document will be the first real test of whether Berlin's institutions can align on a common standard or whether each archive heads into 2027 carrying its duplicate burden alone.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.