The Daily Berlin


Berlin's Digital Archives Face a Reckoning Over Duplicate Images: The Key Decisions Ahead

A growing backlog of duplicate and misidentified photographs in the city's public image databases is forcing administrators, cultural institutions, and tech contractors to choose between costly manual review and AI-driven solutions.

By Berlin News Desk · Published 4 July 2026, 8:58 pm


Photo: Felipe Souza Melo on Pexels

Berlin's network of publicly funded digital archives is sitting on a problem that administrators can no longer defer. Tens of thousands of duplicate, mislabelled, and redundant images have accumulated across the Landesarchiv Berlin, the Stadtmuseum Berlin collection, and the Senate's own digital asset management systems — the result of years of fragmented digitisation drives that lacked a unified standard for file naming, metadata, or deduplication checks.

The issue has become urgent now because the SPD-led Senate coalition committed in its 2025 governing agreement to consolidate Berlin's fragmented cultural data infrastructure under a single interoperable platform by the end of 2027. That deadline gives administrators roughly 18 months to resolve a backlog that has been building since at least 2018, when the city's first coordinated digitisation push began without mandatory deduplication protocols.

What the Backlog Actually Looks Like

The Landesarchiv Berlin, headquartered on Eichborndamm in Reinickendorf, holds the most acute version of the problem. Photographic holdings from the Cold War-era city administration were scanned in at least three separate project phases between 2018 and 2024, each using different contractors and different file-format standards. The result is a collection where the same physical photograph sometimes exists as a 300 dpi TIFF, a compressed JPEG, and a cropped web-optimised PNG, each registered as a distinct record. Stadtmuseum Berlin, whose collections span sites including the Märkisches Museum in Mitte and the Ephraim-Palais in the Nikolaiviertel, faces a similar challenge after inheriting digitised assets from five smaller collections merged under its umbrella in 2021.

Staff workloads tell part of the story. Industry estimates for manual image deduplication in large institutional collections typically put throughput at around 200 to 400 images per staff member per day, depending on metadata quality. For a backlog conservatively estimated at 80,000 redundant records across the city's major institutions, that implies roughly 200 to 400 staff-days, or several months of dedicated full-time work. Existing archival staff cannot absorb that without either new hires or a pause in active digitisation.
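The arithmetic behind that estimate can be made explicit. The sketch below uses only the figures given above (80,000 records, 200 to 400 images per person per day); the team of four reviewers is a hypothetical example for illustration, not a figure reported by any institution.

```python
# Back-of-envelope review effort for the deduplication backlog.
# Backlog and throughput figures are from the article; the team size is hypothetical.

BACKLOG_RECORDS = 80_000        # conservative estimate of redundant records
THROUGHPUT_RANGE = (200, 400)   # images reviewed per staff member per day

def staff_days(backlog: int, images_per_day: int) -> float:
    """Total person-days of manual review at a given daily throughput."""
    return backlog / images_per_day

def working_days(backlog: int, images_per_day: int, reviewers: int) -> float:
    """Elapsed working days if `reviewers` people work the backlog in parallel."""
    return staff_days(backlog, images_per_day) / reviewers

for rate in THROUGHPUT_RANGE:
    print(f"{rate}/day: {staff_days(BACKLOG_RECORDS, rate):.0f} staff-days, "
          f"{working_days(BACKLOG_RECORDS, rate, reviewers=4):.0f} working days with 4 reviewers")
```

At 200 images per day the backlog comes to 400 staff-days; at 400 per day, 200 staff-days. Adding reviewers shortens the calendar time but not the total labour.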

Two contractors are now competing for a Senate contract to handle the technical resolution. One proposal, submitted to the Senatsverwaltung für Kultur und Gesellschaftlichen Zusammenhalt in May, centres on a perceptual hashing algorithm that can flag visually identical or near-identical images at scale. The other proposes a hybrid model combining automated flagging with a structured human review layer, at an estimated project cost in the low six-figure euro range. A decision is expected before the end of the third quarter of 2026.
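Perceptual hashing works by reducing each image to a short fingerprint that changes little when a file is re-encoded or resized, then comparing fingerprints instead of raw bytes. The sketch below implements one simple variant, an average hash, on a grayscale pixel grid using only the standard library. It is an illustration of the general technique, not the contractor's algorithm, which has not been published; the 8x8 hash size and the distance threshold of 5 are assumptions chosen for the example.

```python
from typing import List

Grid = List[List[float]]  # grayscale pixel values, rows of equal length

def downscale(pixels: Grid, size: int = 8) -> Grid:
    """Block-average an image to size x size (assumes each dimension >= size)."""
    h, w = len(pixels), len(pixels[0])
    bh, bw = h // size, w // size
    out = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [pixels[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def average_hash(pixels: Grid, size: int = 8) -> int:
    """One bit per cell: 1 if the cell is brighter than the mean of all cells."""
    flat = [v for row in downscale(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_near_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Flag two images as candidate duplicates (threshold is a hypothetical setting)."""
    return hamming(average_hash(a), average_hash(b)) <= threshold

if __name__ == "__main__":
    original = [[(x + y) * 8 for x in range(16)] for y in range(16)]
    brightened = [[v + 10 for v in row] for row in original]   # e.g. a re-export
    inverted = [[255 - v for v in row] for row in original]    # a different image
    print(is_near_duplicate(original, brightened))
    print(is_near_duplicate(original, inverted))
```

A uniformly brightened copy produces the same hash, while the inverted image differs in most bits. Simple average hashes generally tolerate re-encoding and resizing better than cropping, which is one reason a proposal might pair automated flagging with human review.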

The Decisions That Will Shape the Outcome

Three choices will define how this plays out. First, the Senate must decide whether to mandate a single metadata standard — specifically whether to adopt the internationally recognised Dublin Core schema or develop a Berlin-specific extension of it — before any deduplication tool goes live. Applying deduplication logic to records with inconsistent metadata fields produces unreliable results, and fixing metadata after the fact is more expensive than standardising it first.
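Why standardising metadata first matters can be shown with a small example. Two digitisation contractors may describe the same photograph under entirely different field names, so a naive comparison finds nothing in common. The sketch below maps records onto the 15 elements of the Dublin Core Metadata Element Set before building a matching key. The field names, the mapping table, and the identifier values are hypothetical; the Senate has not yet chosen a standard or published any mapping.

```python
# The 15 elements of the Dublin Core Metadata Element Set (version 1.1).
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Hypothetical contractor-specific field names mapped to Dublin Core elements.
FIELD_MAP = {
    "Titel": "title", "img_title": "title",
    "Aufnahmedatum": "date", "date_taken": "date",
    "Signatur": "identifier", "shelfmark": "identifier",
}

def to_dublin_core(record: dict) -> dict:
    """Rename known fields to Dublin Core elements and drop anything unmapped."""
    dc: dict = {}
    for field, value in record.items():
        element = FIELD_MAP.get(field, field.lower())
        if element in DC_ELEMENTS:
            dc.setdefault(element, []).append(str(value).strip())
    return dc

def match_key(dc: dict) -> tuple:
    """Candidate-duplicate key built from normalised identifier and date values."""
    ids = tuple(sorted(v.lower() for v in dc.get("identifier", [])))
    dates = tuple(sorted(dc.get("date", [])))
    return ids, dates

if __name__ == "__main__":
    a = {"Titel": "Alexanderplatz", "Aufnahmedatum": "1965-05-01", "Signatur": "LAB-0001"}
    b = {"img_title": "Alexanderplatz ", "date_taken": "1965-05-01",
         "shelfmark": "lab-0001", "Format": "image/jpeg"}
    print(set(a) & set(b))                                    # no shared field names
    print(match_key(to_dublin_core(a)) == match_key(to_dublin_core(b)))
```

Before mapping, the two records share no field names at all; after mapping, they produce the same key and can be grouped for review.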

Second, institutions must agree on what happens to duplicate records once they are identified. Deletion is the cleanest technical solution but raises legitimate archival objections: a 2018 JPEG and a 2023 TIFF of the same photograph each document a separate digitisation decision. The Landesarchiv has historically preferred retention with suppression (keeping the file but removing it from public-facing search), an approach that requires additional storage and ongoing maintenance.
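Retention with suppression can be modelled as a flag on each record plus a pointer to the preferred copy. The sketch below is a hypothetical data model, not the Landesarchiv's actual system: the class, field names, and record IDs are invented for illustration. Both files stay in the archive; only the public search path filters out the suppressed one.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ImageRecord:
    record_id: str
    title: str
    file_format: str
    digitised_year: int
    canonical_id: Optional[str] = None  # preferred copy, set when this record is a duplicate
    suppressed: bool = False            # hidden from public search but still retained

def suppress_duplicate(dup: ImageRecord, canonical: ImageRecord) -> None:
    """Keep the duplicate, link it to the preferred copy, and hide it from the public."""
    dup.canonical_id = canonical.record_id
    dup.suppressed = True

def public_search(records: List[ImageRecord], term: str) -> List[ImageRecord]:
    """Public-facing search: suppressed duplicates are excluded."""
    return [r for r in records if not r.suppressed and term.lower() in r.title.lower()]

def internal_search(records: List[ImageRecord], term: str) -> List[ImageRecord]:
    """Staff search: every retained record, including suppressed ones."""
    return [r for r in records if term.lower() in r.title.lower()]

if __name__ == "__main__":
    tiff = ImageRecord("rec-2023-tiff", "Alexanderplatz, 1965", "TIFF", 2023)
    jpeg = ImageRecord("rec-2018-jpeg", "Alexanderplatz, 1965", "JPEG", 2018)
    suppress_duplicate(jpeg, canonical=tiff)
    records = [tiff, jpeg]
    print([r.record_id for r in public_search(records, "alexanderplatz")])
    print([r.record_id for r in internal_search(records, "alexanderplatz")])
```

Which copy counts as canonical is a policy choice the sketch leaves to the caller. The storage cost the article mentions is visible here: both files remain, and every suppressed record needs its link maintained if the canonical copy changes.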

Third, and most politically charged, is the question of who controls the consolidated platform. Berlin's startup sector, particularly the cluster of civic-tech firms operating out of spaces like the Factory Berlin campus in Mitte, has lobbied for an open-architecture approach that would allow third-party developers to build on top of the city's image data. Cultural institutions have pushed back, citing rights management concerns for photographs where copyright has not yet expired.

The Senate's cultural administration is expected to publish a technical specification document in September 2026, which will serve as the de facto framework for how all three decisions get resolved. Archivists, contractors, and civic-tech advocates have until then to make their case. What happens after September will determine whether Berlin's digital cultural memory gets properly organised — or simply accumulates another layer of well-intentioned disorder.


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
