The Daily Berlin

Berlin's Digital Archive Push Hits a Snag: Duplicate Images Clogging City Databases This Week

A wave of redundant scanned files has slowed access to public records and heritage collections, forcing administrators at two major Berlin institutions to act fast.

By Berlin News Desk · Published 4 July 2026, 9:06 pm

Tens of thousands of duplicate image files have accumulated inside Berlin's municipal digital archives, administrators confirmed this week, creating access bottlenecks and inflating storage costs at a moment when the city is midway through an ambitious multi-year digitisation drive. The problem surfaced publicly on Monday, July 1, when users of the Landesarchiv Berlin's online portal reported slow load times and broken search results for historic property records. Housing advocates in the Mitte district have been consulting those files heavily amid the ongoing rent cap debate.

The timing is awkward. Berlin's SPD-led Senate has staked political credibility on making government data more transparent and usable, particularly around housing stock and planning documents. If the city's own archivists cannot reliably serve up clean, deduplicated records, that transparency promise starts to look thin.

How the Backlog Built Up

The root cause, according to internal communications reviewed by The Daily Berlin, is a lack of coordination between two scanning programmes that ran in parallel between 2023 and early 2026. The Landesarchiv Berlin on Eichborndamm in Reinickendorf and the Stadtmuseum Berlin's digital unit, based at the Märkisches Museum in Mitte, both ingested large batches of overlapping photographic collections, particularly images documenting post-war reconstruction and the late-1980s Kreuzberg squatter era. Neither institution's content management system flagged the overlap automatically, because the files arrived under different metadata schemas and file-naming conventions.
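To illustrate why file names and metadata alone are a weak signal, here is a minimal Python sketch of one common alternative: grouping files by a digest of their contents. It is not taken from either institution's systems; the file paths and byte strings are hypothetical placeholders, and the approach only catches byte-for-byte copies, not rescans of the same original.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[content_digest(data)].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical file names and contents, for illustration only.
files = {
    "landesarchiv/0001.tif": b"scan-of-print-A",
    "stadtmuseum/kreuzberg_17.tif": b"scan-of-print-A",  # same bytes, different name
    "stadtmuseum/kreuzberg_18.tif": b"scan-of-print-B",
}
duplicates = find_exact_duplicates(files)
print(duplicates)
```

Because the digest depends only on the bytes, two copies of one file are grouped together regardless of how each institution named or catalogued them.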

By the time the problem was isolated this week, administrators estimated that roughly 34,000 image files across the two systems were either exact duplicates or near-identical scans of the same physical originals. That figure represents close to 18 percent of the combined photographic holdings uploaded during the joint digitisation sprint funded under the Berlin Digital Strategy 2025 programme, a €47 million initiative approved by the Senate Department for Culture in November 2022.

Storage costs are not trivial. Commercial cloud archiving at the scale Berlin uses runs to several thousand euros per terabyte annually, and duplicates consume capacity that administrators had budgeted for new material coming in from district-level collections in Neukölln, Spandau and Pankow later this year.

What Administrators Are Doing About It

Staff at the Landesarchiv began running a retrospective deduplication script on Tuesday, using open-source perceptual hashing tools originally developed for cultural heritage projects in the Netherlands. The process compares image fingerprints rather than file names, so it catches near-duplicates even when the same original was scanned at slightly different resolutions or with minor colour calibration differences. Administrators expect the first pass to complete by mid-July.
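The article does not name the tools or the hashing algorithm in use. As a rough illustration of the general idea, the Python sketch below implements average hashing, one of the simplest perceptual hashes, on synthetic grayscale images held as lists of pixel rows. The test images, the 8x8 fingerprint size and the distance threshold are all assumptions chosen for the example.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale image (rows of 0-255 values) to size x size by
    block averaging. Assumes width and height are multiples of `size`."""
    bh, bw = len(pixels) // size, len(pixels[0]) // size
    out = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [pixels[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_distance=5):
    """max_distance is an illustrative threshold, not a recommended value."""
    return hamming(a, b) <= max_distance


def horizontal_gradient(n):
    return [[x * 255 // (n - 1) for x in range(n)] for _ in range(n)]


def vertical_gradient(n):
    return [[y * 255 // (n - 1)] * n for y in range(n)]


# Synthetic stand-ins for scans: same picture at two resolutions, a brighter
# copy (simulating a calibration difference), and an unrelated picture.
scan = horizontal_gradient(16)
rescan_high_res = horizontal_gradient(32)
rescan_brighter = [[min(255, v + 10) for v in row] for row in scan]
other_picture = vertical_gradient(16)

h_scan = average_hash(scan)
print(is_near_duplicate(h_scan, average_hash(rescan_high_res)))
print(is_near_duplicate(h_scan, average_hash(rescan_brighter)))
print(is_near_duplicate(h_scan, average_hash(other_picture)))
```

The fingerprint records only which regions are brighter than the image's own average, which is why a change in resolution or an overall brightness shift leaves it largely unchanged while a different picture produces many differing bits.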

The Stadtmuseum has taken a more manual approach for its Märkisches Museum holdings, assigning three archivists to cross-reference flagged files against the physical catalogue before any deletion is authorised. That caution is warranted: accidental deletion of a unique scan would be irreversible, and at least some of the flagged pairs are similar but not identical — capturing marginally different moments or angles of the same historical scene.

Berliners who rely on these archives — historians, journalists, lawyers managing property disputes in Prenzlauer Berg and Schöneberg, genealogists tracing families through Cold War records — should expect intermittent search delays through the rest of July. The Landesarchiv has posted a notice on its portal advising users to contact staff directly by email for time-sensitive requests, and has extended its telephone consultation hours on Wednesdays from 9am to 6pm until further notice.

Longer term, the two institutions say they will adopt a shared metadata standard modelled on the Dublin Core framework, which would allow their systems to flag overlapping acquisitions before ingestion rather than after. A working group is due to meet for the first time at the Zentralbibliothek am Halleschen Ufer on September 9. Whether that coordination arrives in time to prevent the same problem recurring when the Neukölln and Pankow district collections are uploaded remains an open question — one that Senate officials will likely face at the next culture committee session scheduled for late August.
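How a shared standard could enable a pre-ingestion check can be sketched in a few lines of Python. The example below maps two invented in-house schemas onto Dublin Core element names (title, creator, date, identifier) and compares normalised records before upload. The field names, sample records and matching rule are all hypothetical; the institutions have not published their schemas or their planned workflow.

```python
# Hypothetical in-house field names, mapped onto Dublin Core element names.
FIELD_MAPS = {
    "landesarchiv": {"Titel": "title", "Fotograf": "creator",
                     "Aufnahmedatum": "date", "Signatur": "identifier"},
    "stadtmuseum": {"object_name": "title", "maker": "creator",
                    "date_taken": "date", "inventory_no": "identifier"},
}


def to_dublin_core(record: dict, source: str) -> dict:
    """Rename a record's fields to Dublin Core elements, dropping unmapped ones."""
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def match_key(dc: dict) -> tuple:
    """Case- and whitespace-insensitive key built from title, creator and date."""
    return tuple(" ".join(dc.get(f, "").lower().split())
                 for f in ("title", "creator", "date"))


def flag_overlaps(existing: list, incoming: list) -> list:
    """Return (incoming identifier, existing identifier) pairs that share a key."""
    index = {match_key(r): r["identifier"] for r in existing}
    return [(r["identifier"], index[match_key(r)])
            for r in incoming if match_key(r) in index]


# Invented sample records, for illustration only.
existing = [to_dublin_core(
    {"Titel": "Hausbesetzung Kreuzberg", "Fotograf": "Unbekannt",
     "Aufnahmedatum": "1987", "Signatur": "LA-001"}, "landesarchiv")]
incoming = [
    to_dublin_core({"object_name": "hausbesetzung  kreuzberg", "maker": "unbekannt",
                    "date_taken": "1987", "inventory_no": "SM-417"}, "stadtmuseum"),
    to_dublin_core({"object_name": "Wiederaufbau Mitte", "maker": "Unbekannt",
                    "date_taken": "1952", "inventory_no": "SM-418"}, "stadtmuseum"),
]
overlaps = flag_overlaps(existing, incoming)
print(overlaps)
```

An exact-key comparison like this would miss records that describe the same photograph in different words, so in practice it would likely complement image-level checks rather than replace them.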

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
