The Daily Berlin

Berlin news, every day


Berlin's Digital Archives Push Hits a Wall: The Duplicate Image Problem Flaring Up This Week

City institutions scrambling to digitise historical collections are losing time and budget to a surprisingly stubborn technical headache — thousands of duplicate images clogging their databases.

By Berlin News Desk · Published 4 July 2026, 8:28 pm


Photo: Lucas, E. V. (Edward Verrall), 1868-1938 Morrow, George, 1869-1955 / Public domain (Wikimedia Commons)

Three of Berlin's largest public digitisation projects ground to a near-standstill this week as administrators confronted a problem that sounds mundane but carries real consequences: tens of thousands of duplicate image files have accumulated inside shared archival databases, slowing retrieval systems and inflating storage costs at a moment when public IT budgets are already stretched thin.

The issue surfaced publicly on Tuesday when the Koordinierungsstelle für wissenschaftliche Universalsammlungen, the city-backed body that oversees digitisation standards across Berlin's municipal collections, circulated an internal advisory to partner institutions. The document, seen by The Daily Berlin, flagged that duplicate-image rates in shared repositories had climbed sharply over the past eighteen months — a direct byproduct of multiple departments scanning the same physical materials independently, without cross-checking against existing holdings.

Where the Backlog Is Biting

The Stadtbibliothek on Breite Straße in Mitte and the Landesarchiv Berlin on Eichborndamm in Reinickendorf are both named in the advisory as institutions facing acute deduplication backlogs. Both have been running parallel digitisation workflows — the Stadtbibliothek as part of its ongoing effort to make pre-war Berlin neighbourhood maps accessible online, the Landesarchiv as part of a broader push to digitise civil registration records dating to the 1870s. Without a unified intake process, the same scan sometimes entered the shared repository two, three or four times under slightly different file names.
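To illustrate how the same scan stored under different file names can be spotted, here is a minimal Python sketch that groups files by a SHA-256 digest of their contents rather than by name. The function name `find_exact_duplicates` and the file names in the usage example are hypothetical, chosen for illustration only; nothing here describes the tooling the institutions actually use.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_exact_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its raw bytes (an assumed in-memory
    stand-in for a repository listing).
    """
    by_digest = defaultdict(list)
    for name, content in files.items():
        # The digest depends only on the bytes, never on the file name.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Keep only digests shared by more than one file name.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical example data: two names, one identical scan.
repository = {
    "mitte_map_1925.tif": b"identical scan bytes",
    "Mitte-Karte-1925_v2.tif": b"identical scan bytes",
    "register_1874_p12.tif": b"a different scan",
}
duplicate_groups = find_exact_duplicates(repository)
```

A check of this kind only catches byte-identical copies. Two separate scans of the same physical page produce different bytes and would pass through undetected.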

Storage is not a trivial cost. Berlin's Senate Department for Culture and Social Cohesion allocated roughly 4.2 million euros to digitisation infrastructure in the 2025-2026 budget cycle. Administrators working on the project say a meaningful share of that money is being absorbed by redundant cloud storage and the manual labour required to identify and remove duplicate files — work that was not budgeted for when the current contracts were signed in early 2024.

The deduplication challenge also has a knock-on effect on the BVG digital-heritage partnership announced last November, in which the transport authority agreed to contribute its own photographic archive — spanning tram-line construction through the 1950s to early U-Bahn expansion — to the shared repository. BVG's archive alone runs to an estimated 80,000 images. If those files arrive before the existing duplicate problem is resolved, archivists say the cleanup task becomes exponentially harder.

Technical Fix or Workflow Overhaul?

Two approaches are now on the table. The first is purely technical: deploying perceptual-hashing software, which reduces each image to a compact fingerprint and flags files whose fingerprints are nearly identical for human review. Institutions in Hamburg and Vienna have used similar tools on municipal archive projects in recent years. The second approach is a governance fix: mandating that all partner institutions check the central repository before any new scan batch is uploaded, a rule that exists on paper but has clearly not been enforced consistently.
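For readers curious how perceptual hashing works in principle, the following Python sketch implements one simple variant, often called an average hash. It is an illustration under stated assumptions, not the software under consideration in Berlin: images are represented as plain grids of grayscale values, dimensions are assumed to be multiples of the hash size, and the names `average_hash`, `hamming` and `is_near_duplicate`, as well as the threshold of 5 bits, are hypothetical choices.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging equal blocks.

    Assumes height and width are multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    small = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is above the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Flag a pair for human review if the fingerprints differ in few bits."""
    return hamming(average_hash(a), average_hash(b)) <= threshold


# Synthetic 16x16 test images: a left-to-right gradient, a slightly
# brighter copy of it, and its inverse.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, v + 10) for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]
```

Because the fingerprint records only whether each region is lighter or darker than the image's own average, a uniformly brightened copy yields the same hash, while a genuinely different image yields a distant one. Flagged pairs would still go to a human reviewer, as the advisory's first option describes.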

The Wikimedia Deutschland office on Tempelhofer Ufer, which has collaborated with Berlin institutions on open-access image uploads to Wikimedia Commons, confirmed this week that it has offered to share its own deduplication workflow documentation with city partners. Wikimedia Commons has grappled with the same problem at a global scale and developed semi-automated tools that could be adapted for use with Berlin's FAUST archive management system.

A decision on which path to take is expected before the summer recess, with the Senate department likely to convene a working group meeting in the third week of July. Institutions are being advised in the meantime to pause any new bulk uploads to the shared repository until a deduplication pass can be completed on existing holdings. For researchers and members of the public who use the online portals to access historical Berlin material — including the popular Kauperts street directory scans and pre-war Baedeker city maps — some search results may remain incomplete or return duplicate entries for the coming weeks. The advisory recommends noting file-creation dates when downloading to avoid working with redundant copies.


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
