The Daily Berlin

Berlin's Public Archives Step Up Fight Against Duplicate Images This Week

A coordinated push across city institutions is forcing Berlin's digital catalogues to confront a years-old problem: thousands of redundant scanned images clogging public databases and slowing access to historical records.

By Berlin News Desk · Published 4 July 2026, 8:48 pm

Photo: Katja L. on Pexels

Berlin's Landesarchiv confirmed this week that it has launched a structured review of its digital image holdings, targeting the duplicate photograph and document scans that have accumulated across its shared catalogues since a major digitisation drive began in 2019. The problem is not trivial. Redundant files have doubled storage costs at several institutions and, more critically, have made it harder for researchers and journalists to find authoritative source images in public collections.

The timing matters. The Landesarchiv's review coincides with a broader push by the Senatsverwaltung für Kultur to consolidate Berlin's fragmented digital heritage infrastructure before a 2027 deadline tied to EU funding conditions under the Digital Europe Programme. Institutions that cannot demonstrate clean, deduplicated catalogues risk losing access to the next tranche of modernisation grants.

Where the Problem Is Most Acute

Two institutions have emerged as focal points for the cleanup effort this week. The Zentral- und Landesbibliothek Berlin, whose main reading rooms sit on Breite Straße in Mitte, holds digitised collections spanning Weimar-era newspapers, postwar construction photography, and Cold War-era city planning documents. Staff there have been working with an automated deduplication tool, procured in early 2026, to flag images sharing more than 95 percent pixel similarity. Early runs identified several thousand candidate duplicates in the newspaper photograph collection alone, according to internal process documentation circulating among partner institutions.
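How a threshold like this works can be shown in a few lines of Python. The sketch below is not the library's tool, whose method has not been made public. It assumes greyscale scans of identical dimensions stored as lists of pixel rows, and the per-pixel tolerance of 8 grey levels, the function names and the sample scan IDs are all hypothetical. It counts the share of pixel positions whose values match within that tolerance and flags any pair scoring above 95 percent.

```python
from itertools import combinations

# Threshold reported for the library's tool: flag pairs above 95 percent similarity.
SIMILARITY_THRESHOLD = 0.95
# Hypothetical: greyscale values within 8 levels (of 0-255) count as matching.
PIXEL_TOLERANCE = 8


def pixel_similarity(a, b, tolerance=PIXEL_TOLERANCE):
    """Share of pixel positions where two same-sized greyscale images match within tolerance."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("images must not be empty")
    matching = sum(
        1
        for ra, rb in zip(a, b)
        for pa, pb in zip(ra, rb)
        if abs(pa - pb) <= tolerance
    )
    return matching / total


def flag_candidates(images, threshold=SIMILARITY_THRESHOLD):
    """Return (id_a, id_b, score) for each comparable pair scoring above the threshold."""
    flagged = []
    for (id_a, img_a), (id_b, img_b) in combinations(images.items(), 2):
        try:
            score = pixel_similarity(img_a, img_b)
        except ValueError:
            continue  # different dimensions or empty: not comparable pixel by pixel
        if score > threshold:
            flagged.append((id_a, id_b, score))
    return flagged


# Hypothetical 5x5 scans: an original, a near-copy with one altered pixel, an unrelated image.
original = [[100] * 5 for _ in range(5)]
near_copy = [[100] * 5 for _ in range(5)]
near_copy[2][2] = 250  # e.g. a dust speck picked up in a rescan
unrelated = [[0] * 5 for _ in range(5)]

images = {"scan-001": original, "scan-002": near_copy, "scan-003": unrelated}
for id_a, id_b, score in flag_candidates(images):
    print(id_a, id_b, f"{score:.0%}")  # scan-001 scan-002 96%
```

A flagged pair is only a candidate; in the process the article describes, a person still decides what happens to it. Comparing every pair, as this sketch does, grows quadratically with collection size, so it suits small batches rather than whole collections, where tools would more plausibly narrow the field first with compact image fingerprints.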

In Kreuzberg, the Berlinische Galerie's digital department is running a parallel exercise on its roughly 45,000 scanned artworks and architectural images. The gallery has been coordinating with the Landesarchiv through a working group that first convened in March 2026 and meets monthly at the archive's premises on Eichborndamm in Reinickendorf. The challenge there is different: many apparent duplicates are actually legitimate variant scans (different resolutions, different crop decisions) made at different points in the digitisation workflow. Distinguishing useful variants from true redundancy requires human review, and that is where the bottleneck sits.

What the Data Reveals

A 2025 audit of Berlin's shared cultural heritage metadata repository, Kulturportal Berlin, found that approximately 18 percent of image records flagged as unique shared a near-identical binary fingerprint with at least one other record in the system. Storage costs for the combined holdings across five participating institutions ran to roughly €340,000 annually, a figure that archivists involved in the review say could fall significantly if deduplication is completed. The Kulturportal currently indexes holdings from fourteen Berlin institutions, ranging from the Akademie der Künste on Pariser Platz to the Stadtmuseum Berlin's distributed sites.
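The audit's wording does not say how its fingerprints were computed. One widely used family is perceptual hashing, and the Python sketch below substitutes a simple average hash: an image is reduced to an 8×8 grid of block averages, and each cell becomes one bit depending on whether it is brighter than the grid's mean. Two records count as near-identical when their 64-bit fingerprints differ in at most a handful of bits. The 5-bit cut-off, the function names and the synthetic test images are assumptions for illustration, not details of the Kulturportal audit. The last function computes the kind of figure the audit reported: the share of records with at least one near-identical twin.

```python
def average_hash(img, size=8):
    """64-bit average hash of a greyscale image given as a list of pixel rows."""
    h = len(img)
    w = len(img[0]) if h else 0
    if h < size or w < size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))  # mean brightness of this block
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:  # one bit per cell; the first cell becomes the most significant bit
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Hypothetical cut-off: fingerprints within 5 of 64 bits count as near-identical.
MAX_DISTANCE = 5


def share_with_near_twin(fingerprints, max_distance=MAX_DISTANCE):
    """Fraction of records whose fingerprint is near-identical to at least one other record's."""
    ids = list(fingerprints)
    has_twin = set()
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if hamming(fingerprints[a], fingerprints[b]) <= max_distance:
                has_twin.update((a, b))
    return len(has_twin) / len(ids) if ids else 0.0


# Synthetic 16x16 images: a left-to-right gradient, a brightened copy, a top-to-bottom gradient.
horizontal = [[x * 16 for x in range(16)] for _ in range(16)]
brightened = [[min(255, v + 10) for v in row] for row in horizontal]
vertical = [[y * 16 for _ in range(16)] for y in range(16)]

fingerprints = {
    "horizontal": average_hash(horizontal),
    "brightened": average_hash(brightened),
    "vertical": average_hash(vertical),
}
print(f"{share_with_near_twin(fingerprints):.0%}")  # 67%
```

Because the brightened copy shifts every pixel by the same amount, its fingerprint is unchanged. That is why perceptual hashes catch re-exported or slightly adjusted scans that an exact checksum such as SHA-256 would treat as entirely different files.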

The practical stakes for ordinary users are real. A researcher at Humboldt-Universität trying to licence a specific 1930s aerial photograph of Potsdamer Platz may currently encounter the same image listed under four separate catalogue entries, each with different rights metadata attached. That confusion has generated complaints to the Landesarchiv's user services team and, in at least some documented cases, led researchers to pay licensing fees for images they effectively already had access to through a different catalogue entry.

Staff at several institutions stressed this week that the deduplication work is not about deleting historical records: every flagged image will be reviewed before any action is taken, and variant scans with archival value will be retained. The goal is to create a single canonical record with clearly attributed rights and provenance, with secondary versions subordinated beneath it rather than listed as independent entries.
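The consolidation staff describe, one canonical record with variants retained beneath it, can be modelled as a small data structure. This Python sketch assumes a group of entries has already been confirmed as duplicates by human review. It promotes the highest-resolution scan to canonical (a hypothetical rule; the institutions have not said how they will choose), keeps every other entry as a subordinate variant rather than deleting it, and flags the group for review when its entries carry differing rights statements. All class names, record IDs and rights strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogueEntry:
    """One catalogue listing of a scan (all fields hypothetical)."""
    record_id: str
    width: int   # pixels
    height: int  # pixels
    rights: str
    provenance: str


@dataclass
class CanonicalRecord:
    """A single canonical entry with its variant scans kept beneath it."""
    canonical: CatalogueEntry
    variants: list[CatalogueEntry] = field(default_factory=list)  # retained, not deleted
    rights_statements: set[str] = field(default_factory=set)

    @property
    def needs_rights_review(self) -> bool:
        # More than one distinct rights statement in the group means a person must resolve it.
        return len(self.rights_statements) > 1


def consolidate(group):
    """Merge a human-confirmed duplicate group into one canonical record.

    Hypothetical rule: the largest scan by pixel count becomes canonical,
    with ties broken by record_id so the result is deterministic.
    """
    if not group:
        raise ValueError("cannot consolidate an empty group")
    ordered = sorted(group, key=lambda e: (-(e.width * e.height), e.record_id))
    canonical, *variants = ordered
    return CanonicalRecord(
        canonical=canonical,
        variants=variants,
        rights_statements={e.rights for e in group},
    )


# Hypothetical group: four entries for the same photograph with differing rights metadata.
entries = [
    CatalogueEntry("entry-A", 2400, 1800, "CC BY-SA 4.0", "first scan"),
    CatalogueEntry("entry-B", 4800, 3600, "CC BY-SA 4.0", "high-resolution rescan"),
    CatalogueEntry("entry-C", 1200, 900, "licence on request", "partner copy"),
    CatalogueEntry("entry-D", 4800, 3600, "CC BY-SA 4.0", "duplicate export"),
]
record = consolidate(entries)
print(record.canonical.record_id)              # entry-B
print([v.record_id for v in record.variants])  # ['entry-D', 'entry-A', 'entry-C']
print(record.needs_rights_review)              # True
```

Flagging conflicting rights, instead of silently keeping the canonical entry's statement, targets the licensing confusion described above, where one image carried different rights metadata under separate catalogue entries.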

For anyone using Berlin's digital archives in the coming months, the practical advice is straightforward: cross-check catalogue results across Kulturportal Berlin and the institution's own online finding aid before drawing conclusions about image availability or rights status. The deduplication work is expected to run through the fourth quarter of 2026, meaning the catalogues will remain in transition for the rest of the year. Staff at the Zentral- und Landesbibliothek's reference desk on Breite Straße remain available for queries about specific image holdings during the process.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.