The Daily Berlin

Berlin news, every day


How Berlin's Digital Archives Ended Up Drowning in Duplicate Images — and What's Being Done About It

Years of fragmented digitisation projects across city departments left Berlin's public image databases bloated, redundant, and increasingly unusable — here's how it happened.

By Berlin News Desk · Published 4 July 2026, 8:58 pm


Photo: Edgeworth, Edward / Public domain (Wikimedia Commons)

Berlin's Senate Department for Culture and Social Cohesion confirmed this week that an estimated 40 percent of images held across the city's public digital archives are duplicates — identical or near-identical files stored multiple times under different filenames, in different systems, at different resolutions. The figure comes from an internal audit completed in June 2026 covering six major institutions, including the Zentral- und Landesbibliothek Berlin on Breite Straße and the Stadtmuseum Berlin network. The cleanup project, now formally underway, carries a budget of €2.3 million and is expected to run through the first quarter of 2027.

The timing matters. Berlin is midway through its Digital Stadt Berlin strategy, a framework the SPD-led coalition committed to in 2023 with the aim of consolidating the city's notoriously siloed administrative IT infrastructure. Duplicate image data is not merely a storage inconvenience: it degrades search results in public-facing portals, inflates server costs, and creates legal headaches when the licensing metadata attached to one copy of a file contradicts the metadata on another copy of the same photograph. For a city that has spent heavily positioning itself as a European tech hub, the mess is an embarrassing structural inheritance.

A Decade of Disconnected Digitisation

The roots of the problem stretch back to roughly 2012, when individual Berlin districts and cultural institutions began digitising their collections independently, without a shared technical standard. Mitte's district archive ran one content management system. The Stadtbibliothek network in Friedrichshain-Kreuzberg ran another. The Humboldt Forum, which opened in stages between 2020 and 2021, imported assets from at least four predecessor institutions when building its own digital collection, each transfer generating fresh duplicate sets. By 2019, the Landesarchiv Berlin on Eichborndamm in Reinickendorf alone held over 1.2 million digitised image files, with internal estimates suggesting roughly a third were redundant copies created during successive migration cycles.

No single administrator owned the cross-institutional problem. Responsibility was spread across the Senate's Department for Culture, the individual Bezirksämter, and the separately governed foundation boards of major museums. Each institution had its own procurement cycle, its own IT vendor relationships, and its own definition of what constituted an archival master file versus a derivative copy. A photograph of the Brandenburger Tor taken in 1987 might exist simultaneously as a 600-dpi TIFF in one system, a compressed JPEG in a second, and a watermarked web preview in a third — each logged as a discrete asset.

The Automated Fix — and Its Limits

The current remediation effort relies on perceptual hashing, a method that generates a short digital fingerprint for each image based on its visual content rather than its filename or file size. Because visually similar images produce similar fingerprints, near-identical files can be flagged even when they have been cropped or re-exported. The contract to deploy the system was awarded in March 2026 to a consortium including the Fraunhofer Institute for Digital Media Technology, which has a Berlin office in Adlershof. The system can process approximately 80,000 images per day, which gives some sense of the scale involved: the full audit scope covers an estimated 6.5 million files, so a single pass at that rate would take roughly 81 days.
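For readers curious how a visual fingerprint of this kind can work, the following is a minimal sketch of one simple perceptual hash, the "average hash". It is an illustration only: the Senate has not said which algorithm the consortium uses, and the function names, the 8x8 grid size, and the `MAX_DISTANCE` threshold are assumptions made for this example. Images are represented as plain grids of grayscale values so the code runs without an imaging library.

```python
# Illustrative average-hash sketch; not the consortium's actual method.
# An "image" is a hypothetical stand-in: a 2D list of grayscale values (0-255)
# that is at least `size` pixels wide and tall.

def shrink(pixels, size=8):
    """Downsample to a size x size grid by averaging rectangular blocks."""
    height, width = len(pixels), len(pixels[0])
    grid = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(pixels, size=8):
    """Return a (size * size)-bit fingerprint: 1 where a cell is brighter than the mean."""
    cells = [value for row in shrink(pixels, size) for value in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Assumed threshold for this sketch: fingerprints this close count as candidates.
MAX_DISTANCE = 5


def is_candidate_duplicate(image_a, image_b):
    return hamming(average_hash(image_a), average_hash(image_b)) <= MAX_DISTANCE


# Synthetic test data: a 64x64 diagonal gradient, a half-resolution copy of it,
# and a brightness-inverted image standing in for a different photograph.
original = [[2 * x + 2 * y for x in range(64)] for y in range(64)]
half_size = [
    [
        (original[2 * y][2 * x] + original[2 * y][2 * x + 1]
         + original[2 * y + 1][2 * x] + original[2 * y + 1][2 * x + 1]) / 4
        for x in range(32)
    ]
    for y in range(32)
]
inverted = [[252 - value for value in row] for row in original]

print(is_candidate_duplicate(original, half_size))
print(is_candidate_duplicate(original, inverted))
```

The resolution of a file does not enter the fingerprint, which is why a full-size master and a smaller derivative can land close together. The same property explains the false positives archivists have to catch: two different photographs with a similar distribution of light and dark areas can also produce similar fingerprints.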

Human review cannot be bypassed entirely. Archivists at each institution must confirm deletion recommendations before files are removed, because perceptual hashing occasionally flags historically distinct photographs as duplicates when they depict the same subject from the same angle on different dates. That manual verification layer is where most of the €2.3 million budget is going: roughly 60 percent, or about €1.4 million, is earmarked for staffing costs, according to the Senate's project outline published on 18 June.

For institutions and researchers who rely on Berlin's public portals, the practical advice is straightforward: hold off on bulk downloads from the Deutsche Digitale Bibliothek's Berlin-specific collections until after the first reconciliation phase completes, projected for October 2026. The Senate's digital office has said metadata quality — licensing terms in particular — will be unreliable in some collections until that phase is signed off. After October, a consolidated public search interface is planned that will draw from a single deduplicated master repository for the first time, replacing the current arrangement where users must search each institution's portal separately and routinely encounter the same image returned multiple times under different catalogue numbers.


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
