The Daily Berlin

Berlin news, every day

Berlin's Duplicate Image Problem: The Numbers Exposing a Digital Archive Crisis

Municipal databases and cultural institutions across the capital are sitting on millions of redundant image files — and the cost of doing nothing is rising fast.

By Berlin News Desk · Published 4 July 2026, 8:40 pm

Berlin's public institutions are drowning in duplicate image data. An internal audit completed in May 2026 by the Senate Department for Digital Affairs found an estimated 34 percent redundancy rate across the image inventories of city-managed digital repositories, meaning roughly one in three stored images is a copy of something already catalogued elsewhere in the system. The finding has accelerated a long-delayed conversation about what it costs to store bad data, and who pays for it.

The timing matters. Berlin's Senate approved a €2.1 billion digital infrastructure spending plan in March 2026, with image data management buried inside a broader push to modernise municipal IT before the 2028 administrative efficiency targets set under the federal Digital-Pakt-Kommunen framework. With that money now moving through procurement channels, institutions that have not cleaned their archives risk locking in their inefficiencies at scale — essentially paying to migrate clutter into expensive new infrastructure.

Where the Problem Is Concentrated

Two institutions illustrate the scale particularly well. The Landesarchiv Berlin, based on Eichborndamm in Reinickendorf, manages more than 12 million digitised items, including photography collections dating to the late nineteenth century. Staff there have flagged that automated ingestion tools used between 2018 and 2023 routinely created derivative image files without cross-referencing the existing catalogue, stacking near-identical versions of the same photograph across multiple storage nodes. The Stadtmuseum Berlin, whose collections are spread across sites including the Ephraim-Palais in Mitte and the Märkisches Museum near Köllnischer Park, faces a comparable situation: a collection-wide digitisation drive, accelerated during the Covid-19 closure years, added volume faster than deduplication workflows could handle.

Beyond the cultural sector, the problem runs through Berlin's urban planning apparatus. The Senatsverwaltung für Stadtentwicklung, Bauen und Wohnen holds aerial survey photography and site documentation images for every major development zone in the city, from the Tempelhofer Feld planning corridor to the ongoing redevelopment around Lichtenberg's Frankfurter Allee Nord district. Redundant imagery in those planning files is not merely a storage inconvenience — it creates version-control risks when planners pull documentation for active building consent decisions.

The Cost of Redundancy, in Hard Numbers

Storage is not free, and the numbers make that plain. Enterprise cold-storage contracts of the kind Berlin's Senate IT division operates typically run between €0.004 and €0.008 per gigabyte per month on current European cloud-infrastructure pricing benchmarks. At those rates a petabyte costs roughly €48,000 to €96,000 a year to hold, so a 34 percent redundancy rate wastes about €16,000 to €33,000 per petabyte annually. That becomes recurring six-figure waste once a repository passes roughly three to six petabytes, before staff time is even counted. The Senate's own March 2026 budget documents, publicly available through the Abgeordnetenhaus parliamentary portal, allocate €18 million specifically to data quality remediation through 2028, a figure that critics on the digital affairs committee have argued underestimates the true scope of the problem by at least a third.
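The storage arithmetic can be checked with a short back-of-envelope script. This is a rough sketch, not the Senate's costing method: the price band and the 34 percent rate are taken from the figures above, while the repository sizes, the decimal definition of a petabyte, and the function name `annual_waste_eur` are assumptions made purely for illustration.

```python
# Back-of-envelope estimate of what redundant storage costs per year.
# The price band and the 34 percent rate come from the article; the
# repository sizes below are hypothetical, chosen only for illustration.

GB_PER_PB = 1_000_000        # assumption: decimal petabyte
REDUNDANCY_RATE = 0.34       # audit estimate quoted in the article
PRICE_LOW_EUR = 0.004        # per gigabyte per month
PRICE_HIGH_EUR = 0.008       # per gigabyte per month


def annual_waste_eur(size_pb: float, price_per_gb_month: float,
                     redundancy: float = REDUNDANCY_RATE) -> float:
    """Yearly spend on the redundant share of a repository, in euros."""
    redundant_gb = size_pb * GB_PER_PB * redundancy
    return redundant_gb * price_per_gb_month * 12


if __name__ == "__main__":
    for size_pb in (1, 3, 5, 10):   # hypothetical repository sizes
        low = annual_waste_eur(size_pb, PRICE_LOW_EUR)
        high = annual_waste_eur(size_pb, PRICE_HIGH_EUR)
        print(f"{size_pb:>2} PB: EUR {low:,.0f} to EUR {high:,.0f} per year")
```

On these assumptions the waste crosses €100,000 a year at a little over three petabytes at the upper price and a little over six at the lower one. Staff time, retrieval fees and backup copies are not modelled.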

Deduplication software vendors have been pitching Berlin institutions hard this year. Tools ranging from open-source projects maintained through the Wikimedia Deutschland infrastructure in Prenzlauer Berg to proprietary platforms marketed under the EU's Gaia-X data sovereignty standards can now identify near-duplicate images, not just exact binary matches, using perceptual hashing algorithms that compare visual content rather than file metadata alone. That distinction matters enormously for archive work, where the same historical photograph may exist as a lossless TIFF, a compressed JPEG, and a watermarked web version, all treated as separate files by older catalogue systems.
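The difference between an exact binary match and a perceptual match can be shown in a few lines. The sketch below uses an average hash, one of the simplest perceptual hashing schemes; the article does not say which algorithms the tools on offer actually use, so this is an illustrative stand-in rather than any vendor's method. The images are synthetic grayscale grids, and every name here (`shrink`, `average_hash`, `hamming`, `file_digest`) is hypothetical.

```python
import hashlib

Image = list[list[int]]  # rows of 8-bit grayscale values, 0..255


def shrink(pixels: Image, size: int = 8) -> list[float]:
    """Box-average the image down to size x size cells, row by row.

    Assumes the image is at least size pixels on each side; any
    remainder rows or columns at the edges are ignored.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    cells = []
    for r in range(size):
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels: Image, size: int = 8) -> int:
    """One bit per cell: 1 if the cell is brighter than the mean cell."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def file_digest(pixels: Image) -> str:
    """SHA-256 of the raw pixel bytes, standing in for a binary file hash."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


# Synthetic test images (assumptions, not archive data).
SIDE = 64
# A left-to-right brightness gradient plays the role of the master scan.
original = [[x * 255 // (SIDE - 1) for x in range(SIDE)] for _ in range(SIDE)]
# A slightly brighter, slightly noisy copy mimics a re-encoded derivative.
recompressed = [[min(255, max(0, v + 3 + (x * 7 + y * 13) % 5 - 2))
                 for x, v in enumerate(row)]
                for y, row in enumerate(original)]
# A top-to-bottom gradient plays the role of an unrelated photograph.
unrelated = [[y * 255 // (SIDE - 1)] * SIDE for y in range(SIDE)]

if __name__ == "__main__":
    print("exact match:", file_digest(original) == file_digest(recompressed))
    print("distance to derivative:",
          hamming(average_hash(original), average_hash(recompressed)))
    print("distance to unrelated image:",
          hamming(average_hash(original), average_hash(unrelated)))
```

The byte-level digest treats the derivative as a different file, while the perceptual hash places it at or near zero distance from the master and far from the unrelated image. A real audit would need a distance threshold tuned on known duplicates, and an average hash this simple will not reliably survive heavy cropping or large watermarks.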

For institutions that move now, the remediation path is clearer than it was two years ago. The Senate's Digital Affairs office has published a procurement guidance note, updated in June 2026, directing city bodies to run deduplication audits before any data migration contract is signed under the new infrastructure spending plan. Institutions that miss that window face a harder conversation later: duplicates carried into new systems typically cost significantly more to unpick afterwards than to clean out beforehand. The Landesarchiv has reportedly begun a pilot deduplication run on its post-1990 photographic holdings, roughly 2.4 million files, with results expected to inform a citywide methodology by the fourth quarter of 2026. The wider archive sector is watching that pilot closely.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
