The Daily Berlin

Berlin news, every day


Berlin's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Damning Story

From Mitte to Marzahn, public institutions are sitting on terabytes of redundant visual data, and the cost of cleaning it up is climbing fast.

By Berlin News Desk · Published 4 July 2026, 9:00 pm



Berlin's public sector holds an estimated 40 million digital image files across its municipal databases — and data audits conducted by the Berlin Senate Department for Digital Transformation in the first quarter of 2026 indicate that somewhere between 18 and 22 percent of those files are duplicates. That is roughly eight million redundant images consuming server space, slowing retrieval systems, and costing taxpayers money every month in storage contracts that renew automatically.

The issue has sharpened this summer because three major digitisation deadlines are converging at once. The Stadtbibliothek Berlin network completed its final phase of newspaper archive scanning in March. The Stadtmuseum Berlin, headquartered near Alexanderplatz, finished ingesting roughly 300,000 object photographs from its Märkisches Museum collection by 30 June. And the Landesarchiv Berlin, based on Eichborndamm in Reinickendorf, is midway through a 2025–2027 federal grant programme that is digitising civil records going back to the nineteenth century. Each project feeds into shared infrastructure, and each one has compounded the duplicate problem rather than relieved it.

What Duplicate Images Actually Cost

Storage costs sound trivial until they accumulate. Cloud and hybrid-storage contracts for Berlin's Senate IT service provider, ITDZ Berlin, run on tiered pricing. Industry benchmarks for comparable European municipal cloud agreements place per-terabyte annual costs between €200 and €380, depending on access frequency and redundancy requirements. If Berlin's municipal image holdings run to approximately 600 terabytes, a figure consistent with the Senate's published infrastructure reports for 2024, and duplicates take up the same 18 to 22 percent of that volume as they do of the file count, duplicate data alone is costing the city roughly €22,000 to €50,000 per year in avoidable storage fees. That does not count staff hours spent managing, tagging, or searching across bloated repositories.
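The figures quoted above can be combined into a rough estimate. The sketch below is a back-of-envelope calculation, not the Senate's own method. It assumes that duplicates occupy the same share of bytes as they do of file count, which the audits cited here do not state, and the function and constant names are illustrative.

```python
# Back-of-envelope estimate using only figures quoted in the article.
# Assumption (not in the article): duplicates occupy the same share of
# bytes as they do of file count.

TOTAL_TB = 600                    # approximate municipal image holdings
DUPLICATE_SHARE = (0.18, 0.22)    # audit range for duplicate files
EUR_PER_TB_YEAR = (200, 380)      # benchmark per-terabyte annual cost


def annual_duplicate_cost(total_tb, share, eur_per_tb):
    """Return the yearly storage cost in euros attributable to duplicates."""
    return total_tb * share * eur_per_tb


low = annual_duplicate_cost(TOTAL_TB, DUPLICATE_SHARE[0], EUR_PER_TB_YEAR[0])
high = annual_duplicate_cost(TOTAL_TB, DUPLICATE_SHARE[1], EUR_PER_TB_YEAR[1])
print(f"Estimated avoidable cost: EUR {low:,.0f} to EUR {high:,.0f} per year")
```

The result is sensitive to the byte-share assumption: if duplicates skew toward large master scans rather than small derivatives, their share of storage cost would be higher than their share of the file count.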

The human cost is harder to quantify but easier to illustrate. Archivists at the Zentral- und Landesbibliothek Berlin on Blücherplatz in Kreuzberg have described — in public conference presentations, not to this reporter directly — workflows in which the same historical photograph of the Tempelhof airfield appears under four or five different filenames, each tagged differently, each indexed in a different catalogue layer. A researcher submitting a request can receive conflicting metadata for what is, pixel for pixel, the same image. Quality control requires human review of every hit.

Automated deduplication software has existed for years, but adoption in Berlin's public sector has been patchy. The techniques most widely evaluated by German municipal IT departments, including perceptual hashing and vector-similarity matching, can identify visually identical or near-identical images with accuracy rates above 95 percent on test datasets, according to findings published by the Fraunhofer Institute for Digital Media Technology. The sticking point is not technical. It is governance: who owns the decision to delete or merge a record, particularly when that record is a historical document with potential legal or cultural significance?
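To give a sense of how perceptual hashing works, here is a minimal sketch of a difference hash (dHash), one common scheme of this kind. It is not the method used by any of the institutions or studies mentioned here. It assumes each image has already been downsampled to a small grayscale grid of 8 rows by 9 columns by some image library, and the sample data and the distance threshold are invented for illustration.

```python
# Sketch of a difference hash (dHash), one common perceptual-hashing scheme.
# Assumption: images arrive as small grayscale grids (rows of 0-255 ints),
# already downsampled to 8 rows x 9 columns by some image library.

def dhash(grid):
    """Return an integer hash: one bit per left-to-right brightness comparison."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(grid_a, grid_b, max_distance=5):
    """Flag two images as likely duplicates; the threshold is illustrative."""
    return hamming_distance(dhash(grid_a), dhash(grid_b)) <= max_distance


# Hypothetical data: a gradient image, a slightly brightened copy, and its mirror.
original = [[(r * 9 + c) * 3 for c in range(9)] for r in range(8)]
brightened = [[min(255, v + 10) for v in row] for row in original]
mirrored = [row[::-1] for row in original]

print(is_near_duplicate(original, brightened))
print(is_near_duplicate(original, mirrored))
```

Because the hash records only whether each pixel is brighter than its neighbour, a uniform brightness change leaves it untouched, while a mirrored image flips every comparison. A byte-level checksum would treat both variants as unrelated files.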

What Comes Next for Berlin's Image Infrastructure

The Senate Department for Digital Transformation has flagged duplicate-image management as a line item in its 2026–2028 Smart City Strategy update, which is due for committee review in September. That strategy document, circulated in draft form to coalition partners in May, proposes a centralised metadata clearinghouse — essentially a master index that would sit above individual departmental databases and flag conflicts before they embed themselves in long-term archives.
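The draft's design is not public in detail, so the following is only a sketch of what a master index that flags conflicts could look like. It groups catalogue records by a SHA-256 digest of the image bytes and reports groups whose metadata disagree. The record layout, field names, filenames and years are all hypothetical, and the approach catches only byte-identical files; near-identical copies would need a similarity measure instead.

```python
import hashlib
from collections import defaultdict


def build_master_index(records):
    """Group catalogue records by the SHA-256 digest of their image bytes."""
    index = defaultdict(list)
    for record in records:
        digest = hashlib.sha256(record["image_bytes"]).hexdigest()
        index[digest].append(record)
    return index


def find_conflicts(index, fields=("title", "year")):
    """Return digests whose records disagree on any of the given fields."""
    conflicts = {}
    for digest, group in index.items():
        if len(group) < 2:
            continue  # a single record cannot conflict with itself
        differing = [f for f in fields if len({rec.get(f) for rec in group}) > 1]
        if differing:
            conflicts[digest] = differing
    return conflicts


# Hypothetical records: two catalogues describe the same bytes differently.
records = [
    {"source": "catalogue_a", "filename": "tempelhof_001.tif",
     "title": "Tempelhof airfield", "year": 1931, "image_bytes": b"same-pixels"},
    {"source": "catalogue_b", "filename": "THF-scan-17.tif",
     "title": "Tempelhof airfield", "year": 1934, "image_bytes": b"same-pixels"},
    {"source": "catalogue_a", "filename": "museum_044.tif",
     "title": "Museum interior", "year": 1908, "image_bytes": b"other-pixels"},
]

index = build_master_index(records)
conflicts = find_conflicts(index)
for digest, fields in conflicts.items():
    print(digest[:12], "conflicting fields:", fields)
```

Note that the sketch only flags conflicts. Deciding which record wins, or whether to merge at all, stays with a human reviewer, which is the governance question the article raises.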

Practically, institutions holding large image collections should act before the September review locks in priorities. The Stadtmuseum and the Landesarchiv both have internal digitisation working groups that could, in principle, pilot deduplication audits on recent ingest batches before the year-end budget cycle. Any institution that digitised materials after January 2024 under the federal Digitalisierungsprogramm Archiv grant scheme is already required to submit a data-quality report to the Bundesarchiv by 31 December 2026. That deadline makes this summer the last realistic window for catching and correcting duplication before it becomes a compliance problem rather than merely an efficiency one.

Eight million redundant files is not an abstraction. It is a governance failure that has a price tag, a deadline, and, for the first time in this city's digital history, a political calendar that might actually force a fix.


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
