The Daily Berlin

Berlin news, every day

Berlin's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

From Mitte to Marzahn, the city's public institutions are sitting on terabytes of redundant visual data, and cleaning it up is proving anything but cheap.

By Berlin News Desk · Published 4 July 2026, 8:48 pm

Berlin's public sector has a clutter problem hidden in plain sight. Across municipal servers maintained by the Senatsverwaltung für Stadtentwicklung and the Zentral- und Landesbibliothek Berlin, duplicate image files account for an estimated 30 to 40 percent of total stored digital assets — a figure that storage consultants across the industry cite as typical for large public institutions that digitised collections rapidly without quality controls. The redundancy is not trivial: unnecessary storage drives up licensing costs, slows retrieval systems, and increasingly complicates the city's open-data commitments under the Berlin Open Data Ordinance, which came into full effect in January 2024.

Why does this matter right now? The CDU-led Berlin Senate is pushing a digitisation acceleration programme through 2027, channelling funding into converting physical records (planning documents, historical photographs, infrastructure blueprints) into searchable digital formats. As more material flows into repositories, the duplicate problem compounds. Institutions that failed to implement deduplication protocols early are now discovering that retrofitting those systems onto bloated archives costs significantly more than building them in from the start. Storage capacity costs money, and Berlin's tight municipal budget following the 2024 austerity corrections leaves little room to expand it.

Where the Data Piles Up

Two institutions illustrate the scale particularly well. The Stadtmuseum Berlin, whose main collection spans sites including the Märkisches Museum near Köllnischer Park in Mitte, began a structured deduplication audit in March 2025 covering roughly 1.2 million digitised photographic records. Early internal assessments, described in publicly available procurement notices, suggested that between 15 and 22 percent of image files were near-identical duplicates created during multiple scanning passes of the same physical object. At standard cloud storage rates of around €0.02 per gigabyte per month (the tier used by many Berlin public bodies contracting through the Dataport consortium), even a few hundred terabytes of redundant image data translate to tens of thousands of euros in annual wasted expenditure.
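As a back-of-the-envelope check on that estimate, the sketch below computes the annual cost of redundant storage at the quoted rate of €0.02 per gigabyte per month. The 300-terabyte volume and the decimal conversion of 1,000 gigabytes per terabyte are illustrative assumptions, not figures from the audit.

```python
# Annual cost of redundant storage at a flat per-gigabyte monthly rate.
# The rate comes from the article; the 300 TB volume is an assumed example.

GB_PER_TB = 1_000  # decimal terabytes (an assumption)


def annual_waste_eur(redundant_tb: float, rate_eur_per_gb_month: float = 0.02) -> float:
    """Return the yearly cost in euros of storing `redundant_tb` of duplicates."""
    monthly = redundant_tb * GB_PER_TB * rate_eur_per_gb_month
    return monthly * 12


if __name__ == "__main__":
    print(f"{annual_waste_eur(300):,.0f} EUR per year")
```

At that rate, 300 terabytes of duplicates would cost about €72,000 a year, which is consistent with the "tens of thousands of euros" range.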

The Technologiestiftung Berlin, based in Tempelhof-Schöneberg, flagged the issue in its 2025 Digital Infrastructure Report, noting that duplicate asset management was one of the top three inefficiencies reported by Berlin district administrations in a survey of the city's 12 Bezirksämter. The report stopped short of providing a city-wide cost figure, but comparable audits in Hamburg and Vienna, both cities with similarly scaled municipal digitisation programmes, have put avoidable storage waste at between €500,000 and €1.2 million annually for administrations of Berlin's size.

Algorithms, Audits, and What Gets Fixed

The technical fix is well understood. Perceptual hashing algorithms, software tools that generate a fingerprint for each image and flag near-matches, can process a million files in under 24 hours on mid-range server hardware. Several Berlin-based startups operating out of Factory Berlin on Rheinsberger Straße in Mitte and the EUREF Campus in Schöneberg have developed specialised tools aimed at exactly this public-sector market. Licensing costs for such software typically run between €8,000 and €25,000 per year for an institutional deployment, a fraction of the ongoing storage waste they are designed to eliminate.
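To make the fingerprinting idea concrete, here is a minimal sketch of one common perceptual hashing scheme, the difference hash (dHash). It is not necessarily the method any of the tools mentioned use. Images are represented as plain grids of grayscale values so the example needs no imaging library, and the function names and the distance threshold of 5 bits are assumptions chosen for illustration.

```python
# A minimal difference-hash (dHash) sketch on grayscale pixel grids.
# Images are lists of rows with values 0-255; a real pipeline would first
# decode image files with an imaging library (not shown here).

from typing import List

Grid = List[List[int]]


def shrink(pixels: Grid, width: int, height: int) -> Grid:
    """Downsample to width x height by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Return a fingerprint of hash_size * hash_size bits.

    Each bit records whether a pixel in the downsampled image is brighter
    than its right-hand neighbour.
    """
    small = shrink(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Flag two images whose fingerprints differ in at most max_distance bits.

    The default threshold is an assumed value and would need tuning.
    """
    return hamming(dhash(a), dhash(b)) <= max_distance
```

Because the fingerprint depends only on relative brightness between neighbouring pixels, a second scan of the same object that is uniformly a little brighter yields the same hash, while a genuinely different image differs in many bits.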

The harder problem is governance, not technology. Institutions need clear rules about which version of a duplicated image becomes the canonical record, how metadata is merged, and who signs off on deletion — questions that intersect with archival law under the Berlin Archivgesetz. Without those protocols in place, even the best deduplication software produces results that archivists are reluctant to act on.
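A governance rule of this kind can be written down explicitly, which is part of what makes it auditable. The sketch below is purely hypothetical: the record fields, the rule that the highest-resolution scan becomes canonical (with the earliest scan date breaking ties), and the sign-off field are illustrative assumptions, not requirements drawn from the Archivgesetz or from any institution's policy.

```python
# Hypothetical policy sketch: choose a canonical record among duplicates,
# merge metadata, and propose (not perform) deletions pending sign-off.

from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class ImageRecord:
    record_id: str
    width: int
    height: int
    scanned_on: date
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class MergeProposal:
    canonical: ImageRecord
    pending_deletion: List[str]
    approved_by: str = ""  # stays empty until an archivist signs off


def propose_merge(duplicates: List[ImageRecord]) -> MergeProposal:
    if not duplicates:
        raise ValueError("need at least one record")
    # Assumed rule: the scan with the most pixels wins; earliest date breaks ties.
    ranked = sorted(duplicates, key=lambda r: (-(r.width * r.height), r.scanned_on))
    keeper = ranked[0]
    merged: Dict[str, str] = {}
    # Apply lower-ranked records first so the keeper's values win any conflict.
    for rec in reversed(ranked):
        merged.update(rec.metadata)
    canonical = ImageRecord(
        keeper.record_id, keeper.width, keeper.height, keeper.scanned_on, merged
    )
    return MergeProposal(canonical, [r.record_id for r in ranked[1:]])
```

The function only produces a proposal. Nothing is deleted until someone with authority fills in the approval, which mirrors the sign-off question raised above.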

For institutions still in the early stages of digitisation, the practical advice from procurement records and technology assessments is consistent: embed deduplication checks at the point of ingest, not after the fact. The Senatsverwaltung für Digitalisierung is expected to publish updated technical standards for municipal image repositories before the end of the third quarter of 2026. Institutions that wait for that guidance before building their workflows risk the same expensive retrofit problem that has already caught larger collections short. The numbers, at least, are clear enough to act on now.
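What a check at the point of ingest might look like can be sketched in a few lines. The example below is a simplified assumption, not a description of any institution's workflow: it catches only byte-identical files by comparing SHA-256 digests, and the class and method names are hypothetical. Near-identical rescans would need a perceptual fingerprint in place of the digest.

```python
# Ingest-time check for exact duplicates, using SHA-256 content digests.
# Class and method names are hypothetical; near-duplicates are out of scope.

import hashlib
from typing import Dict, Optional


class IngestIndex:
    """Tracks content digests of files already accepted into the repository."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # SHA-256 hex digest -> record id

    def check_and_add(self, record_id: str, data: bytes) -> Optional[str]:
        """Return the id of an identical stored file, or None if the file is new.

        A new file's digest is registered so later copies are caught.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return existing
        self._seen[digest] = record_id
        return None
```

Checking each file as it arrives keeps the comparison to a single lookup per upload, so no later pass over the whole archive is needed.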

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
