The Daily Berlin

Berlin news, every day


Berlin's Digital Archives Are Drowning in Duplicate Images — And the Numbers Tell a Damning Story

From Mitte to Marzahn, city agencies and cultural institutions are sitting on millions of redundant files, and the bill for fixing it is climbing fast.

By Berlin News Desk · Published 4 July 2026, 8:44 pm


Photo: Zois Fotis on Pexels

Berlin's public digital infrastructure is carrying an embarrassing and expensive dead weight: duplicate images. A systematic audit of file management practices across several Senatsverwaltung departments, conducted earlier this year, found that duplicated image files account for between 30 and 40 percent of total storage consumption across surveyed government servers — a problem that has quietly compounded for over a decade and now costs the city measurably more each budget cycle.

The issue matters now because Berlin is in the middle of a major digitisation push. The governing coalition has earmarked funds under the Smart City Berlin strategy to migrate legacy document systems to cloud infrastructure before the end of 2027. Storage auditors working on that migration are flagging that duplicate images (scanned planning documents, repeated press photographs, redundant social media assets) are significantly inflating the true cost of the cloud transition. Every unnecessary gigabyte moved to cloud storage costs money that could go elsewhere, including toward the housing and transport budgets already under political pressure.

What the Numbers Actually Show

The scale is not trivial. Across major European cities that have completed similar audits, including Amsterdam's gemeentelijke digitale archief review in 2024, duplicate file rates in public-sector image libraries have ranged from 25 to 45 percent. Berlin's internal figures sit toward the higher end of that band. A single department managing urban planning applications for districts including Friedrichshain-Kreuzberg was found to hold, on average, four or more copies of the same scanned building elevation drawings in separate folder locations, according to internal documentation reviewed as part of the Smart City migration scoping process.

The Landesarchiv Berlin, located on Eichborndamm in Reinickendorf, holds one of the city's largest digitised photographic collections: more than 1.2 million items as of its last public inventory. Archivists there have been working since 2023 with deduplication software to clean historical image records before they are transferred to a new repository system. The process identified redundancy rates of roughly 28 percent in certain collection segments, meaning more than one in four image files in those segments was a functional copy of another already in the system.

At the Zentralbibliothek at Breite Straße in Mitte, digital librarians managing the city's public e-media catalogue have used automated hash-matching tools since late 2024 to flag duplicates before ingestion. The process, which cross-references file metadata and pixel-level checksums, cut the library's new acquisition processing time by an estimated 18 percent in its first operational year — a concrete efficiency gain that administrators there have pointed to as a model for other agencies.
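For readers who want to see what hash-matching looks like in practice, here is a minimal sketch in Python. It is not the library's tool, whose details the article does not describe. It assumes the simplest variant: a SHA-256 checksum over each file's raw bytes, which catches exact copies only. The pixel-level comparison mentioned above would additionally require decoding the images, which this sketch does not attempt. The function names `file_digest` and `find_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large scans do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by digest; keep only groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # A digest shared by several paths marks byte-identical copies
    # (assumption: SHA-256 collisions are negligible in practice).
    return {d: paths for d, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates(Path("some/folder"))` returns a mapping from checksum to the list of byte-identical files. A pre-ingestion check would compute the digest of an incoming file and look it up in an index of digests already held.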

Why Deduplication Has Lagged, and What Comes Next

Several structural factors explain why Berlin's public bodies accumulated so many duplicate images in the first place. Decentralised file management, with each Bezirk operating semi-independently, meant there was no single standard for how images were named, stored or checked for redundancy before the Smart City initiative began harmonising those practices. Staff turnover in IT departments, combined with a long period of austerity-constrained IT budgets roughly between 2012 and 2020, meant deduplication tools were a low priority.

Cloud storage costs are the forcing function now. Standard enterprise cloud storage for public-sector clients in Germany is currently priced at roughly €0.02 to €0.025 per gigabyte per month. At that rate, even a modest reduction of 500 terabytes of duplicate image data — a realistic target for a city the size of Berlin — would represent annual savings of around €120,000 to €150,000. That is not a transformative budget line, but it is money that does not need to be spent.
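The savings figure can be checked with a few lines of arithmetic. The sketch below uses the prices and the 500-terabyte volume quoted above. It assumes decimal units (1 terabyte = 1,000 gigabytes) and a flat per-gigabyte price with no tiering or egress fees, neither of which the article specifies. The function name `annual_storage_cost` is hypothetical.

```python
GB_PER_TB = 1_000  # assumption: decimal units, as cloud providers commonly bill


def annual_storage_cost(terabytes: float, eur_per_gb_month: float) -> float:
    """Yearly cost in euros of keeping `terabytes` in storage at a flat rate."""
    return terabytes * GB_PER_TB * eur_per_gb_month * 12


low = annual_storage_cost(500, 0.02)    # lower price quoted in the article
high = annual_storage_cost(500, 0.025)  # upper price quoted in the article
print(f"€{low:,.0f} to €{high:,.0f} per year")
# prints: €120,000 to €150,000 per year
```

At these rates each avoided terabyte is worth roughly €240 to €300 a year, so the total scales linearly with however much duplicate data an agency actually removes.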

The practical path forward involves two things running in parallel. Agencies that have not yet adopted automated hash-based deduplication tools should do so before the 2027 cloud migration deadline, according to the Smart City Berlin technical framework published in March 2026. And procurement officers writing new contracts for digital asset management systems should require built-in duplicate detection as a baseline specification. For Berlin's dozens of cultural institutions still managing image libraries manually — from the Stadtmuseum to district archive offices in Spandau and Lichtenberg — that shift will take both training budgets and political will to actually deliver.


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
