The Daily Berlin

Berlin's Duplicate Image Problem: The Numbers That Are Costing the City's Digital Archives Millions

A deep dive into the data reveals how redundant image files are piling up across Berlin's public institutions, driving up storage costs and slowing down civic tech projects.

By Berlin News Desk · Published 4 July 2026, 8:36 pm

Photo: Katharine Susan Anthony / Public domain (Wikimedia Commons)

Berlin's public digital infrastructure is sitting on a quietly expensive problem. Across municipal databases, housing authority portals, and public transport information systems, duplicate image files now account for a measurable share of total storage overhead — and the bill is growing. Internal audits reviewed by The Daily Berlin show that redundant visual assets are not a marginal issue but a systemic one, with roots in fragmented procurement, legacy IT contracts, and the rapid digitisation push that accelerated after 2020.

The timing matters. Berlin's governing coalition has committed to a broad e-government modernisation agenda, with the Senate Chancellery earmarking funds for digital infrastructure upgrades through 2027. At the same time, the city's chronic housing shortage has pushed bodies such as the city's urban development administration and the state-owned housing company degewo to digitise massive property photo libraries at speed, often before standardised deduplication protocols were in place.

What the Data Actually Shows

The scale of the problem becomes clearest when you look at individual institutions. The Berliner Verwaltungscloud, a shared cloud infrastructure rolled out incrementally since 2022, reportedly carries image duplication rates of between 18 and 35 percent of total visual asset libraries, according to estimates by IT procurement consultants working on similar projects in German Länder. Applied to Berlin's documented digital estate, which spans everything from planning documents for the Alexanderplatz redevelopment to archive images used in promotional materials for the BVG's Linie 100 bus route, even the lower end of that range translates into significant wasted capacity.

Storage is not free. Enterprise-tier cloud storage in Germany, priced under frameworks used by public bodies, typically runs between €0.02 and €0.05 per gigabyte per month depending on redundancy tier and contract structure. A library carrying 40 terabytes (40,000 gigabytes) of duplicate image data, a conservative estimate for a mid-sized German state authority, accumulates costs of roughly €9,600 to €24,000 annually on storage alone, before accounting for bandwidth, backup cycles, and human time spent managing conflicting file versions.
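The arithmetic behind that range can be checked with a short sketch. It assumes decimal units (1 terabyte = 1,000 gigabytes) and a flat per-gigabyte monthly price; the function name and parameters are illustrative and do not come from any procurement framework.

```python
def annual_storage_cost_eur(duplicate_tb, price_per_gb_month, gb_per_tb=1000):
    """Yearly storage cost in euros for a given volume of duplicate data.

    Assumes a flat monthly price per gigabyte and decimal terabytes.
    """
    gigabytes = duplicate_tb * gb_per_tb
    return gigabytes * price_per_gb_month * 12


# The article's example: 40 TB of duplicates at EUR 0.02 to EUR 0.05 per GB per month.
low = annual_storage_cost_eur(40, 0.02)
high = annual_storage_cost_eur(40, 0.05)
print(f"EUR {low:,.0f} to EUR {high:,.0f} per year")
```

The figures scale linearly, so an authority holding twice as much duplicate data at the same price would pay twice as much.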

Berlin's situation is complicated by the number of separate entities involved. The Senatsverwaltung für Stadtentwicklung, the BVG, the Berliner Sparkasse's digital communications arm, and cultural institutions along the Museumsinsel each maintain independent asset management systems, few of which talk to one another. The result is that the same photograph of the Brandenburger Tor, taken during a 2023 Senate press event, may exist in four or five separate databases under different filenames, at different compression levels, and with different embedded metadata.

Deduplication as Infrastructure Policy

The technical fix is not complicated. Perceptual hashing — a method that identifies visually identical or near-identical images regardless of filename or minor compression differences — has been standard practice in private-sector digital asset management for over a decade. Tools built on this approach can process tens of thousands of images per hour on commodity hardware. The obstacle in Berlin, as in other German cities with federated administrative structures, is governance: who decides which copy is canonical, who bears the cost of the cleanup, and whose IT contract gets disrupted in the process.
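To make the idea concrete, here is a minimal sketch of one common perceptual hash, the difference hash (dHash). It is an illustration only, not a description of any tool used by Berlin's agencies. It assumes each image has already been decoded into a grayscale grid (a list of rows of 0-255 values); a real pipeline would decode files with an imaging library first. The function names are hypothetical.

```python
def block_mean(pixels, r0, r1, c0, c1):
    """Average brightness of the rectangle rows r0..r1-1, columns c0..c1-1."""
    total = sum(sum(row[c0:c1]) for row in pixels[r0:r1])
    return total / ((r1 - r0) * (c1 - c0))


def dhash(pixels, hash_size=8):
    """Difference hash: shrink to a (hash_size + 1) x hash_size grid of block
    averages, then record whether each cell is brighter than its right neighbour."""
    height, width = len(pixels), len(pixels[0])
    rows, cols = hash_size, hash_size + 1
    bits = 0
    for r in range(rows):
        r0 = r * height // rows
        r1 = max(r0 + 1, (r + 1) * height // rows)
        means = []
        for c in range(cols):
            c0 = c * width // cols
            c1 = max(c0 + 1, (c + 1) * width // cols)
            means.append(block_mean(pixels, r0, r1, c0, c1))
        for left, right in zip(means, means[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two hashes; small means visually similar."""
    return bin(a ^ b).count("1")


# Synthetic 72 x 64 test images: a left-to-right gradient, a coarsely quantised
# copy standing in for a re-compressed file, and a mirrored version.
original = [[255 - 3 * x for x in range(72)] for _ in range(64)]
recompressed = [[(v // 8) * 8 for v in row] for row in original]
mirrored = [row[::-1] for row in original]

print(hamming_distance(dhash(original), dhash(recompressed)))  # 0
print(hamming_distance(dhash(original), dhash(mirrored)))      # 64
```

In practice a threshold on the Hamming distance (a few bits out of 64, chosen by testing on real data) decides whether two files are treated as the same image.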

Initiatives like CityLAB Berlin, based at Platz der Luftbrücke in Tempelhof, have explored exactly these kinds of cross-agency data quality challenges since the lab's founding in 2019. The Technologiestiftung Berlin, which supports CityLAB, has published research on open data quality issues in the city, though image asset deduplication has not yet been the focus of a dedicated programme.

For institutions looking to act now, the practical path starts with auditing existing digital asset management systems against a single consistent hashing standard before any new files are ingested. The German federal IT standards body, the Koordinierungsstelle für IT-Standards (KoSIT), has published interoperability guidelines that provide a framework, though compliance remains voluntary for Länder-level bodies. Berlin's Senate IT directorate has until the end of the current coalition term, which falls within the 2026-2027 budget cycle, to decide whether deduplication becomes a mandatory procurement requirement for publicly funded digital projects. The cost of not deciding is already showing up in the numbers.
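A check at ingestion time could look roughly like the sketch below. It assumes SHA-256 over the raw file bytes as the shared hashing standard, which is an illustrative choice and not something KoSIT or the Senate has specified; the class and file names are hypothetical. An exact content hash only catches byte-identical copies, so re-compressed versions would still need a perceptual comparison.

```python
import hashlib


class AssetIndex:
    """Hypothetical in-memory index from content hash to the first-seen asset name."""

    def __init__(self):
        self._by_digest = {}

    def ingest(self, name, data):
        """Register a file. Returns (is_new, canonical_name)."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_digest:
            # Same bytes already stored under another name: report the existing copy.
            return False, self._by_digest[digest]
        self._by_digest[digest] = name
        return True, name


index = AssetIndex()
photo = b"placeholder bytes standing in for a JPEG file"
print(index.ingest("press/tor_2023.jpg", photo))         # (True, 'press/tor_2023.jpg')
print(index.ingest("archive/IMG_0042.jpg", photo))       # (False, 'press/tor_2023.jpg')
print(index.ingest("archive/other.jpg", b"other bytes")) # (True, 'archive/other.jpg')
```

Treating the first-seen copy as canonical is only a placeholder rule; which copy wins is exactly the governance question that the agencies would have to settle.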

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
