Berlin's Duplicate Image Problem: The Numbers Behind a Hidden Digital Drain
City agencies and startups across Berlin are sitting on vast libraries of redundant image files — and the data shows the cost is bigger than most realise.

Berlin's public sector holds tens of thousands of duplicate image files across its digital infrastructure, according to internal IT assessments circulating among city administration departments this year. The redundancy is not a minor housekeeping problem. Storage costs, processing overhead, and the staff hours spent manually identifying and removing repeated assets have quietly become a measurable budget line — one that several Bezirksämter are now being pushed to address before the 2027 annual digital audit cycle begins.
The timing matters. Germany's federal Digital Strategy 2025 framework has placed renewed pressure on municipal governments to demonstrate efficient data governance, and Berlin's SPD-led Senate has made digitisation of public services a core platform commitment. Duplicate image files sit at an unglamorous intersection of those two priorities: they slow content management systems, inflate cloud storage invoices, and — in the context of Berlin's housing portal and transport information infrastructure — create real confusion for end users who encounter the same photograph labelled differently in multiple places.
A working-group report shared among departments of the Senatsverwaltung für Inneres und Digitales earlier this spring found that unstructured media libraries across six pilot departments contained duplication rates of between 23 and 41 percent. In practical terms, for every 100 image files stored, between 23 and 41 were exact or near-exact copies of files already held elsewhere on the same system. Storage in unoptimised municipal cloud environments typically costs roughly €0.02 per gigabyte per month under standard AWS or Azure public-sector contracts. That figure multiplies rapidly once libraries run into the hundreds of thousands of files, which communication teams at Berlin's Stadtentwicklung and around the BVG have confirmed is the case for their asset archives.
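For readers who want to put their own numbers on the storage component, the arithmetic can be sketched in a few lines of Python. The duplication rates and the €0.02 price are the figures reported above; the library size and average file size in the example are illustrative assumptions, not figures from the working-group report, and the function name is hypothetical.

```python
# Figures reported in the article.
PRICE_EUR_PER_GB_MONTH = 0.02
DUPLICATION_RATES = (0.23, 0.41)


def monthly_duplicate_cost(total_files, avg_file_mb, duplication_rate,
                           price_per_gb=PRICE_EUR_PER_GB_MONTH):
    """Estimate the monthly storage cost (EUR) of redundant copies alone."""
    duplicate_files = total_files * duplication_rate
    duplicate_gb = duplicate_files * avg_file_mb / 1000  # decimal gigabytes
    return duplicate_gb * price_per_gb


if __name__ == "__main__":
    # ASSUMPTION: 200,000 files averaging 5 MB each (illustrative only).
    for rate in DUPLICATION_RATES:
        cost = monthly_duplicate_cost(200_000, 5.0, rate)
        print(f"duplication {rate:.0%}: about EUR {cost:.2f} per month")
```

The sketch covers storage only. Processing overhead and staff hours, the other two cost components named above, would have to be estimated separately.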
The BVG, which manages public transport communications across more than 300 stations and a network covering Mitte, Neukölln, Pankow and beyond, uses a centralised digital asset management platform to distribute route maps, accessibility graphics, and promotional imagery. People familiar with the platform's structure say duplication crept in during the rapid expansion of BVG's digital communication operation between 2021 and 2024, as multiple teams uploaded assets without a shared taxonomy. The result: the same accessibility icon, for example, might exist under a dozen different filenames, each pulling its own storage allocation and each appearing as a unique record in search results.
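The "same icon, a dozen filenames" case is the easiest kind of duplication to find, because the copies are byte-for-byte identical. A minimal sketch of how such copies could be grouped, using a SHA-256 content hash from Python's standard library, follows. It is not the method used by the BVG's platform, whose internals are not public; the function name is hypothetical, and the approach only catches exact copies, not re-encoded or resized ones.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root):
    """Group files under `root` whose contents are byte-identical.

    Returns a list of groups; each group is a list of paths that share
    the same SHA-256 digest, regardless of filename.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Sketch only: reads each file fully into memory.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each returned group is one asset stored several times; keeping one file per group and redirecting references to it removes the redundant storage and the duplicate search results.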
Berlin's startup ecosystem has noticed, too. At Factory Berlin on Rheinsberger Straße in Mitte, several content-technology companies have built products specifically targeting this problem. Automated duplicate-detection tools using perceptual hashing, a technique that flags visually identical or near-identical images even when file sizes or formats differ, have become a niche but growing product category. A perceptual-hashing pipeline can process roughly 10,000 images per minute on standard server hardware, making the manual approach look absurd by comparison. One estimate circulating in Berlin's PropTech community, based on open data from comparable European city portals, puts the staff-time cost of manual deduplication at around €18 per 1,000 images reviewed. At that rate, automation becomes economically compelling once a library exceeds roughly 50,000 files.
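To illustrate the idea behind perceptual hashing, here is a minimal sketch of an average hash, one of the simplest members of the family. It is not the algorithm of any product mentioned above. It assumes the image has already been decoded into a grid of grayscale values (0 to 255) at least 8 pixels wide and high; the function names are hypothetical, and rows or columns that do not divide evenly into the 8x8 grid are ignored.

```python
def average_hash(pixels, hash_size=8):
    """Return a hash_size*hash_size-bit average hash as an int.

    `pixels` is a 2D list of grayscale values with equal-length rows.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // hash_size, width // hash_size
    cells = []
    # Shrink the image to hash_size x hash_size by averaging blocks.
    for r in range(hash_size):
        for c in range(hash_size):
            block = [pixels[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


if __name__ == "__main__":
    original = [[x * 16 for x in range(16)] for _ in range(16)]
    # The same picture at double resolution (each pixel repeated 2x2).
    upscaled = [[(x // 2) * 16 for x in range(32)] for _ in range(32)]
    print(hamming_distance(average_hash(original), average_hash(upscaled)))
```

Two images count as likely duplicates when the Hamming distance between their hashes falls below a chosen threshold. Where to set that threshold is a tuning decision that depends on the library.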
The Senatsverwaltung für Inneres und Digitales has pencilled in a procurement round for digital asset management tooling in Q3 2026. Organisations bidding are expected to demonstrate automated deduplication capability as a baseline requirement, not an optional feature. That signals a shift: until recently, deduplication was treated as a post-migration cleanup task rather than a core system function.
For Berlin's network of Volkshochschulen and public-facing cultural institutions — including the Stadtbibliothek branches across Charlottenburg-Wilmersdorf and Friedrichshain-Kreuzberg, which host large image archives tied to local history projects — the practical advice from digital archivists is consistent: run a perceptual hash audit before any cloud migration, not after. Migrating duplicates multiplies costs. A library rationalised before migration is a library that saves money on day one.
The city has set no public deadline for system-wide compliance, but the 2027 audit benchmark is real. Departments that cannot demonstrate clean asset libraries by then face the prospect of mandatory remediation — on someone else's timeline and, almost certainly, at higher cost than acting now.
Published by The Daily Berlin