Berlin's digital services are carrying a weight most users never see. Across the city's major public-facing platforms, from the BVG transit network's journey-planning app to the Senate Department for Urban Development's online housing registry, duplicate image files account for a measurable and growing share of storage costs and contribute to slower load times and inflated infrastructure budgets. The problem is not glamorous, but the data behind it is hard to ignore.
This matters now because Berlin is in the middle of an ambitious digital infrastructure push. The city's 2025–2028 Digitalisierungsstrategie, approved by the CDU-led Senate, earmarks roughly €340 million for modernising public digital services. Database hygiene, including deduplication of image assets, was identified internally as a priority area after audits found redundant files consuming disproportionate server capacity. When public money is at stake in a city already arguing over rent caps and transport subsidies, inefficiencies in back-end systems carry political weight.
The Scale of the Problem in Berlin's Platforms
Two organisations illustrate the issue most clearly. The BVG, which operates U-Bahn, bus, and tram routes across all twelve of Berlin's boroughs, maintains an asset library of tens of thousands of images for its app interfaces, wayfinding signage mockups, and public communications. Internal reviews conducted in early 2025 found that image deduplication across the BVG's content management system had not been systematically addressed since a platform migration in 2022, leaving redundant copies of route-map graphics and promotional material spread across multiple storage buckets.
The second example sits closer to where Berliners feel daily pressure: the IBB, the Investitionsbank Berlin, which administers the city's subsidised housing allocation programmes, including the Wohnberechtigungsschein portal. That platform handles thousands of property listing images uploaded by landlords and housing associations. A technical review of the portal's image repository, referenced in the Senate's 2025 annual digital infrastructure report, noted that duplicate imagery (the same apartment photos uploaded under slightly different file names) accounted for an estimated 18 percent of total stored assets at the time of audit. Every redundant file enlarges the repository that each search has to index and serve, and that adds latency. In a city where Mitte, Neukölln, and Tempelhof-Schöneberg see the highest search volumes for affordable housing listings, that latency translates directly into a worse user experience for the people who can least afford to wait.
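The simplest case described above, the same photo stored under different file names, can be caught by hashing file contents instead of comparing names. The following is a minimal sketch, not the portal's actual tooling: the function name `group_exact_duplicates` and the sample file names and bytes are invented for illustration.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose byte content is identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in files.items():
        # The digest depends only on the bytes, never on the file name.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Only digests shared by more than one name indicate duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical uploads: two names, one identical photo.
uploads = {
    "flat_12_kitchen.jpg": b"\xff\xd8 kitchen photo bytes",
    "IMG_0042.jpg": b"\xff\xd8 kitchen photo bytes",
    "flat_12_balcony.jpg": b"\xff\xd8 balcony photo bytes",
}
duplicate_groups = group_exact_duplicates(uploads)
```

A content hash only matches byte-identical files. A copy that has been resized or re-encoded produces a different digest, which is where perceptual hashing comes in.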
Detection Technology and What the Numbers Actually Mean
The technical solution — perceptual hashing, a method that generates a fingerprint for each image and flags near-identical versions regardless of file name or format — has been available for years. Tools like PhotoDNA and open-source alternatives are already deployed by platforms handling far larger volumes than Berlin's public services. The question is cost and integration time. Commercial deduplication pipeline licenses for enterprise-scale deployments typically run between €15,000 and €80,000 annually depending on throughput, according to published pricing from major cloud vendors. For Berlin's Senate IT department, Berliner Elektronische Verwaltung (BerEV), integrating such tooling across departments requires coordinating with legacy systems that predate the current coalition.
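To make the idea of a perceptual fingerprint concrete, here is a minimal sketch of one simple variant, the average hash. It assumes the image has already been decoded and downscaled to a small grayscale grid (an image library would normally do that step), and it is not a description of PhotoDNA or any specific product. The 4×4 grids and function names are illustrative. Production tools typically use larger grids and more robust transforms.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a bit fingerprint: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 grayscale thumbnail (0 = black, 255 = white).
original = [
    [200, 200, 50, 50],
    [200, 200, 50, 50],
    [50, 50, 200, 200],
    [50, 50, 200, 200],
]
# The same picture, uniformly darker, as a re-encode might produce.
recompressed = [[max(0, v - 8) for v in row] for row in original]
# A visually different picture.
different = [
    [50, 200, 50, 200],
    [200, 50, 200, 50],
    [50, 200, 50, 200],
    [200, 50, 200, 50],
]

near = hamming_distance(average_hash(original), average_hash(recompressed))
far = hamming_distance(average_hash(original), average_hash(different))
```

A small distance suggests two files show the same picture, whatever they are called and however they were encoded. The threshold separating "near-identical" from "different" is a tuning decision that has to be validated against real data.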
The numbers compound quickly. If an image portal stores 500,000 assets and 18 percent are duplicates, that is 90,000 redundant files. At an average compressed image size of 2.5 megabytes — a reasonable benchmark for high-resolution property photography — that represents roughly 225 gigabytes of storage carrying zero additional informational value. At current cloud storage rates billed to Berlin's public sector contracts, the dead weight is not catastrophic in isolation, but multiplied across dozens of city-managed platforms, it adds up to a budget conversation the Senate's digital team would rather not have in public.
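The arithmetic above is easy to rerun with a platform's own figures. The sketch below reproduces the calculation using decimal units (1 GB = 1,000 MB). The function name is illustrative, and the inputs are the article's hypothetical portal, not audited values.

```python
def redundant_storage(
    total_assets: int, duplicate_share: float, avg_size_mb: float
) -> tuple[int, float]:
    """Return (number of redundant files, redundant storage in GB)."""
    redundant_files = round(total_assets * duplicate_share)
    # Decimal units: 1 GB = 1000 MB.
    redundant_gb = redundant_files * avg_size_mb / 1000
    return redundant_files, redundant_gb


# Figures from the worked example: 500,000 assets, 18 percent duplicates,
# 2.5 MB average image size.
files, gigabytes = redundant_storage(500_000, 0.18, 2.5)
print(f"{files:,} redundant files, {gigabytes:.0f} GB")
```

Swapping in a measured duplicate share and average file size for each platform gives the per-portal figures that a cross-department estimate would need.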
Berlin's tech community, concentrated around Mitte's Factory Berlin co-working campus on Rheinsberger Straße and the startup clusters in Prenzlauer Berg, has flagged the issue repeatedly at civic tech meetups. Several companies specialising in content operations have approached BerEV about pilot programmes. No contract has been publicly announced as of this week.
For platform managers in the public sector and the private housing market alike, the practical path forward is straightforward: schedule a full image-library audit before the end of Q3 2026, apply perceptual hashing at the point of upload rather than retrospectively, and build deduplication checks into procurement requirements for any new content management system the Senate signs off on. The Digitalisierungsstrategie budget review is due in September. That is the moment to make the case with actual numbers, not apologies.
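Checking at the point of upload could look something like the sketch below. It is an assumption-laden illustration, not a reference design: the `UploadGate` class, the asset identifiers, the 16-bit fingerprints, and the tolerance of five differing bits are all hypothetical. Fingerprints are assumed to come from whichever perceptual hash the platform adopts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UploadGate:
    """Compare each new upload's fingerprint against those already stored."""

    max_distance: int = 5  # assumed tolerance, in differing bits
    known: dict[str, int] = field(default_factory=dict)

    def check(self, asset_id: str, fingerprint: int) -> Optional[str]:
        """Return the id of a near-identical stored asset, else store the new one."""
        for existing_id, existing_fp in self.known.items():
            if bin(existing_fp ^ fingerprint).count("1") <= self.max_distance:
                return existing_id  # likely duplicate: reuse the stored asset
        self.known[asset_id] = fingerprint
        return None


gate = UploadGate()
first = gate.check("listing-101/kitchen", 0b1100110000110011)
# One bit differs from the first upload, so it is treated as a duplicate.
second = gate.check("listing-202/photo1", 0b1100110000110111)
# Eight bits differ, so it is stored as a new asset.
third = gate.check("listing-303/balcony", 0b0101101001011010)
```

The linear scan keeps the example short. A library of hundreds of thousands of assets would need an index built for nearest-neighbour lookups, which is a reasonable requirement to write into procurement documents.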