The Daily Berlin

Berlin news, every day

Berlin's Duplicate Image Problem: The Numbers Driving a City-Wide Digital Clean-Up

Municipal databases, housing portals and public transit apps are riddled with repeated photographs — and the scale of the redundancy is larger than most Berliners realise.

By Berlin News Desk · Published 4 July 2026, 8:44 pm

Photo: Irina Nesterenko on Pexels

Berlin's sprawling network of public-facing digital platforms contains tens of thousands of duplicate images, a problem that is quietly costing the city money, slowing government websites and frustrating the developers tasked with maintaining them. According to an internal audit circulated within the Senate Department for Urban Development and Housing earlier this year, roughly 34 percent of image assets stored across the department's property and housing portals are exact or near-exact duplicates — files uploaded multiple times under different filenames but pointing to identical visual content.

The timing matters. The SPD-led coalition has staked significant political capital on the Wohnraumversorgung Berlin programme, which depends on clean, fast-loading listings on the city's social housing platform to match tenants with available flats. Duplicate images inflate database sizes, slow search indexing and, in at least a dozen cases documented internally, have caused photographs to appear alongside the wrong listings: a bureaucratic failure with real consequences for tenants applying for flats in Marzahn-Hellersdorf and Spandau.

What the Data Actually Shows

The Senate audit, which covered image repositories across four major platforms between January 2024 and March 2026, found that the average duplicated image file is 2.3 megabytes. Across an estimated 18,000 redundant files identified, that represents roughly 41 gigabytes of avoidable storage. The Senate's IT division pays approximately €0.023 per gigabyte per month under its current municipal contract with a Frankfurt-based provider, so the raw storage bill comes to less than €1 a month, a minor issue. The real expense is bandwidth: requests for duplicate images add measurable latency to portal load times, which the audit put at an average of 340 milliseconds of additional delay per affected page, enough to push bounce rates higher on mobile connections.
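The storage arithmetic can be checked in a few lines. The sketch below uses only the figures quoted from the audit; the decimal conversion (1 gigabyte = 1,000 megabytes) and the function names are assumptions made for illustration.

```python
# Figures quoted from the audit. The decimal conversion
# (1 GB = 1,000 MB) is an assumption, not something the audit states.
REDUNDANT_FILES = 18_000
AVG_FILE_MB = 2.3
PRICE_EUR_PER_GB_MONTH = 0.023


def redundant_storage_gb(files: int, avg_mb: float) -> float:
    """Total size of the redundant files in gigabytes."""
    return files * avg_mb / 1000


def monthly_storage_cost_eur(gigabytes: float, price_per_gb: float) -> float:
    """Monthly storage bill for the given volume."""
    return gigabytes * price_per_gb


storage_gb = redundant_storage_gb(REDUNDANT_FILES, AVG_FILE_MB)
monthly_cost = monthly_storage_cost_eur(storage_gb, PRICE_EUR_PER_GB_MONTH)

print(f"{storage_gb:.1f} GB, about EUR {monthly_cost:.2f} per month")
```

On these numbers the duplicates take up about 41.4 gigabytes and cost roughly €0.95 a month to store, which is why the audit's concern is load time rather than the storage invoice.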

BVG, the public transport operator, faces a parallel problem. The Fahrinfo app and the BVG.de route planner both draw on a shared media library that, as of a review completed in February 2026, contained 7,200 station and vehicle images, of which an estimated 2,900 (about 40 percent) were flagged as probable duplicates by an automated perceptual hashing tool deployed by the operator's IT team. Stations including Alexanderplatz, Potsdamer Platz and Hermannstraße each had more than 15 separate image files that were functionally identical, uploaded by different staff members over several years without a centralised naming convention.

The startup ecosystem along Rosenthaler Straße in Mitte and around the Factory Berlin campus in Prenzlauer Berg has noticed the same structural gap at a commercial level. Several proptech firms building rental tools for the Berlin market have reported that listings scraped from public portals carry duplicate image metadata that pollutes their own machine-learning training sets — a downstream consequence that degrades the accuracy of automated property-valuation tools.

Fixes on the Table — and What They Cost

The Senate Department is evaluating two approaches. The first is a one-time manual review, estimated at 1,200 staff hours across the urban development and IT departments. The second is deploying a persistent deduplication pipeline using perceptual hashing — a method that generates a short fingerprint for each image and compares it against existing fingerprints rather than file names — at an estimated implementation cost of between €40,000 and €65,000 for a Berlin-scale deployment, based on comparable municipal projects in Hamburg and Vienna completed in 2024 and 2025 respectively.
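To make the fingerprint idea concrete, here is a minimal sketch of one common perceptual hash, the difference hash (dHash). Neither the Senate nor BVG has said which algorithm it uses, so this is an illustration and not a description of either tool. It assumes each image has already been decoded into rows of greyscale values from 0 to 255; the function names and the distance threshold of 5 bits are hypothetical choices.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 brightness values


def shrink(pixels: Gray, width: int, height: int) -> Gray:
    """Nearest-neighbour downscale to width x height."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels: Gray, size: int = 8) -> int:
    """Difference hash: one bit per pair of horizontally adjacent pixels."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Treat images as duplicates when their fingerprints nearly match.

    The threshold of 5 bits is an illustrative assumption.
    """
    return hamming(dhash(a), dhash(b)) <= max_distance


# Synthetic test images: a left-to-right gradient, a brightened copy
# (different bytes, same visual structure) and a mirrored version.
original = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[min(255, value + 10) for value in row] for row in original]
mirrored = [list(reversed(row)) for row in original]

print(is_probable_duplicate(original, brighter))
print(is_probable_duplicate(original, mirrored))
```

The brightened copy would fail a byte-for-byte comparison but produces the same fingerprint, while the mirrored image does not match. A production pipeline would decode real image files with an imaging library and store the fingerprints in an index so that each new upload can be checked against it.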

BVG is leaning toward the automated route. The operator's media library is set to expand significantly as new U-Bahn stations open along the U5 extension and as the planned tram lines through Lichtenberg are documented photographically from construction through to operation. Getting the database architecture right before that expansion hits is the practical argument for acting in 2026 rather than later.

For residents and developers working with Berlin's open data portal at daten.berlin.de, the immediate practical step is straightforward: filter image downloads by file hash rather than filename, and report confirmed duplicates through the portal's feedback mechanism. The Senate's digital team has confirmed it reviews those reports quarterly. The next review window opens in September 2026.
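Filtering by file hash can be sketched as follows, assuming the images have already been downloaded to a local folder. The function names and the sample filenames are hypothetical, and SHA-256 is one reasonable choice of hash, not something the portal prescribes. A file hash only catches byte-identical copies; resized or re-encoded versions need a perceptual hash.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(paths: Iterable[Path]) -> List[List[Path]]:
    """Group files whose bytes are identical, whatever they are called."""
    by_hash: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        by_hash[sha256_of(path)].append(path)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]


# Demonstration with throwaway files; the names are invented.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "alexanderplatz_01.jpg").write_bytes(b"same image bytes")
    (folder / "alex_platform_final.jpg").write_bytes(b"same image bytes")
    (folder / "spandau_flat.jpg").write_bytes(b"different image bytes")
    groups = find_exact_duplicates(sorted(folder.iterdir()))
    duplicate_names = [[p.name for p in group] for group in groups]

print(duplicate_names)
```

Each group returned is a set of files that could be reported together as one confirmed duplicate.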


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
