Berlin's Senate Department for Digital Development and Work confirmed this week that a formal review of duplicate image data across city-administered databases is now underway, targeting everything from housing portal listings to public transport route maps published by the BVG. The cleanup effort, which began in earnest on Monday, June 30, follows months of complaints from city contractors and civic tech developers about redundant files slowing down public-facing digital services.
The timing is not arbitrary. Berlin's coalition under SPD leadership has staked a significant portion of its digital governance agenda on making the city's data infrastructure fit for the next phase of the Smart City Berlin strategy. Duplicate images — identical or near-identical files stored multiple times across disconnected systems — have emerged as a concrete, measurable drag on that ambition. For a city positioning itself as a European tech hub alongside startup clusters in Mitte and Prenzlauer Berg, the problem carries real reputational weight.
Where the Problem Is Concentrated
The worst bottlenecks, according to the Senate's internal review documents circulated this week, sit inside the Berlin Open Data portal at daten.berlin.de and the digital inventory systems used by the Stadtentwicklung — the urban development arm responsible for housing permit records. Both platforms grew rapidly during the 2020–2023 pandemic-era digitisation push and were fed image assets from dozens of separate agencies without any centralised deduplication protocol. The result: some building permit files contain the same facade photograph stored up to eleven times under different filename conventions.
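To illustrate how the same photograph stored under different filenames can be detected, here is a minimal sketch that groups files by a hash of their contents rather than by name. It is not the Senate's or Metatagger's actual method, which the review documents do not describe. The function names `sha256_of` and `find_duplicates` are hypothetical, and the approach only catches byte-identical copies; near-identical images (resized or recompressed) would need a different technique such as perceptual hashing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict:
    """Map each content hash to the list of files sharing it (hypothetical helper).

    Only hashes that occur more than once are returned, so every entry
    is a group of byte-identical files, whatever their filenames.
    """
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Because the filename plays no part in the comparison, eleven copies of one facade photograph saved under eleven naming conventions would fall into a single group, from which one copy could be kept and the rest flagged for review.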
The BVG's digital communications team has also been flagged. The public transport operator, which records roughly 1.1 billion passenger journeys per year across its U-Bahn, tram, and bus network, maintains a media library for route visualisations and station imagery that auditors found contained substantial redundancy, particularly around major interchange hubs like Alexanderplatz and Ostbahnhof.
Berlin-based startup Metatagger GmbH, headquartered in a co-working facility on Oranienstraße in Kreuzberg, has been brought in as a technical consultant on the deduplication process. The firm specialises in automated metadata cleaning for public sector clients and has previously worked on similar projects in Hamburg and Leipzig.
What the Data Shows — and What Comes Next
The scale is significant. Preliminary figures from the Senate review, shared with the Digital Advisory Council on July 2, indicate that removable duplicate image files account for an estimated 340 terabytes of redundant storage across the city's top-tier government platforms. At current cloud storage contract rates (Berlin's primary public cloud deal runs at roughly €0.023 per gigabyte per month), retaining that redundant data costs roughly €7,800 a month, or about €94,000 a year.
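The storage cost can be checked with a short back-of-the-envelope calculation. The sketch below uses only the two figures quoted above (340 terabytes and €0.023 per gigabyte per month). It assumes decimal units (1 TB = 1,000 GB) and a flat rate with no tiering or egress charges, neither of which the review figures specify. The function name `annual_cost_eur` is illustrative.

```python
# Back-of-the-envelope storage cost from the figures quoted in the article.
REDUNDANT_TB = 340             # estimated removable duplicate image data
RATE_EUR_PER_GB_MONTH = 0.023  # quoted contract rate
GB_PER_TB = 1000               # assumption: decimal terabytes


def annual_cost_eur(terabytes, rate_per_gb_month, gb_per_tb=GB_PER_TB):
    """Yearly cost in euros of keeping `terabytes` of data at a flat monthly rate."""
    gigabytes = terabytes * gb_per_tb
    return gigabytes * rate_per_gb_month * 12


cost = annual_cost_eur(REDUNDANT_TB, RATE_EUR_PER_GB_MONTH)
print(f"Estimated annual cost: EUR {cost:,.0f}")
```

Under these assumptions the result is about €93,840 a year (€7,820 a month). Using binary units (1 TB = 1,024 GB) raises it to roughly €96,100, so on either reading the figure sits just under €100,000.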
Beyond pure cost, the performance argument is gaining traction among developers. Civic tech groups such as Code for Berlin, which holds regular open-data meetups at venues including the co-working space Supermarkt on Eberswalder Straße in Prenzlauer Berg, have long flagged that bloated image libraries slow API response times on public datasets, making third-party app development for Berlin services more difficult than it needs to be.
The Senate has set a completion target of October 31 for the first phase of deduplication, covering the Open Data portal and the Stadtentwicklung permit system. A second phase, encompassing BVG's media library and the city's tourism and events databases managed through visitBerlin, is scheduled to follow in early 2027.
For residents and developers who rely on these platforms, the practical benefits will arrive gradually. Faster load times on the housing portal listings, a particularly sore point for Berliners navigating the city's chronic rental shortage, may be noticeable by autumn. Developers building on the Open Data API are being advised to re-index their data pulls after October 31, when file identifiers will be reassigned as part of the cleanup. The Senate Department for Digital Development and Work says updated technical documentation will be published to daten.berlin.de no later than two weeks before the switchover.