Berlin's Senate Department for Urban Development and Housing confirmed earlier this year that a citywide audit of public digital asset libraries had uncovered tens of thousands of duplicate image files spread across at least fourteen municipal databases. The problem did not appear overnight. It accumulated across nearly two decades of disconnected digitisation drives, rushed procurement cycles and a persistent failure to enforce common file-naming standards across the city's Bezirke (boroughs).
The timing matters. Berlin is in the middle of a broader push to modernise its public-facing digital infrastructure, partly driven by the coalition agreement signed by the SPD-led Senate in late 2023, which earmarked funds for what officials described as a unified smart-city data backbone. Duplicate image files are not merely a storage annoyance — they slow content management systems, inflate licensing costs for stock imagery and create versioning chaos when planners, communications teams and transport authorities all pull from separate pools without knowing which copy is current.
How the Duplication Happened
The problem traces back to roughly 2007 and 2008, when individual Bezirke began their own digitisation programmes without a central registry. Mitte and Friedrichshain-Kreuzberg, both with active tourism and urban-development communication teams, built separate image archives that frequently drew from the same press releases issued by Senatsverwaltung departments. When the BVG, the city's public transport operator, launched its own communications overhaul around 2014 and 2015, it imported thousands of images from third-party agencies. Many of those images overlapped with files already held by Berliner Morgenpost's syndication partners and already reused inside city hall portals.
The IT-Dienstleistungszentrum Berlin (ITDZ Berlin), the state-owned technology service provider based in Mitte, was nominally responsible for setting data standards across departments. But enforcement was patchy. Without a mandatory metadata schema, files arrived in systems labelled inconsistently: the same aerial photograph of Tempelhofer Feld might exist under five different filenames across three different servers, each with a slightly different compression level and no shared identifier.
The Wikimedia Deutschland office on Tempelhofer Ufer and the Zentralarchiv der Staatlichen Museen zu Berlin each raised the issue separately in internal working groups during 2021, noting that public-domain images uploaded to open platforms were triggering duplicate flags at rates that disrupted community review queues. Neither institution had the mandate to fix what was, fundamentally, a city-government data-governance problem.
What a Fix Actually Looks Like
The current remediation effort is being coordinated through ITDZ Berlin in partnership with a procurement framework managed under the Senate Chancellery. The approach involves deploying perceptual hashing, a technique that identifies visually identical or near-identical images regardless of filename or format, across participating databases. Pilot runs completed in the first quarter of 2026 reportedly processed around 400,000 files across four departments and flagged roughly 60,000, about 15 percent, as candidates for consolidation or deletion, according to documentation obtained by The Daily Berlin that was circulated at a February working group session.
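For readers curious how perceptual hashing differs from comparing filenames or file checksums, the idea can be sketched in a few lines of Python. This is an illustration only, not the tooling used in the pilot, which the documentation does not name. It uses one common variant, the difference hash, and every name in it (resize_mean, dhash, hamming, THRESHOLD and the synthetic test images) is an assumption made for the example. Images are represented as plain grids of grayscale values so the sketch runs without an image library.

```python
from typing import List

Gray = List[List[float]]  # rows of grayscale values, roughly 0-255

THRESHOLD = 10  # hypothetical cut-off: at most this many differing bits counts as a match


def resize_mean(pixels: Gray, width: int, height: int) -> Gray:
    """Downscale by averaging the source pixels that fall into each target cell."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Gray, size: int = 8) -> int:
    """Difference hash: one bit per neighbouring pair, set when the left cell is brighter."""
    small = resize_mean(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic stand-ins for real photographs (assumed data, for illustration only).
# "original": a 64x64 image that darkens from left to right.
original = [[250 - 3 * x for x in range(64)] for y in range(64)]
# "recompressed": the same image with small pixel-level noise, as lossy compression might add.
recompressed = [[250 - 3 * x + (x * 7 + y * 13) % 5 - 2 for x in range(64)] for y in range(64)]
# "upscaled": the same picture saved at twice the resolution.
upscaled = [[250 - 3 * (x // 2) for x in range(128)] for y in range(128)]
# "unrelated": a different picture that brightens from left to right.
unrelated = [[60 + 3 * x for x in range(64)] for y in range(64)]

base = dhash(original)
for name, image in [("recompressed", recompressed), ("upscaled", upscaled), ("unrelated", unrelated)]:
    distance = hamming(base, dhash(image))
    verdict = "likely duplicate" if distance <= THRESHOLD else "different image"
    print(f"{name}: {distance} differing bits, {verdict}")
```

The recompressed and upscaled copies would produce different checksums from the original, yet their perceptual hashes land close together, which is the property that lets a file saved at a different compression level or resolution be matched to its source. In a real deployment the pixel grids would come from decoded image files, and the threshold would need tuning against human review of the flagged candidates.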
Cost is a real factor. Cloud storage in Berlin's municipal environment runs under long-term contracts with tiered pricing; unnecessary duplication across hot-storage tiers adds up. Beyond storage, each department that holds redundant files must also clear licensing provenance separately — a legal bottleneck that the Senate's legal services unit, Senatsverwaltung für Justiz, has been asked to help streamline through a standardised clearance template.
For residents and businesses interacting with city portals, whether filing planning applications through the FIS-Broker geoinformation platform or accessing press image libraries for community journalism, the practical effect of deduplication should eventually be faster-loading pages and fewer broken image links. Departments have been given a target of completing primary deduplication by the end of the third quarter of 2026. Whether the underlying metadata standards get locked down before the next round of procurement contracts begins in early 2027 will determine whether Berlin finds itself, in another decade, counting duplicates all over again.