Berlin's public administration is sitting on a problem it has largely avoided naming out loud: tens of thousands of duplicate images clogging the shared digital archives used by district offices, the Senatsverwaltung für Stadtentwicklung, and public-facing portals from Mitte to Neukölln. The question of what to do with them — and who pays — is now forcing a reckoning that administrators can no longer defer.
The timing matters because the city is midway through a €47 million digitisation push under the Berlin Digital Strategy 2025–2030, a framework approved by the SPD-led Senate to consolidate municipal data infrastructure. That programme has accelerated the ingestion of legacy files, including scanned planning documents, construction permits, and neighbourhood mapping surveys, into centralised repositories. Without a coherent deduplication policy in place, faster intake only makes the duplication problem worse.
What the Backlog Actually Looks Like
The Landesarchiv Berlin, housed on Eichborndamm in Reinickendorf, holds the city's official photographic and documentary record. Staff there have flagged internally that migrated files from district servers frequently arrive with multiple copies of identical or near-identical images, each carrying slightly different metadata tags — different upload dates, different file names, sometimes different licensing fields. That inconsistency is not merely a storage issue. It creates legal ambiguity around copyright, usage rights, and attribution, particularly for images sourced from third-party photographers under time-limited contracts.
The Berlin Open Data portal, operated under the Senatsverwaltung für Inneres und Digitalisierung, currently lists more than 3,200 datasets available to the public. Image-heavy datasets — aerial photography, street-level surveys of districts like Friedrichshain-Kreuzberg, and construction site documentation along the A100 extension corridor — account for a disproportionate share of the storage burden. Estimates from comparable European municipal digitisation projects, including Hamburg's Digitale Stadt initiative, suggest duplicate imagery can represent between 15 and 30 percent of total file volume in archives that lack automated deduplication at the point of ingestion.
The cost is not trivial. Cloud storage contracts for Berlin's municipal data, routed partly through the ITDZ Berlin — the city's own IT service provider based in Wedding — are billed incrementally. Redundant files mean redundant costs, compounded each year the problem goes unaddressed.
The Decisions Coming This Autumn
Three choices now sit in front of the relevant Senate departments, and each carries political as well as technical weight.
First, the city must decide whether to deploy automated hash-matching software, which identifies exact duplicates by comparing file checksums, or invest in the more expensive AI-assisted perceptual hashing that can catch near-duplicates, such as the same photograph uploaded at slightly different resolutions or crop ratios. The latter catches more but requires procurement under public tender rules, adding months to any timeline.
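The difference between the two approaches can be illustrated with a minimal Python sketch. It is not the software under consideration by any Senate department: the function names are hypothetical, images are assumed to be already decoded into rows of grayscale values, and the simple "average hash" shown here is only one of several perceptual hashing techniques (and involves no AI).

```python
import hashlib


def to_bytes(pixels):
    """Flatten a grid of grayscale values (0-255) into raw bytes."""
    return bytes(value for row in pixels for value in row)


def exact_digest(data: bytes) -> str:
    """Checksum matching: equal only when the data is byte-for-byte identical."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels, size=8):
    """Perceptual 'average hash': shrink to size x size cells by block
    averaging, then record which cells are brighter than the overall mean.
    Assumes image width and height are multiples of `size`."""
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")


# A synthetic 16x16 gradient and the same picture doubled to 32x32.
original = [[8 * x + 8 * y for x in range(16)] for y in range(16)]
upscaled = [[original[y // 2][x // 2] for x in range(32)] for y in range(32)]

print(exact_digest(to_bytes(original)) == exact_digest(to_bytes(upscaled)))  # False
print(hamming_distance(average_hash(original), average_hash(upscaled)))      # 0
```

The checksum treats the resized copy as a different file, while the perceptual hash treats it as the same picture. In practice a threshold on the Hamming distance decides what counts as a near-duplicate, and choosing that threshold is itself a policy decision.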
Second, there is the question of which agency leads. The ITDZ Berlin has the technical capacity but not the archival authority. The Landesarchiv has the mandate but not the engineering staff. A working group involving both bodies, alongside the Senatsverwaltung für Kultur, is expected to meet for the first time in September 2026. Whether that group produces binding policy or another consultation report is genuinely uncertain.
Third, and most contentiously, the city must decide what happens to images that exist in multiple versions with conflicting rights metadata. Deleting the wrong version could mean losing the only copy with a valid licence. Keeping all versions defeats the purpose. Legal counsel from the Berliner Beauftragte für Datenschutz und Informationsfreiheit will likely be sought before any mass deletion is authorised.
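One cautious way to frame that third choice is as a triage step that deletes nothing: duplicates whose rights metadata agree become candidates for merging, while any group with conflicting metadata is set aside for human review. The Python sketch below is purely illustrative. The `ImageCopy` record, its `licence` field, and the `triage` function are hypothetical and do not describe any existing Berlin system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageCopy:
    path: str
    digest: str              # content hash shared by duplicate copies
    licence: Optional[str]   # hypothetical rights field; None if missing


def triage(copies):
    """Group copies by content hash and split the duplicate groups into
    those with matching licence metadata and those with conflicts.
    Nothing is deleted here; the function only classifies."""
    groups = defaultdict(list)
    for copy in copies:
        groups[copy.digest].append(copy)

    safe_to_merge, needs_review = [], []
    for members in groups.values():
        if len(members) < 2:
            continue  # not a duplicate, nothing to decide
        licences = {member.licence for member in members}
        if len(licences) == 1:
            safe_to_merge.append(members)   # metadata agree across copies
        else:
            needs_review.append(members)    # conflicting or partly missing
    return safe_to_merge, needs_review


copies = [
    ImageCopy("mitte/survey_01.tif", "d1", "CC BY 4.0"),
    ImageCopy("archive/survey_01_copy.tif", "d1", "CC BY 4.0"),
    ImageCopy("neukoelln/aerial_07.tif", "d2", "contract expired"),
    ImageCopy("archive/aerial_07.tif", "d2", None),
]
merge, review = triage(copies)
print(len(merge), len(review))  # 1 1
```

Under this design, only the first group could be reduced to a single file automatically. The second would wait for a legal decision, which is the scenario the paragraph above describes.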
For Berliners, the practical stakes run from the mundane to the significant. Planners in Lichtenberg relying on accurate aerial surveys of development zones need clean, attributable imagery. Journalists filing Freedom of Information requests for urban planning documents need to know they are receiving the authoritative file, not one of seven near-identical copies. The September working group meeting is the next hard date. What comes out of it will determine whether the city treats duplicate imagery as an administrative footnote or a structural problem worth fixing properly.