Berlin's public administration is sitting on a sprawling, largely invisible problem. Across the city's network of digitised archives — from the Landesarchiv Berlin on Eichborndamm in Reinickendorf to the Senate Department for Urban Development's planning portal — duplicate image files now account for an estimated 30 to 40 percent of total stored data, according to internal IT review processes underway since late 2025. The numbers are striking: redundant files are burning through storage budgets that could otherwise fund the digitisation of tens of thousands of additional historical records.
The timing matters. Berlin's coalition government, led by the SPD, has staked a significant part of its administrative reform agenda on the Digitalisierungsoffensive, a multi-year programme to move city services online and open up data to the public. The Senate's digital budget for 2025–2026 allocated roughly €180 million to infrastructure and platform development across all departments. If a third of the storage capacity that budget pays for is occupied by copies of copies, the financial and operational drag is substantial, and it is measurable.
Where the Duplication Accumulates
The problem is structural, not accidental. Berlin's district administrations — all twelve of them — operated for decades on separate, incompatible document management systems. When migration to the unified VOIS|MESO platform began rolling out after 2022, files were frequently imported multiple times during testing phases. The Bezirksamt Mitte and the Bezirksamt Friedrichshain-Kreuzberg both flagged this during transition audits, with technical staff identifying thousands of building permit scan files duplicated between two and six times each.
The Stadtbibliothek Berlin network, which spans over 80 branch locations including the flagship Amerika-Gedenkbibliothek on Blücherplatz in Kreuzberg, faces a parallel issue in its digitised newspaper and photograph collections. Scanning contractors working under sequential contracts between 2018 and 2023 used differing naming conventions, so automated deduplication software failed to catch images that were identical in content but labelled differently. One internal estimate, circulated at a library consortium meeting in March 2026, put the proportion of redundant image files in the historical photograph holdings at close to 28 percent.
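Files that are byte-for-byte identical can be matched regardless of what a contractor called them by comparing a hash of their contents instead of their names. The following is a minimal sketch of that idea, not a description of the software the libraries actually use; the function names and the example file names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose bytes are identical, whatever they are named."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[sha256_of_file(path)].append(path)
    # Only digests shared by two or more files indicate duplication.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Under this approach, a scan stored once as scan_0001.tif and again as AGB_foto_17.tif (both names invented for illustration) would land in the same group. It only catches exact copies: a rescan or a re-encoded version of the same photograph produces different bytes and a different digest.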
Storage is not cheap at civic scale. Commercial cloud contracts used by Berlin's IT service provider ITDZ Berlin currently run at rates comparable to broader European public-sector benchmarks: roughly €0.02 to €0.05 per gigabyte per month for warm storage, with archival cold storage somewhat cheaper. For collections that run into hundreds of terabytes, even a 30 percent reduction through deduplication would free up resources equivalent to several full-time archival positions annually.
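The scale of the saving can be worked out from the quoted rates. The sketch below is a back-of-envelope calculation only: the 500-terabyte archive size is a hypothetical stand-in for "hundreds of terabytes", decimal units (1 TB = 1,000 GB) are assumed, and backup copies, replication and egress fees are ignored.

```python
def annual_storage_cost_eur(terabytes: float, eur_per_gb_month: float) -> float:
    """Yearly cost of storage billed per gigabyte per month (assumes 1 TB = 1000 GB)."""
    gigabytes = terabytes * 1000
    return gigabytes * eur_per_gb_month * 12


def annual_saving_eur(terabytes: float, eur_per_gb_month: float, duplicate_share: float) -> float:
    """Yearly cost attributable to the share of storage occupied by duplicates."""
    return annual_storage_cost_eur(terabytes, eur_per_gb_month) * duplicate_share


ARCHIVE_TB = 500        # hypothetical archive size, not a figure from the reviews
DUPLICATE_SHARE = 0.30  # lower end of the 30 to 40 percent estimate

for rate in (0.02, 0.05):  # warm-storage range quoted above, EUR per GB per month
    saving = annual_saving_eur(ARCHIVE_TB, rate, DUPLICATE_SHARE)
    print(f"{rate:.2f} EUR/GB/month: about {saving:,.0f} EUR saved per year")
# 0.02 EUR/GB/month: about 36,000 EUR saved per year
# 0.05 EUR/GB/month: about 90,000 EUR saved per year
```

The result scales linearly, so doubling the assumed archive size doubles the saving.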
What Deduplication Actually Requires
Fixing the problem is neither quick nor cheap in the short term. Perceptual hashing, the technique that compares images by visual fingerprint rather than file name, requires significant processing time when applied retrospectively to large archives. A pilot run conducted by ITDZ Berlin on a subset of the Senate's planning map archive in early 2026 took six weeks to process roughly 2.2 million files and flagged approximately 610,000, or about 28 percent, as probable duplicates. Human review of flagged files is then required before any deletion, to avoid accidentally purging a scan that differs only marginally from another but is legally significant.
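To make the idea of a visual fingerprint concrete, here is a minimal sketch of one simple perceptual hash, the average hash. It is an illustration under stated assumptions, not the method used in the ITDZ Berlin pilot, which the reviews do not specify. Images are represented as plain grids of grayscale values from 0 to 255 rather than loaded from files, and the distance threshold of 5 bits is an arbitrary choice that would need tuning.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale grid (rows of 0-255 values) to size x size by block averaging.

    Assumes the image is at least `size` pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        top, bottom = row * height // size, (row + 1) * height // size
        line = []
        for col in range(size):
            left, right = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            line.append(sum(block) / len(block))
        small.append(line)
    return small


def average_hash(pixels, size=8) -> int:
    """Return a size*size-bit fingerprint: one bit per cell, set if brighter than the mean."""
    flat = [value for line in shrink(pixels, size) for value in line]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def probable_duplicates(first, second, max_distance=5) -> bool:
    """Flag a pair for human review when their fingerprints are close (threshold is an assumption)."""
    return hamming_distance(average_hash(first), average_hash(second)) <= max_distance


# Synthetic 16x16 test images: a horizontal gradient, a slightly brighter
# copy of it (standing in for a rescan), and an unrelated vertical gradient.
scan = [[x * 16 for x in range(16)] for _ in range(16)]
rescan = [[value + 10 for value in row] for row in scan]
other = [[y * 16 for _ in range(16)] for y in range(16)]

print(hamming_distance(average_hash(scan), average_hash(rescan)))  # 0
print(hamming_distance(average_hash(scan), average_hash(other)))   # 32
```

Because the fingerprint depends on coarse brightness patterns rather than exact bytes, the brighter copy hashes identically to the original while the unrelated image is far away. The same tolerance is why flagged pairs need human review: two legally distinct scans that look almost alike can also fall under the threshold.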
The European Union's Interoperable Europe Act, which entered into force in April 2024 with some obligations applying from January 2025, now obliges member states to improve data quality standards across public administrations, including image metadata hygiene. Berlin's Senate Chancellery has acknowledged the act's implications for ongoing procurement, and new scanning and digitisation contracts must now include deduplication clauses.
For residents and businesses who interact with Berlin's planning or housing portals, the practical upside of a cleaned-up archive is faster search results and more reliable document retrieval. The Stadtentwicklungsplan Wohnen, the city's long-term housing development framework, relies on accurate, deduplicated spatial data to model where new construction is viable. Errors introduced by redundant or mislabelled imagery slow down decisions that already move at a bureaucratic pace. Administrators across several Senate departments say a phased deduplication rollout, expected to begin in earnest by the first quarter of 2027, should cut redundant storage load by at least a fifth within eighteen months, freeing capacity and, eventually, budget for the next round of the city's digitisation ambitions.