Berlin's public digital infrastructure is carrying a hidden weight. Across city-run servers, from the Senatsverwaltung für Stadtentwicklung to district-level Bürgerämter, tens of thousands of duplicate image files have accumulated over years of overlapping digitisation projects — and the bill is climbing. Officials and archivists are now pushing for a coordinated replacement and cleanup strategy before the problem compounds further.
The issue has sharpened in 2026 because Berlin's governing coalition committed in its coalition agreement to full digitisation of municipal records by the end of 2027. That deadline has put a spotlight on data quality. You cannot build a reliable digital infrastructure on top of redundant, mismatched image files: that is the core argument coming from technical specialists who work inside the city's IT governance structures. The Landesbeauftragter für den Datenschutz und die Informationsfreiheit, whose office sits on Friedrichstraße, has flagged redundant personal-image data as a compliance risk under federal data protection law.
Stadtarchiv Berlin and the Zentral- und Landesbibliothek on Breite Straße in Mitte are among the institutions that have been dealing with this concretely. Both hold large-scale digitised photo collections — historical street photography, planning documents, identity-adjacent records — and both have reported internal duplication rates that frustrate search and retrieval. The ZLB began an internal audit of its image holdings in early 2026 as part of a broader metadata standardisation effort. Staff at the archive have described the challenge not as a single fixable error but as the accumulated result of multiple scanning campaigns run by different contractors using incompatible file-naming conventions.
What the Specialists Are Saying
Technical experts working in Berlin's govtech space point to several overlapping causes. Digitisation projects funded under the EU's EFRE structural funds — which Berlin drew on heavily between 2018 and 2023 — often ran in parallel across different Bezirke without a shared taxonomy. Friedrichshain-Kreuzberg and Tempelhof-Schöneberg, for instance, each ran separate scanning initiatives for building permit archives. Without a unified deduplication protocol, the same document image could end up stored three or four times under different filenames across shared municipal drives.
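One common way to catch the same image stored under different filenames is to compare file contents rather than names. The following is a minimal sketch under stated assumptions, not a description of any tool the city uses: the function names `file_digest` and `find_duplicates` are hypothetical, and SHA-256 is chosen here purely for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` by content hash, ignoring filenames.

    Returns only the groups that contain more than one file.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            groups[file_digest(p)].append(p)
    return [paths for paths in groups.values() if len(paths) > 1]
```

A comparison like this only finds byte-identical copies. Two separate scans of the same paper document, or the same image saved in a different format, would produce different hashes and would need a similarity-based method instead.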
The cost is not trivial. Cloud storage pricing for public-sector contracts in Germany typically runs between 0.02 and 0.05 euros per gigabyte per month under framework agreements, or roughly 20 to 50 euros per terabyte per month. Internal estimates cited in Senatsverwaltung working papers suggest that duplicated image sets run into multiple terabytes, so the overspend recurs every month for as long as the redundant files are retained. One working group examining the problem proposed that a phased duplicate-image replacement programme, using hash-matching algorithms to identify and retire redundant files, could cut image storage loads by 30 to 40 percent across participating departments within 18 months.
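The arithmetic behind those figures can be sketched as follows. The price range and the 30 to 40 percent reduction come from the text above; the 50-terabyte volume in the usage note is a purely hypothetical input, since the working papers' actual figures are not given, and the sketch assumes decimal units (1 TB = 1,000 GB).

```python
def monthly_cost_eur(volume_tb: float, price_per_gb_month: float) -> float:
    """Monthly storage cost in euros, assuming 1 TB = 1,000 GB."""
    return volume_tb * 1000 * price_per_gb_month


def monthly_savings_range(
    volume_tb: float,
    price_low: float = 0.02,   # euros per GB per month, from the article
    price_high: float = 0.05,  # euros per GB per month, from the article
    cut_low: float = 0.30,     # proposed reduction, lower bound
    cut_high: float = 0.40,    # proposed reduction, upper bound
) -> tuple[float, float]:
    """Return (lowest, highest) estimated monthly savings in euros."""
    low = monthly_cost_eur(volume_tb, price_low) * cut_low
    high = monthly_cost_eur(volume_tb, price_high) * cut_high
    return round(low, 2), round(high, 2)
```

Calling `monthly_savings_range(50)` for a hypothetical 50 TB of image holdings gives the span between the cheapest price with the smallest reduction and the most expensive price with the largest reduction.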
Berlin's CityLAB, the public innovation lab based at Platz der Luftbrücke in Tempelhof, has been consulted informally on tooling options. Open-source deduplication frameworks, including several developed by Wikimedia Deutschland, headquartered on Tempelhofer Ufer, are being evaluated for compatibility with existing municipal content management systems. The preference among technical advisers is for solutions that can be run on-premises to avoid routing sensitive scanned documents through third-party cloud processors.
What Comes Next
A formal proposal is expected to go before the Senat's digital affairs working committee before the end of the third quarter of 2026. If approved, a pilot programme covering four Bezirke would begin in early 2027, with full rollout timed to meet the digitisation deadline. District offices in Mitte and Pankow have already been named as likely pilot participants given the size and age of their scanned document holdings.
For residents and businesses that interact with city services — submitting planning applications, accessing historical property records, registering documents at a Bürgeramt — the practical payoff would be faster retrieval times and fewer instances of records appearing in duplicate or conflicting versions. The broader argument from officials is straightforward: clean data is foundational. Everything else the city wants to do digitally depends on getting this right first.