Berlin's public sector has a clutter problem hidden in plain sight. Across municipal departments, state libraries and urban planning databases, tens of thousands of duplicate digital images have accumulated over more than a decade of uncoordinated digitisation — redundant photographs, scanned documents and architectural renders that consume server capacity, slow retrieval systems and, in some cases, mislead planners working from outdated versions. The Senatsverwaltung für Stadtentwicklung, Bauen und Wohnen acknowledged the issue in internal workflow reviews conducted earlier this year, and now the question is who acts first and how.
The timing matters because Berlin is midway through a significant digitisation push tied to its Smart City Strategy, which runs to 2030. The city committed substantial IT infrastructure spending in its 2024-2026 budget cycle, and a second tranche of funding decisions is due before the Abgeordnetenhaus returns from summer recess in September. If duplicate-image cleanup is not written into procurement specifications now, the problem will be baked into new systems and cost more to fix later.
Where the Bottlenecks Are Forming
The Zentral- und Landesbibliothek Berlin on Breite Straße in Mitte holds one of the largest publicly accessible digital image repositories in the city, covering everything from 19th-century street photography to contemporary urban documentation. Librarians and archivists there have flagged that standard deduplication software performs poorly on historical image sets where two photographs of, say, the Gendarmenmarkt taken seconds apart are technically distinct files but functionally identical for catalogue purposes. The problem is not a technical failure — it is a classification failure, and fixing it requires human editorial decisions, not just an algorithm.
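To make the distinction concrete, here is a minimal sketch of how near-identical images can be flagged for an archivist instead of being deleted automatically. It is an illustration only, not the library's software: it assumes each image has already been reduced to a small grid of greyscale values, uses a simple average hash as the similarity measure, and the names `average_hash`, `flag_for_review` and the `max_distance` threshold are hypothetical.

```python
from itertools import combinations


def average_hash(pixels):
    """Hash a small greyscale grid (values 0-255): one bit per pixel,
    set when the pixel is brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def flag_for_review(images, max_distance=5):
    """Return (name_a, name_b, distance) for pairs that look alike.

    Nothing is deleted here: the output is a review queue for a human.
    max_distance is a hypothetical threshold that would need tuning.
    """
    hashes = {name: average_hash(grid) for name, grid in images.items()}
    flagged = []
    for a, b in combinations(sorted(hashes), 2):
        distance = hamming(hashes[a], hashes[b])
        if distance <= max_distance:
            flagged.append((a, b, distance))
    return flagged
```

Two frames taken seconds apart differ byte for byte, so an exact file comparison treats them as unrelated, while a similarity measure like this one pairs them up. Whether the pair counts as a duplicate for catalogue purposes remains the archivist's call.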
At the Technologiestiftung Berlin, which operates out of offices in Tempelhof-Schöneberg and advises the Senate on digital infrastructure, researchers have been mapping the scale of the issue across city departments since late 2025. Their preliminary finding — not yet published — is that the problem is concentrated in three areas: building permit scan archives held by the Bezirksämter, geospatial image sets maintained by the Senatsverwaltung für Umwelt, and the digitised press photography collections housed at the Berlinische Galerie in Kreuzberg. Each of those repositories uses different metadata standards, which is precisely why automated deduplication has produced unreliable results so far.
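A rough sketch shows why mismatched metadata defeats automated matching and what a shared layer would have to do. Everything in it is assumed for illustration: the source labels, the field names and the choice of site plus date as the comparison key are hypothetical and do not describe any agency's actual schema.

```python
# Hypothetical mapping from each repository's native field names
# to a small set of common fields.
FIELD_MAPS = {
    "permits": {"title": "Bezeichnung", "date": "Datum", "site": "Grundstueck"},
    "geodata": {"title": "name", "date": "captured_on", "site": "parcel_id"},
}


def normalise(record, source):
    """Translate a native record into the common fields, trimming
    whitespace and lowercasing so values compare consistently."""
    mapping = FIELD_MAPS[source]
    return {
        common: str(record.get(native, "")).strip().lower()
        for common, native in mapping.items()
    }


def comparison_key(record, source):
    """Build the key used to group candidate duplicates across sources."""
    common = normalise(record, source)
    return (common["site"], common["date"])
```

Without a mapping of this kind, a record that stores its parcel under `Grundstueck` and one that stores it under `parcel_id` never line up, however similar the images are. Agreeing on the common fields is the standards question; the code that follows from it is the easy part.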
Berlin's BVG transport authority ran into a smaller-scale version of the same problem in 2023 when it digitised its internal infrastructure documentation. A six-month deduplication project, carried out with a Charlottenburg-based software contractor, reportedly reduced image file volumes by around 34 percent — but the process required three full-time archivists to manually validate flagged pairs before deletion. That ratio of human to machine effort is the benchmark other departments are now using to estimate costs.
The Decisions That Cannot Wait
Three choices are coming to a head before the end of the third quarter. First, the Senate must decide whether to procure a city-wide deduplication platform — estimated in budget working papers at between €2.1 million and €3.4 million depending on scope — or allow each Bezirk to handle its own archives independently, which most observers expect would produce inconsistent results and higher aggregate costs. Second, standards have to be agreed upon: the city's CIO office is currently mediating between at least four competing metadata frameworks used by different agencies. Third, the Abgeordnetenhaus budget committee needs to decide whether digital archive maintenance counts as infrastructure spending eligible for capital budget lines, or whether it gets classified as operational expenditure — a distinction that affects which multi-year funding mechanisms are available.
For residents, the practical stakes are real. Building permit records and planning maps are public documents under Berlin's transparency legislation. When duplicates introduce version confusion, for example two images of the same development site showing different states of construction, the result is genuine legal ambiguity for neighbours, lawyers and journalists trying to reconstruct approval histories for projects in fast-changing districts like Lichtenberg and Pankow.
The September budget window is the realistic last opportunity to get ahead of this before the next major digitisation contracts are awarded. Departments that do not resolve their duplicate-image frameworks now will almost certainly spend the next five years managing a problem that compounds with every new scan.