Berlin's public sector is sitting on a sprawling, bloated mess of digital image files. Internal audits circulated among city IT departments this spring found that duplicate images — identical or near-identical photographs stored multiple times across disconnected server systems — account for an estimated 30 to 40 percent of all image data held by municipal bodies. That figure, drawn from assessments covering the Senatsverwaltung für Stadtentwicklung and the Berliner Immobilienmanagement GmbH, translates into hundreds of terabytes of redundant storage running on servers spread across facilities in Mitte and Tempelhof.
Why does this matter now? The SPD-led Berlin coalition committed in its 2024 coalition agreement to accelerating the city's digital transformation under the Masterplan Digitalisierung, with a dedicated budget line pushing past €80 million over four years. That investment is supposed to modernise everything from planning applications to housing authority record-keeping. But administrators trying to migrate legacy data into unified cloud platforms are finding that duplicate image files are clogging pipelines, inflating migration costs, and creating legal headaches around data retention. Without a systematic deduplication strategy, the digitisation drive risks simply replicating old chaos in a newer, more expensive format.
The problem is particularly acute at the Berliner Stadtbibliothek network and within the Landesarchiv Berlin on Eichborndamm in Reinickendorf, which holds photographic records stretching back to the nineteenth century. Digitisation projects there have produced multiple scans of the same physical document — sometimes three or four versions created by different contractors at different resolutions — without any automated system in place to flag or consolidate them. A separate project run through the Technologiestiftung Berlin, based near Tempelhof, has been piloting open-source deduplication tools since early 2025, testing software against a sample set of roughly 200,000 archival images. Preliminary results from that pilot, shared at a Technologiestiftung working group in April 2026, suggested that automated tools could flag approximately 28 percent of the sample as duplicates or near-duplicates requiring human review.
What the Numbers Actually Show
Storage is not cheap, even at government procurement rates. Berlin's city IT framework contract, administered through the ITDZ Berlin, the state's central IT service provider, prices managed cloud object storage at roughly €0.02 per gigabyte per month. That sounds trivial until the scale becomes clear: estimates from the spring audit put redundant image data across surveyed departments at somewhere between 800 terabytes and 1.2 petabytes. At the lower bound, that is a recurring monthly bill of around €16,000 for data nobody needs to keep twice, or roughly €192,000 a year. At the upper bound, the same rate works out to about €24,000 a month, or €288,000 a year. Both figures cover only the departments included in the audit, not the full breadth of Berlin's more than 120 public authorities and subsidiary bodies.
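The storage arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming decimal units (1 terabyte = 1,000 gigabytes) and a flat rate with no volume tiers; the function name is illustrative and does not come from the audit.

```python
# Assumptions: decimal units (1 TB = 1,000 GB) and a flat rate with no tiering.
PRICE_EUR_PER_GB_MONTH = 0.02  # rate quoted for managed cloud object storage


def redundant_storage_cost(terabytes: float) -> tuple[float, float]:
    """Return (monthly, annual) cost in euros for a given redundant volume."""
    gigabytes = terabytes * 1_000
    monthly = gigabytes * PRICE_EUR_PER_GB_MONTH
    return monthly, monthly * 12


# The audit's range: 800 TB at the lower bound, 1.2 PB (1,200 TB) at the upper.
for label, tb in [("lower bound", 800), ("upper bound", 1_200)]:
    monthly, annual = redundant_storage_cost(tb)
    print(f"{label}: {tb} TB -> EUR {monthly:,.0f}/month, EUR {annual:,.0f}/year")
```

Under those assumptions the lower bound comes to €16,000 a month and €192,000 a year, and the upper bound to €24,000 a month and €288,000 a year.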
Administrative cost compounds the storage bill. Deduplication is not a purely automated process. Images that are visually similar but not byte-for-byte identical — think two scans of the same 1960s Kreuzberg street photograph taken at slightly different exposures — require a human archivist to decide which version to keep. Archivists at the Landesarchiv earn between €3,200 and €4,100 per month under TVöD public sector pay scales, and experienced staff capable of making those judgements are already in short supply. The Technologiestiftung pilot estimated that a full deduplication pass across the Landesarchiv's digital holdings alone would require roughly 14 months of dedicated staff time if done to professional archival standards.
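Why near-duplicates need different handling from exact copies can be shown with a toy example. The sketch below uses an average hash, one common perceptual-hashing technique; it is not necessarily what the Technologiestiftung pilot uses, since the pilot's tooling has not been named. It assumes each image has already been decoded and shrunk to a small grayscale grid, and the review threshold of 5 bits is a hypothetical value chosen for illustration.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]  # small grayscale thumbnail, values 0-255


def average_hash(grid: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    pixels = [value for row in grid for value in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def needs_review(grid_a: Grid, grid_b: Grid, max_distance: int = 5) -> bool:
    """Flag a pair as a likely near-duplicate for an archivist to check.

    max_distance is a hypothetical threshold, not a recommended setting.
    """
    distance = hamming_distance(average_hash(grid_a), average_hash(grid_b))
    return distance <= max_distance


# Two toy 4x4 "scans" of the same scene; the second is uniformly brighter,
# much like a second scan taken at a different exposure.
scan_a = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
scan_b = [[value + 30 for value in row] for row in scan_a]
# An unrelated image: the bright and dark halves are swapped.
other = [list(reversed(row)) for row in scan_a]

print(needs_review(scan_a, scan_b))  # True: same pattern despite exposure shift
print(needs_review(scan_a, other))   # False: every bit differs
```

The two exposures would produce different checksums, yet their hashes match, so the pair is flagged. The code only flags pairs for review: choosing which version to keep remains the archivist's call.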
What Comes Next for Affected Institutions
The Technologiestiftung Berlin is expected to publish a full methodology report on its deduplication pilot in September 2026. The report will include recommended open-source tooling and a proposed workflow for Berlin's public archives. The Senatskanzlei has asked the ITDZ Berlin to make deduplication auditing a standard requirement in any new data migration contract signed after January 2027.
For institutions managing their own image collections in the meantime, the practical advice emerging from the April working group is direct: freeze new digitisation contracts until existing holdings have been catalogued with unique checksums, establish a single metadata standard across departments, and pilot any new scanning work through the Technologiestiftung framework before committing to full-scale rollout. The cost of doing the work properly now is measurably lower than the cost of redoing it later.
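Cataloguing holdings by checksum, the first step in that advice, needs nothing beyond the Python standard library. The sketch below is one way it might look, assuming SHA-256 as the checksum and a hypothetical list of file extensions; the working group has not specified either, and the function names are illustrative.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical set of image extensions; adjust to the collection at hand.
IMAGE_SUFFIXES = (".tif", ".tiff", ".jpg", ".jpeg", ".png")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large scans never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path, suffixes=IMAGE_SUFFIXES) -> dict[str, list[Path]]:
    """Group image files under root by checksum; keep groups with 2+ files."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            by_digest[sha256_of(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_exact_duplicates(Path("/path/to/holdings"))` returns one entry per checksum shared by two or more files. A checksum catches only byte-for-byte copies: the same photograph rescanned at another resolution or exposure produces a different checksum and still needs similarity matching and human review.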