Berlin's public sector is sitting on a data problem it can no longer ignore. Across the city's administrative databases, cultural archives, and housing authority servers, duplicate images — identical or near-identical files stored multiple times under different names — are consuming an estimated 30 to 40 percent of total digital storage capacity, according to internal assessments circulated among Senate IT departments this spring. That is not a rounding error. That is a structural failure baked into years of poorly coordinated digitisation.
The timing matters. The Berlin Senate's Digital Administration Strategy, updated in March 2026, committed the city to cutting its data centre energy footprint by 25 percent before 2030 as part of broader Energiewende targets. Redundant storage is now squarely in the crosshairs, because you cannot hit an energy reduction target while your servers are running three copies of the same photograph of the Rotes Rathaus.
Where the Problem Lives
The scale becomes concrete when you look at specific institutions. The Zentral- und Landesbibliothek Berlin, which holds digitised collections spanning centuries of city records, acknowledged in its 2025 annual report that its image repository had grown to over 4.2 million files — a figure staff privately describe as unmanageable without automated deduplication tools. The library's reading room on Breite Straße in Mitte is a model of order; its backend storage is not.
Similar pressure has built at the Berliner Immobilienmanagement GmbH, known as BIM, which manages roughly 5,000 properties on behalf of the city. Property listing workflows require photographs of every unit, every renovation and every inspection, and those images frequently enter the system multiple times through different upload points. A BIM internal review completed in late 2025 found that roughly one in five image files in its property database was a functional duplicate of another file, meaning the city was paying cloud storage costs for files it did not need.
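At its simplest, catching a file that was uploaded more than once under different names means comparing contents rather than filenames. The Python sketch below is a minimal illustration of that idea, not the method used by BIM or by any vendor: it groups files by a SHA-256 hash of their bytes and estimates how much space removing the extra copies would free. The function names `find_exact_duplicates` and `reclaimable_bytes` are hypothetical. Byte-level hashing only catches exact copies; a photo that was resized or re-compressed on its second upload produces a different hash, and catching those near-duplicates typically requires perceptual hashing instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under `root` by the SHA-256 of its contents.

    Returns only the groups with more than one file, i.e. byte-identical copies,
    regardless of filename or folder.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so large scans never load a whole file into memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        groups[digest.hexdigest()].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(dupes: dict[str, list[Path]]) -> int:
    """Bytes freed if one copy per group is kept and every other copy is deleted."""
    return sum(paths[0].stat().st_size * (len(paths) - 1) for paths in dupes.values())
```

Keeping `paths[0]` as the surviving copy is an arbitrary choice that only matters for the estimate; an actual cleanup would need a rule for which copy stays, for example the one referenced by the property record.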
Those duplicates carry a direct price. Commercial cloud storage for public institutions typically runs between €18 and €35 per terabyte per month, depending on contract terms and redundancy requirements. At those rates, even a mid-sized city archive holding 50 terabytes of excess duplicate data is burning through roughly €10,800 to €21,000 a year (50 terabytes × €18 to €35 × 12 months) on files that deliver zero public value.
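The annual figure is the excess volume multiplied by the monthly per-terabyte rate and by twelve months. A minimal sketch of that arithmetic, assuming flat per-terabyte pricing with no volume tiers, egress or per-request fees (real contracts often include these); `annual_storage_cost` is a hypothetical helper name:

```python
MONTHS_PER_YEAR = 12


def annual_storage_cost(excess_tb: float, eur_per_tb_month: float) -> float:
    """Yearly cost of storing `excess_tb` terabytes at a flat monthly per-TB rate.

    Assumption: flat pricing only; tiers, egress and request charges are ignored.
    """
    return excess_tb * eur_per_tb_month * MONTHS_PER_YEAR


# Rate range and the 50 TB volume are the figures quoted in the text.
for rate in (18, 35):
    print(f"{rate} EUR/TB/month -> {annual_storage_cost(50, rate):,.0f} EUR/year")
```

This prints 10,800 EUR/year at the low rate and 21,000 EUR/year at the high rate.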
What Deduplication Actually Costs — and What It Saves
The city's IT coordination body, the Landesbeauftragter für Informationstechnik, has been piloting automated duplicate-detection software across three Senate departments since January 2026. Early results from one of them, the Senate Department for Urban Development and Housing on Württembergische Straße in Wilmersdorf, suggested a 22 percent reduction in active image storage within six weeks of deployment. That figure covers only one department, but it hints at what systematic deduplication could achieve citywide.
The tools themselves are not cheap. Enterprise-grade deduplication platforms from vendors active in the German public sector market typically carry licensing costs of €40,000 to €120,000 for a city-scale deployment, plus integration work. Proponents argue, however, that the investment pays for itself within 18 months once avoided storage expansion, reduced energy consumption and the staff time currently spent manually sorting image libraries are taken into account.
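The 18-month claim can be sanity-checked with a simple payback calculation: upfront cost divided by monthly savings. The sketch below is a hypothetical illustration, not a model used by the Senate or any vendor; it treats savings as constant from month to month and leaves out the integration work, which the licensing figures above exclude. Working backwards, paying off a €40,000 licence in 18 months requires about €2,222 in savings per month, and a €120,000 licence about €6,667.

```python
import math


def payback_months(upfront_eur: float, monthly_savings_eur: float) -> float:
    """Months until cumulative savings equal the upfront cost.

    Assumes constant monthly savings; returns infinity if nothing is saved.
    """
    if monthly_savings_eur <= 0:
        return math.inf
    return upfront_eur / monthly_savings_eur


def required_monthly_savings(upfront_eur: float, target_months: float) -> float:
    """Monthly savings needed to break even within `target_months`."""
    return upfront_eur / target_months


# Licence range quoted in the text; integration costs are not included.
for licence in (40_000, 120_000):
    needed = required_monthly_savings(licence, 18)
    print(f"{licence:,} EUR licence: ~{needed:,.0f} EUR/month to pay back in 18 months")
```

The same functions work in the other direction: plugging in an institution's own measured monthly savings into `payback_months` gives its break-even point.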
Staff time is the underappreciated variable. Archivists and administrators at institutions like the Stadtmuseum Berlin — whose collections span multiple sites including the Ephraim-Palais in Nikolaiviertel — spend significant hours each month on image management tasks that automated systems could handle in minutes. Quantifying that cost is difficult, but the Senate's digital strategy office has begun factoring personnel hours into its storage reform calculations for the first time.
The Senate is expected to publish formal procurement guidelines for deduplication tools by the end of the third quarter of 2026. Institutions holding public image archives larger than one terabyte will likely be required to demonstrate a deduplication compliance plan by mid-2027. For Berlin's archivists, administrators, and the taxpayers funding their servers, the message from the numbers is straightforward: the era of storing everything twice is over.