Berlin's public sector has a clutter problem hidden in plain sight. Across municipal servers maintained by the Senatsverwaltung für Stadtentwicklung and the Zentral- und Landesbibliothek Berlin, duplicate image files account for an estimated 30 to 40 percent of total stored digital assets — a figure that storage consultants across the industry cite as typical for large public institutions that digitised collections rapidly without quality controls. The redundancy is not trivial: unnecessary storage drives up licensing costs, slows retrieval systems, and increasingly complicates the city's open-data commitments under the Berlin Open Data Ordinance, which came into full effect in January 2024.
Why does this matter right now? The CDU-led Berlin Senate is pushing a digitisation acceleration programme through 2027, channelling funding into converting physical records (planning documents, historical photographs, infrastructure blueprints) into searchable digital formats. As more material flows into repositories, the duplicate problem compounds. Institutions that failed to implement deduplication protocols early are now discovering that retrofitting those systems onto bloated archives costs significantly more than building them in from the start. And in Berlin's tight municipal budget environment following the 2024 austerity corrections, storage capacity cannot simply be expanded to absorb the waste.
Where the Data Piles Up
Two institutions illustrate the scale particularly well. The Stadtmuseum Berlin, whose main collection spans sites including the Märkisches Museum near Köllnischer Park in Mitte, began a structured deduplication audit in March 2025 covering roughly 1.2 million digitised photographic records. Early internal assessments, described in publicly available procurement notices, suggested that between 15 and 22 percent of image files were near-identical duplicates created during multiple scanning passes of the same physical object. At standard cloud storage rates of around €0.02 per gigabyte per month, the tier used by many Berlin public bodies contracting through the Dataport consortium, a terabyte costs roughly €240 a year, so even a few hundred terabytes of redundant image data translate to tens of thousands of euros in annual wasted expenditure.
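As a rough check on that arithmetic, the sketch below converts a per-gigabyte monthly rate into an annual cost. The €0.02 rate is the figure quoted above; the 200 TB and 300 TB volumes are illustrative stand-ins for "a few hundred terabytes", not reported figures, and the decimal convention of 1,000 GB per terabyte is an assumption.

```python
# Illustrative cost check. The volumes are assumed, not reported figures.
RATE_EUR_PER_GB_MONTH = 0.02   # rate quoted in the article
GB_PER_TB = 1_000              # assumption: decimal terabytes


def annual_cost_eur(redundant_tb: float, rate: float = RATE_EUR_PER_GB_MONTH) -> float:
    """Annual storage cost in euros for a given volume of redundant data."""
    return redundant_tb * GB_PER_TB * rate * 12


for tb in (200, 300):  # hypothetical volumes
    print(f"{tb} TB redundant: about EUR {annual_cost_eur(tb):,.0f} per year")
```

At these assumed volumes the result is €48,000 and €72,000 a year, which is consistent with the "tens of thousands" range.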
The Technologiestiftung Berlin, based in Tempelhof-Schöneberg, flagged the issue in its 2025 Digital Infrastructure Report, noting that duplicate asset management was one of the top three inefficiencies reported by Berlin's district administrations (Bezirksämter) in a survey. The report stopped short of providing a city-wide cost figure, but comparable audits in Hamburg and Vienna, both cities with similarly scaled municipal digitisation programmes, have put avoidable storage waste at between €500,000 and €1.2 million annually for administrations of Berlin's size.
Algorithms, Audits, and What Gets Fixed
The technical fix is well understood. Perceptual hashing algorithms, which derive a compact fingerprint from each image's visual content and flag near-matches rather than only byte-identical copies, can process a million files in under 24 hours on mid-range server hardware. Several Berlin-based startups operating out of Factory Berlin on Rheinsberger Straße in Mitte and the EUREF Campus in Schöneberg have developed specialised tools aimed at exactly this public-sector market. Licensing costs for such software typically run between €8,000 and €25,000 per year for an institutional deployment, a fraction of the ongoing storage waste they are designed to eliminate.
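To make the fingerprinting idea concrete, here is a minimal sketch of one simple member of the perceptual hashing family, the average hash. It is not necessarily the method any of the tools mentioned above use. It assumes each image has already been decoded and downscaled to a small grayscale grid, and the function names and the distance threshold of 5 are hypothetical.

```python
# Minimal average-hash (aHash) sketch. Assumes each image has already been
# decoded and downscaled to a small grayscale grid (e.g. 8x8 values, 0-255);
# decoding and resizing need an imaging library and are omitted here.
from typing import Sequence

Grid = Sequence[Sequence[int]]


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """max_distance is a hypothetical threshold; it would be tuned per collection."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Two scans of the same object: the second is uniformly slightly brighter.
scan_1 = [[x * 32 + y * 4 for x in range(8)] for y in range(8)]
scan_2 = [[min(255, v + 3) for v in row] for row in scan_1]
# An unrelated image: the same pattern with brightness inverted.
other = [[255 - v for v in row] for row in scan_1]

print(is_near_duplicate(scan_1, scan_2))  # brightness shift leaves the hash unchanged
print(is_near_duplicate(scan_1, other))   # inverted pattern flips every bit
```

Because the hash compares each pixel to the image's own mean, a uniform brightness change between scanning passes does not alter it, which is why this approach catches copies that a byte-for-byte comparison would miss.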
The harder problem is governance, not technology. Institutions need clear rules about which version of a duplicated image becomes the canonical record, how metadata is merged, and who signs off on deletion — questions that intersect with archival law under the Berlin Archivgesetz. Without those protocols in place, even the best deduplication software produces results that archivists are reluctant to act on.
For institutions still in the early stages of digitisation, the practical advice from procurement records and technology assessments is consistent: embed deduplication checks at the point of ingest, not after the fact. The Senatsverwaltung für Digitalisierung is expected to publish updated technical standards for municipal image repositories before the end of the third quarter of 2026. Institutions that build deduplication into their workflows now, rather than waiting for that guidance, will avoid the expensive retrofit problem that has already caught larger collections short. The numbers, at least, are clear enough to act on now.
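What a check at the point of ingest might look like can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any institution's system: the class and method names are hypothetical, and the exact-match SHA-256 digest used here catches only byte-identical copies, so near-duplicates from repeated scanning passes would still need a perceptual hash alongside it.

```python
# Sketch of a duplicate check at ingest. All names here are hypothetical.
# SHA-256 matching catches byte-identical copies only.
import hashlib
from typing import Dict, Optional


class IngestIndex:
    """Remembers the content digest of every record accepted so far."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # content digest -> first record ID

    def check_and_register(self, record_id: str, content: bytes) -> Optional[str]:
        """Return the ID of an existing identical record, or None if new."""
        digest = hashlib.sha256(content).hexdigest()
        existing = self._seen.get(digest)
        if existing is None:
            # First time this content is seen: register it as the original.
            self._seen[digest] = record_id
        return existing


index = IngestIndex()
first = index.check_and_register("photo-0001", b"example image bytes")
second = index.check_and_register("photo-0002", b"example image bytes")
third = index.check_and_register("photo-0003", b"different image bytes")
print(first, second, third)
```

Returning the ID of the earlier record, rather than silently discarding the new file, leaves the decision about which version is canonical to the archivists.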