Berlin's public sector is sitting on a storage problem measured in terabytes and euros. Across municipal databases, from the Senatsverwaltung für Stadtentwicklung to the Stadtbibliothek Berlin's digital archive on Breite Straße, duplicate image files have quietly colonised hard drives and cloud servers for years — redundant photographs, scanned documents, and planning visuals stored two, three, sometimes four times over. An internal audit framework published by the Technologiestiftung Berlin in early 2025 flagged that duplicate digital assets account for an estimated 30 to 40 percent of storage overhead in mid-sized German public institutions. Berlin, with its sprawling network of Bezirksämter and cultural bodies, sits squarely in that bracket.
The timing matters because Berlin's administration is midway through its Digitalisierungsstrategie 2030, the coalition's flagship push to move planning, permitting, and cultural cataloguing online. That push is expensive: the Senate earmarked roughly 280 million euros for digital infrastructure across the 2024-2026 budget cycle. Every gigabyte wasted on redundant image files is a direct drag on that investment, and project managers across Mitte and Friedrichshain-Kreuzberg have begun raising the issue formally with the city's central IT body, the ITDZ Berlin.
The Scale of the Problem in Berlin's Own Systems
Put concrete numbers on it and the picture sharpens. The ITDZ Berlin manages data infrastructure for more than 80,000 public employees across the city's twelve districts. Industry benchmarks for enterprise environments — drawn from studies by the Fraunhofer-Institut für Offene Kommunikationssysteme, based at Kaiserin-Augusta-Allee in Charlottenburg — suggest that automated deduplication tools typically recover between 20 and 60 percent of used storage capacity in image-heavy databases. For an organisation operating at Berlin's scale, even the lower end of that range translates to hundreds of terabytes and six-figure annual savings in cloud hosting fees alone.
The Stadtmuseum Berlin, which manages photographic collections spanning the city's history from the Märkisches Museum in Mitte to the Ephraim-Palais, began a deduplication pilot in late 2024. The project, run in partnership with the Zuse-Institut Berlin on Takustraße in Dahlem, targeted approximately 1.2 million digitised image files. Early results indicated that around 18 percent of those files were exact or near-exact duplicates — a figure that surprised archivists who had assumed manual cataloguing had kept redundancy low. At roughly 4 megabytes per image, that represents close to 860 gigabytes of recoverable space from a single institution's collection.
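The storage estimate can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted above; it assumes decimal gigabytes (1 GB = 1,000 MB) and that every file counted in the 18 percent is a redundant copy that could be removed, neither of which the pilot results spell out.

```python
# Figures as quoted in the article; the 4 MB average is a rough estimate.
TOTAL_FILES = 1_200_000
DUPLICATE_SHARE = 0.18
AVG_SIZE_MB = 4

# Number of files flagged as exact or near-exact duplicates.
duplicate_files = round(TOTAL_FILES * DUPLICATE_SHARE)

# Recoverable space, assuming decimal units (1 GB = 1,000 MB).
recoverable_gb = duplicate_files * AVG_SIZE_MB / 1000

print(f"{duplicate_files:,} duplicate files, about {recoverable_gb:.0f} GB recoverable")
```

That works out to 216,000 files and 864 gigabytes, in line with the "close to 860 gigabytes" figure.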
The housing sector compounds the issue. Berlin's Wohnungsamt offices, which handle rent cap documentation, building permits, and Milieuschutz applications across districts including Neukölln and Pankow, generate thousands of scanned property photographs each month. Under the current workflow, the same facade image can enter the system three times, from a field inspector, a legal clerk, and an automated upload from the applicant portal, with no deduplication check at any stage. A standardised hash-matching protocol, in use at the Bundesarchiv in Koblenz since 2023, would catch byte-identical copies at the point of upload rather than years later during a storage audit.
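What a hash check at the point of upload might look like can be sketched in a few lines of Python. This is an illustration only: the article does not describe the Bundesarchiv protocol in detail, so the choice of SHA-256, the `UploadRegistry` class, and the asset IDs are all assumptions, and a production system would keep digests in a database rather than in memory.

```python
import hashlib
from typing import Optional


class UploadRegistry:
    """Hypothetical in-memory registry mapping content digests to asset IDs."""

    def __init__(self) -> None:
        self._seen = {}  # SHA-256 hex digest -> ID of the first asset with that content

    def register(self, asset_id: str, data: bytes) -> Optional[str]:
        """Record an upload.

        Returns the ID of an earlier asset with byte-identical content,
        or None if this content has not been seen before.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return existing  # duplicate: caller can link to the existing asset
        self._seen[digest] = asset_id
        return None


# Usage: the same facade photo arriving from three sources (IDs are made up).
registry = UploadRegistry()
photo = b"...raw bytes of the scanned facade image..."
for source in ("inspector-001", "clerk-002", "portal-003"):
    duplicate_of = registry.register(source, photo)
    if duplicate_of is None:
        print(f"{source}: stored as new asset")
    else:
        print(f"{source}: duplicate of {duplicate_of}, not stored again")
```

A check of this kind only matches files that are identical byte for byte. A copy that has been resized or re-saved at a different quality setting produces a different digest and slips through.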
What Comes Next for Berlin's Digital Housekeeping
The ITDZ Berlin is expected to publish updated technical standards for image asset management before the end of the third quarter of 2026. Those standards are likely to mandate perceptual hashing across all Senate-connected databases. The technique identifies visually similar images even when the underlying files differ, for example after resizing, recompression, or a change of metadata. The Stadtmuseum pilot with the Zuse-Institut Berlin is the most closely watched proof of concept for that rollout.
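To make the idea concrete, here is a minimal sketch of one simple perceptual hash, the average hash. It is not the method used in the pilot or specified by the ITDZ, neither of which has been published. It assumes each image has already been decoded into a grid of grayscale values (0 to 255) at least 8 pixels on each side, and the `max_distance` threshold of 5 bits is an illustrative guess rather than a recommended setting.

```python
def average_hash(pixels, size=8):
    """Compute a size*size-bit average hash from a 2D list of grayscale values."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to a size x size grid by averaging blocks of pixels.
    for r in range(size):
        rows = range(r * height // size, (r + 1) * height // size)
        for c in range(size):
            cols = range(c * width // size, (c + 1) * width // size)
            block = [pixels[y][x] for y in rows for x in cols]
            cells.append(sum(block) / len(block))
    # Each bit records whether a cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a, b):
    """Count the bits on which two hashes disagree."""
    return bin(a ^ b).count("1")


def is_near_duplicate(pixels_a, pixels_b, max_distance=5):
    """Hypothetical helper: treat images as duplicates if their hashes are close."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Usage: a synthetic left-to-right gradient, an enlarged copy, and a different image.
original = [[x * 16 for x in range(16)] for _ in range(16)]
enlarged = [[(x // 2) * 16 for x in range(32)] for _ in range(32)]
different = [[y * 16 for _ in range(16)] for y in range(16)]

print(is_near_duplicate(original, enlarged))
print(is_near_duplicate(original, different))
```

Because the hash depends on the coarse pattern of light and dark rather than on exact bytes, the enlarged copy matches the original while the unrelated image does not. Real deployments would typically use a tested imaging library and tune the distance threshold against a labelled sample of the collection.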
For Berliners dealing day-to-day with slow permit processing or patchy access to the city's online cultural archives, the practical upshot is straightforward: faster retrieval, lower error rates, and municipal IT budgets that stretch further. The Digitalisierungsstrategie 2030 was always going to be judged on delivery speed as much as ambition. Getting the data clean is the unglamorous prerequisite for everything else — and the numbers now make ignoring it politically uncomfortable.