Berlin's public administration is sitting on a sprawling mess of duplicate digital images, and the scale of the problem has surprised even the officials tasked with fixing it. Internal data management reviews conducted across several Senate departments this spring found that redundant image files account for a significant share of total digital storage consumption—in some cases representing tens of thousands of identical or near-identical files stored across multiple servers simultaneously.
The issue matters now because the city is in the middle of a major push to modernise its digital infrastructure under the Berlin Digital Strategy, a programme the Senate Department for Digital and Administrative Reform has been rolling out since 2023. As agencies migrate legacy data to consolidated cloud environments, every duplicate file adds direct cost—both in cloud storage fees and in the staff hours required to manually sort, tag, and archive visual assets. With Berlin's IT modernisation budget under pressure from competing demands including BVG public transport upgrades and Energiewende-linked grid investments, administrators say eliminating redundancy is no longer optional housekeeping. It is a budget question.
The Numbers Behind the Clutter
The mechanics of duplication are straightforward but expensive. A single high-resolution photograph taken at, say, a press event at the Rotes Rathaus can end up stored in the communications department's shared drive, the city's public relations archive, a department-specific intranet folder, and a backup server—four copies of the same file, each consuming identical storage space. Multiply that by years of events, construction documentation, social media campaigns, and planning records, and the volume compounds fast.
Cloud storage pricing for enterprise-grade public-sector contracts in Germany typically runs between €0.02 and €0.05 per gigabyte per month, depending on redundancy tiers and provider agreements. A single uncompressed image from a professional camera can run to 25 megabytes or more. An archive of 100,000 such images, not an unrealistic figure for a major city administration, occupies roughly 2.5 terabytes. If four copies of that archive exist across different systems, the city is paying for ten terabytes of storage where 2.5 would suffice. At those rates, the 7.5 terabytes of redundant storage cost roughly €1,800 to €4,500 a year for that one archive alone, before staff time is factored in, and the figure grows with every additional archive and department.
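The arithmetic in this scenario can be checked with a short calculation. The sketch below is illustrative only: the function and variable names are invented for the example, the inputs are the figures quoted above, and decimal units (1 TB = 1,000 GB) are assumed.

```python
def annual_storage_cost_eur(size_gb: float, price_per_gb_month: float) -> float:
    """Yearly cost of keeping size_gb in storage at a flat monthly per-GB price."""
    return size_gb * price_per_gb_month * 12


# Figures quoted in the scenario above.
IMAGE_MB = 25
IMAGE_COUNT = 100_000
COPIES = 4

# Assumption: decimal units, so 1,000 MB = 1 GB.
archive_gb = IMAGE_MB * IMAGE_COUNT / 1000

# Every copy beyond the first is redundant.
redundant_gb = archive_gb * (COPIES - 1)

low = annual_storage_cost_eur(redundant_gb, 0.02)
high = annual_storage_cost_eur(redundant_gb, 0.05)

print(f"One archive: {archive_gb / 1000:.1f} TB")
print(f"Redundant storage: {redundant_gb / 1000:.1f} TB")
print(f"Annual storage overspend: EUR {low:,.0f} to EUR {high:,.0f}")
```

Changing the three constants shows how the total moves with larger archives or more copies; staff time is not modelled.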
The Bezirksamt Mitte, which manages a high volume of documentation for planning applications along corridors like the Müllerstraße and around Alexanderplatz, has been piloting deduplication software as part of a broader records digitisation effort that began in late 2024. The tool uses hash-based comparison—essentially a digital fingerprint for each file—to identify exact duplicates and flag near-matches for human review. Early results from the pilot have not been made public, but the programme is considered a test case for whether the approach can scale across all twelve Berlin districts.
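The pilot's software has not been described in detail, but the fingerprinting idea itself is simple to sketch. The example below is a minimal illustration, not the tool in use: it assumes SHA-256 from Python's standard library as the fingerprint, and the function names are hypothetical. A cryptographic hash of this kind only catches byte-identical copies; flagging near-matches requires a different technique.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root by digest and return only groups with more than one file."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("/some/shared/drive"))` (a placeholder path) would return one list per set of identical files, leaving the decision about which copy to keep to the archive's owner.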
What Deduplication Actually Takes
Cleaning up an image archive is not as simple as running a script. Near-duplicate images—photographs taken seconds apart, or the same graphic resized for different platforms—require human judgment. That means staff hours. The Landesarchiv Berlin, which maintains historical photographic collections including documentation from the postwar reconstruction of the Mitte district, has described deduplication as a years-long process when applied to analogue-to-digital conversion projects. The challenge for current digital assets is different in character but similar in scale.
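Software cannot make the final call on near-duplicates, but it can shortlist candidates for a human reviewer. One common family of techniques is perceptual hashing; the sketch below shows a simplified "average hash" purely as an illustration, not as the method any Berlin agency uses. It assumes each image has already been reduced to a small grid of greyscale values (0 to 255), a step that in practice would be done with an imaging library; the tiny 2×2 grids and all names here are hypothetical.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a bit pattern with one bit per pixel: 1 if brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical 2x2 greyscale grids standing in for downscaled photographs.
original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]   # same picture, slightly brightened
different = [[200, 10], [30, 220]]  # unrelated picture

near = hamming_distance(average_hash(original), average_hash(brighter))
far = hamming_distance(average_hash(original), average_hash(different))
print(near, far)
```

A small distance suggests two files show the same picture and are worth a reviewer's attention; the cut-off is a judgment call that would need tuning against real material.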
Tools now available to public bodies range from open-source solutions to commercial platforms marketed specifically to German municipal governments. Several Bundesländer have procured centralised deduplication services through the Dataport IT service cooperative, though Berlin's participation in that framework remains partial. The city's own IT service provider, ITDZ Berlin, based on Berliner Straße in Wilmersdorf, is the likely delivery vehicle for any citywide rollout.
For departments still working through their backlogs, practical guidance from the Federal Office for Information Security recommends establishing a single authoritative image repository with clear naming conventions before any deduplication tool is deployed—otherwise the same files simply re-accumulate. Berlin agencies looking to start should contact ITDZ Berlin directly for access to the current framework agreements, which cover both audit software licensing and migration support. The next review cycle under the Digital Strategy is scheduled for the third quarter of 2026.