The Daily Berlin

Berlin news, every day

Berlin's Duplicate Image Problem: The Numbers Piling Up Inside City Hall's Digital Archives

Thousands of redundant files are clogging Berlin's public sector servers — and the bill for storing and managing them keeps climbing.

By Berlin News Desk · Published 4 July 2026, 9:16 pm

Photo: Paula Schmidt on Pexels

Berlin's public administration is sitting on a data storage problem it can measure but has struggled to fix. Across the Senate's network of ministries, district offices and publicly funded agencies, internal audits have repeatedly flagged the same issue: duplicate image files — scanned documents, promotional photos, architectural renders, planning maps — stored multiple times across disconnected server environments, inflating both storage costs and the staff hours needed to manage them.

The timing matters because the Senate Department for Digital and Administrative Modernisation, based on Berliner Allee, has been pushing its consolidation agenda under the city's Digitalisierungsstrategie 2025 framework. That programme, originally scoped to run through the end of 2025 and now extended into 2026, set a target of migrating district-level data silos into a unified cloud infrastructure. Duplicate file remediation was listed as a baseline requirement before migration could begin. Progress has been uneven.

What the Numbers Actually Show

Storage audits carried out as part of Berlin's IT consolidation process — shared internally across the Senatsverwaltung für Inneres and IT-Dienstleistungszentrum Berlin (ITDZ) — identified duplicate image files accounting for an estimated 18 to 22 percent of total storage volume on legacy government servers. That is not a trivial figure. ITDZ, headquartered in Alt-Moabit, manages roughly 4,000 servers and storage systems for Berlin's public sector. If even the lower end of that duplication estimate holds, tens of terabytes of billable storage are being paid for twice or more.

Cloud storage pricing benchmarked against comparable European municipal contracts runs between €0.018 and €0.025 per gigabyte per month at scale. At those rates, a 10-terabyte block of pure duplicate content (10,000 gigabytes) costs the city between €180 and €250 every month, before factoring in bandwidth, backup cycles and human administration time. Multiply that across Berlin's twelve districts, each running partially independent IT environments inherited from reunification-era structures that predate 2001, and the cumulative waste becomes significant on an annual budget line.
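
The arithmetic behind that range can be checked in a few lines. This is an illustrative sketch only: the per-gigabyte rates and the 10-terabyte block are the figures quoted above, while the decimal conversion (1 TB = 1,000 GB) and the function name are assumptions made for the example.

```python
# Illustrative arithmetic only. Rates and block size are the figures quoted
# in the article; 1 TB = 1,000 GB (decimal units) is an assumption.
GB_PER_TB = 1_000


def monthly_cost_eur(terabytes: float, rate_eur_per_gb: float) -> float:
    """Monthly storage cost in euros for a block of the given size."""
    return terabytes * GB_PER_TB * rate_eur_per_gb


low = monthly_cost_eur(10, 0.018)   # lower benchmark rate
high = monthly_cost_eur(10, 0.025)  # upper benchmark rate

print(f"monthly: EUR {low:.0f} to EUR {high:.0f}")
print(f"annual:  EUR {low * 12:.0f} to EUR {high * 12:.0f}")
```

On these assumptions, a single 10-terabyte block of duplicates works out to roughly €2,160 to €3,000 a year in storage charges alone.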

The Reinickendorf and Lichtenberg district offices have been cited in internal ITDZ planning documents as among the sites with the oldest image file repositories, some predating the standardised metadata tagging that would make automated deduplication straightforward. Files scanned before 2010 frequently lack EXIF data or consistent naming conventions, meaning automated tools produce false positives, flagging unique documents as duplicates, or false negatives, missing actual duplicates entirely.

Deduplication Tools and the Path Forward

The ITDZ began piloting hash-based deduplication software across three departments in the first quarter of 2026. The approach, which computes a cryptographic fingerprint from each file's contents and compares fingerprints rather than file names, is standard practice in enterprise IT and has been deployed by Hamburg's Dataport agency since at least 2022. Byte-identical files produce the same fingerprint regardless of what they are called or where they are stored. Berlin's rollout has moved more slowly, partly because legal teams within the Senatskanzlei have required sign-off on data retention obligations before any files are flagged for deletion, even duplicates of public-domain photographs.
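
For readers unfamiliar with the technique, a minimal sketch of hash-based duplicate detection follows. It is not the ITDZ's software, whose details have not been published; the choice of SHA-256 and the function names `sha256_of` and `find_duplicates` are assumptions made for illustration. The sketch only reports groups of identical files and deletes nothing, which mirrors the requirement that legal sign-off comes first.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; keep groups of two or more.

    Hypothetical helper for illustration. It reports duplicates only and
    never modifies or deletes a file.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates(Path("some/shared/drive"))` returns a mapping from fingerprint to the list of files that share it. Exact hashing of this kind matches only byte-identical copies: the same photograph re-scanned or re-saved in another format yields a different fingerprint and would not be grouped.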

The practical advice for anyone working inside the city's digital infrastructure is straightforward: document retention schedules under the Berliner Aktenordnung prescribe minimum holding periods for administrative records, but photographic duplicates sitting in shared drives on Alexanderplatz-district servers do not automatically qualify as archival records. Getting legal clearance early, before the deduplication tool runs, cuts remediation time sharply.

For Berlin's tech and startup sector — which has lobbied the Senate through bodies like the Technologiestiftung Berlin for faster open-data releases — the duplicate file backlog has a secondary effect: bloated, inconsistent datasets slow the publication cycle for public geodata and planning documents on the Daten.berlin.de portal. Cleaner archives mean faster releases. The Senate's digital modernisation office has indicated the ITDZ pilot results, due to be assessed in September 2026, will determine whether a city-wide deduplication mandate is written into the next procurement round. That round is expected to open before the end of the year.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.