The Daily Berlin

Berlin's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell the Story

A growing crisis in the city's public records infrastructure reveals how redundant image files are consuming storage, inflating IT budgets, and slowing the digitisation projects Berliners were promised.

By Berlin News Desk · Published 4 July 2026, 9:27 pm

Berlin's public administration is sitting on millions of duplicate image files (redundant scans, double-uploaded photos, and repeated attachments) that are quietly eating through storage budgets and slowing the digital transformation on which the CDU-SPD coalition Senate has staked much of its governance credibility. New internal assessments circulating among IT departments at the Rotes Rathaus and across district-level Bürgerämter put the share of duplicated image data in certain legacy systems at between 30 and 40 percent of total stored files, according to public-sector IT professionals familiar with the audits. That is not an edge-case problem; it is a structural one.

The timing matters. Berlin is midway through its eGovernment strategy, a multi-year digitisation roadmap that has already drawn criticism for delays and cost overruns. The city's IT service provider, ITDZ Berlin (the IT-Dienstleistungszentrum Berlin, based on Berliner Straße in Wilmersdorf), manages the backend infrastructure for dozens of agencies. Duplicate image bloat compounds every migration, every system upgrade, and every backup cycle. Storage costs in enterprise cloud environments have dropped significantly over the past decade, but they have not dropped to zero: hyperscale cloud storage for public-sector contracts in the EU typically runs between €0.02 and €0.04 per gigabyte per month, depending on redundancy tier and compliance requirements. At those rates a single petabyte costs roughly €20,000 to €40,000 per month, so the bill climbs fast across petabyte-scale archives.

Where the Duplicates Come From

The problem has a traceable origin. Berlin's Bürgerämter — including heavily used offices in Mitte, Friedrichshain-Kreuzberg, and Neukölln — shifted rapidly to document scanning during and after the pandemic years, often without unified file-naming protocols or deduplication steps built into their upload workflows. Staff at multiple offices scanned the same identity documents repeatedly across appointment sessions, generating fresh image files with different timestamps but identical content. Across the city's 23 official immigration and registration processing systems alone, IT assessors have flagged this as a primary driver of storage inflation.
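The simplest case described above, files with different names and timestamps but identical content, can be caught by hashing the file bytes rather than trusting metadata. The following is a minimal sketch under stated assumptions, not a description of any system Berlin actually runs: the file names and byte strings are invented for illustration, and `group_exact_duplicates` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names by the SHA-256 digest of their content.

    `files` maps a (hypothetical) file name to its raw bytes.
    Names and timestamps are ignored; only the content counts.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(name)
    # Keep only digests that occur more than once.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Invented example data: two uploads of the same bytes under different names.
scans = {
    "ausweis_2021-03-02.jpg": b"\xff\xd8 same scan bytes",
    "ausweis_2021-04-15.jpg": b"\xff\xd8 same scan bytes",
    "meldebescheinigung.jpg": b"\xff\xd8 other bytes",
}

duplicates = group_exact_duplicates(scans)
print(duplicates)
```

A check like this only finds byte-identical copies, such as double uploads and repeated attachments. A fresh scan of the same document usually differs at the byte level, so it would pass through undetected.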

The Landesarchiv Berlin, which holds the city's historical records and has offices on Eichborndamm in Reinickendorf, faces a parallel version of the same headache in its digitisation work. When physical documents are scanned in bulk — particularly for the ongoing project to digitise pre-1990 East Berlin municipal records — quality-check rescans are routine. Without automated duplicate detection running in parallel, the archive can end up storing three or four image versions of the same page. Across a project digitising hundreds of thousands of pages, that multiplies storage demand dramatically before a single archivist flags the problem manually.

What Deduplication Actually Costs — and Saves

Deduplication software is not new. Perceptual hashing tools, which identify visually identical or near-identical images even when file metadata differs, have been commercially available since the early 2010s and are embedded in enterprise content management platforms used across German federal agencies. The Bundesarchiv in Koblenz implemented automated deduplication as part of its TRIARCH document management rollout. Berlin has lagged on deploying comparable tools at scale within its own systems.
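To illustrate the idea behind perceptual hashing, here is a minimal sketch of one common variant, the average hash. It is an illustration only, not the method used by any product named in this article. Real tools first shrink each image to a small greyscale grid (often 8×8); the 4×4 grids below are invented stand-ins for that step, and the function names are hypothetical.

```python
def average_hash(pixels):
    """Return a bit string with '1' where a pixel is brighter than the mean.

    `pixels` is a small greyscale grid (list of rows, values 0-255),
    standing in for an image already downscaled by a real tool.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


def hamming_distance(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Invented example: a checkerboard "page", a slightly brighter rescan of it,
# and an unrelated pattern.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
rescan = [[min(255, p + 5) for p in row] for row in original]
different = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

h_original = average_hash(original)
h_rescan = average_hash(rescan)
h_different = average_hash(different)

print(hamming_distance(h_original, h_rescan))
print(hamming_distance(h_original, h_different))
```

Two images are treated as likely duplicates when the distance between their hashes falls below a chosen threshold. Where to set that threshold is a tuning decision that depends on the material being scanned.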

The financial case is straightforward. If the city's total managed image archive sits at, conservatively, 500 terabytes across ITDZ-managed systems (a figure consistent with public procurement documents for storage infrastructure from 2023 and 2024) and 35 percent of that is duplicated data, eliminating those duplicates frees roughly 175 terabytes. At €0.02 to €0.04 per gigabyte per month, that represents a recurring saving of roughly €3,500 to €7,000 per month in direct storage costs, before counting the reduced backup windows, faster search indexing, and lower data egress fees. Over a five-year contract cycle, the direct savings alone add up to roughly €210,000 to €420,000, a sum that could help fund additional Bürgeramt digitisation staff.
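The arithmetic behind that estimate can be reproduced in a few lines. This is a back-of-the-envelope sketch using only the figures quoted in this article. It assumes decimal units (1 terabyte = 1,000 gigabytes) and flat pricing over the contract period, and the variable names are illustrative.

```python
# Figures taken from the article; integer cents avoid rounding surprises.
ARCHIVE_TB = 500             # assumed total image archive
DUPLICATE_PERCENT = 35       # assumed share of duplicated data
RATE_CENTS_PER_GB = (2, 4)   # EUR 0.02 to EUR 0.04 per GB per month
GB_PER_TB = 1000             # assumption: decimal terabytes
CONTRACT_MONTHS = 5 * 12     # five-year contract cycle

freed_tb = ARCHIVE_TB * DUPLICATE_PERCENT // 100
freed_gb = freed_tb * GB_PER_TB

# Monthly saving in whole euros for the low and high rate.
monthly_eur = tuple(freed_gb * cents // 100 for cents in RATE_CENTS_PER_GB)
# Saving over the whole contract, assuming prices stay flat.
contract_eur = tuple(m * CONTRACT_MONTHS for m in monthly_eur)

print(f"Freed storage: {freed_tb} TB")
print(f"Monthly saving: EUR {monthly_eur[0]:,} to {monthly_eur[1]:,}")
print(f"Five-year saving: EUR {contract_eur[0]:,} to {contract_eur[1]:,}")
```

The result covers direct storage costs only. Backup, indexing, and egress effects are not modelled here.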

The Senate's Department for Digital Transformation has listed storage optimisation as a priority item in its 2025-2028 planning cycle. Practically, that means the next procurement round for ITDZ Berlin is expected to include deduplication tooling as a mandatory requirement rather than an optional module. For residents, the visible effect should be faster document processing at Bürgerämter — particularly in high-demand districts like Neukölln and Mitte, where appointment backlogs have stretched to six weeks or more. For the city's IT budget, getting a handle on the numbers is the first step toward getting a handle on the bill.


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
