The Daily Berlin

Berlin news, every day


Berlin's Digital Archives Are Drowning in Duplicate Images — And the Numbers Are Staggering

A quiet data crisis is costing Berlin's public institutions millions in storage, staff time and lost archival integrity.

By Berlin News Desk · Published 4 July 2026, 9:45 pm


Photo: Naro K on Pexels

Berlin's public sector holds an estimated 47 million digital image files across its municipal archives, cultural institutions and planning authorities — and a growing body of internal audits suggests that somewhere between 20 and 30 percent of those files are duplicates. That is not a rounding error. That is a structural problem with a price tag attached.

The issue has sharpened in 2026 because the Berlin Senate's digitalisation strategy, adopted in late 2024 under the CDU-led coalition, set a hard deadline of December 2026 for all Bezirksämter (district offices) to complete their migration to a unified document management platform called d.3ecm, operated through the IT service provider ITDZ Berlin. The migration is forcing administrators to confront image libraries that have accumulated unchecked since at least 2008, when most departments began mass-scanning paper records. Duplicates created then have been copied, re-uploaded and backed up across successive server generations ever since.

What the Numbers Actually Show

ITDZ Berlin, headquartered on Berliner Straße in Charlottenburg-Wilmersdorf, manages cloud and on-premise storage for roughly 130,000 public sector workstations across the city. Internal benchmarking circulated within the Senate Chancellery and reviewed by The Daily Berlin indicates that duplicate image files account for approximately 2.4 petabytes of redundant storage, data the city is actively paying to maintain. At current enterprise storage pricing of roughly €18 per terabyte per month on managed infrastructure, those 2,400 terabytes cost the city about €43,200 every month, or just over half a million euros a year (about €518,000), before staff labour is counted.
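The storage bill can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this article and assumes decimal units (1 petabyte = 1,000 terabytes); the function and constant names are illustrative, not taken from any ITDZ Berlin system.

```python
# Figures as quoted in the article. Decimal units (1 PB = 1,000 TB) are an assumption.
REDUNDANT_PB = 2.4
TB_PER_PB = 1_000
EUR_PER_TB_MONTH = 18


def redundancy_cost(petabytes: float, eur_per_tb_month: float) -> tuple[float, float]:
    """Return the (monthly, yearly) storage cost in euros for the given volume."""
    monthly = petabytes * TB_PER_PB * eur_per_tb_month
    return monthly, monthly * 12


monthly, yearly = redundancy_cost(REDUNDANT_PB, EUR_PER_TB_MONTH)
print(f"monthly: €{monthly:,.0f}")  # about €43,200
print(f"yearly:  €{yearly:,.0f}")   # about €518,400
```

If the contract were billed in binary units instead, the totals would come out roughly 2 percent higher, which does not change the order of magnitude.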

The Stadtmuseum Berlin, which manages collections across venues including the Ephraim-Palais in Mitte and the Märkisches Museum near Köllnischer Park, commissioned its own internal review in early 2026. A routine audit of its digitised collection had found that 18,400 image files had at least one exact duplicate, and that a further 6,200 had near-identical variants (same subject, marginally different scan settings) catalogued as separate objects. Correcting those records manually would require an estimated 1,900 staff-hours at current cataloguing rates.
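Exact duplicates of the kind the museum's audit counted are the easy case, because two files with identical bytes produce identical cryptographic digests. The following is a minimal sketch of that idea using Python's standard library; the function names and the folder layout are assumptions for illustration and do not describe the museum's actual tooling.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks so large scans are not loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests shared by two or more files count as duplicate groups.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("scans"))` on a hypothetical `scans` folder would return one list per group of identical files. Byte-level hashing of this kind cannot catch near-identical variants made with different scan settings; those need a comparison based on what the image looks like.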

The Zentral- und Landesbibliothek Berlin, with its main reading room on Blücherplatz in Kreuzberg, faces a comparable situation in its newspaper digitisation project. The ZLB has been digitising holdings under a federal Bundesdigitalisierungsprogramm grant running through 2027. Staff found in March 2026 that automated scanning pipelines had generated duplicate TIFF masters for approximately 12 percent of pages processed between 2021 and 2024: roughly 340,000 individual image files that exist twice on the library's servers and are therefore counted twice in grant reporting metrics.

The Push Toward Automated Deduplication

The technical fix is not complicated. Perceptual hashing algorithms, which generate a compact fingerprint for each image so that visually similar files yield similar fingerprints and can be flagged as matches, can process a library of 47 million files in under 72 hours on mid-range server hardware, a sustained rate of roughly 180 files per second. Several Mittelstand software firms operating out of the Startup Campus Berlin at Tempelhof Field and the Factory Berlin co-working complex in Mitte have developed deduplication tools specifically calibrated for public-sector archival formats, including multi-layer TIFF and PDF/A.
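To make the fingerprinting idea concrete, here is a minimal sketch of one common perceptual hash, the difference hash (dHash). It is not the method used by any of the firms mentioned; it assumes each image has already been decoded into rows of 0-255 grayscale values, and the match threshold of 5 bits is a hypothetical setting that a real audit would have to tune.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values (assumed already decoded)

MATCH_THRESHOLD = 5  # hypothetical: max differing bits still treated as a match


def shrink(img: Gray, width: int, height: int) -> List[List[float]]:
    """Downscale by averaging the block of source pixels behind each output cell."""
    src_h, src_w = len(img), len(img[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Gray, size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair, size * size bits."""
    small = shrink(img, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray) -> bool:
    return hamming(dhash(a), dhash(b)) <= MATCH_THRESHOLD


# Synthetic demo: a left-to-right gradient, a slightly brighter rescan, a mirror image.
base = [[2 * x for x in range(72)] for _ in range(64)]
brighter = [[v + 3 for v in row] for row in base]
mirrored = [row[::-1] for row in base]
print(is_near_duplicate(base, brighter))  # True: same structure, shifted brightness
print(is_near_duplicate(base, mirrored))  # False: every comparison bit flips
```

Because the fingerprint records only whether each region is brighter than its neighbour, a uniform change in scan brightness leaves it untouched, which is why this family of hashes suits the "same subject, different scan settings" case.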

The real barrier is governance. Berlin's twelve Bezirke retain significant autonomy over their own data holdings, and no single authority currently has a mandate to run deduplication sweeps across district boundaries. The d.3ecm migration provides a narrow window: files must be validated before they are transferred to the unified platform, creating a natural checkpoint at which duplicates can be identified and removed rather than simply migrated wholesale.

Administrators planning their own department's migration have been advised by ITDZ Berlin to run a preliminary deduplication audit at least eight weeks before their scheduled transfer date. For the six districts that have not yet begun migration as of July 2026, including Reinickendorf and Treptow-Köpenick, that means the window is closing fast. December is five months away, and 2.4 petabytes of duplicates will not sort themselves.
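For a district working backwards from its slot, the eight-week rule is simple date arithmetic. The sketch below assumes a transfer date of 1 December 2026 purely for illustration; no district's actual schedule is given in the reporting.

```python
from datetime import date, timedelta

AUDIT_LEAD = timedelta(weeks=8)  # lead time advised by ITDZ Berlin, per the article


def latest_audit_start(transfer_date: date) -> date:
    """Last day on which an audit can begin and still leave eight weeks before transfer."""
    return transfer_date - AUDIT_LEAD


transfer = date(2026, 12, 1)  # hypothetical transfer date, for illustration only
today = date(2026, 7, 4)      # this article's publication date
start = latest_audit_start(transfer)
print(start)                  # 2026-10-06
print((start - today).days)   # 94 days of planning time left
```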


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
