The Daily Berlin

Berlin news, every day

Berlin's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Damaging Story

City institutions are wasting millions of euros in storage costs and staff hours managing redundant image files, and a new push to fix the problem is finally putting hard data on the table.

By Berlin News Desk · Published 4 July 2026, 9:23 pm

Photo by wal_ 172619 on Pexels

Berlin's public sector holds an estimated 40 million digital image files across its administrative departments, cultural institutions and housing agencies, and somewhere between a quarter and a third of them, roughly 10 to 13 million files, are duplicates. That figure, circulating among IT procurement specialists at the Senate Chancellery in the Rotes Rathaus, has become the uncomfortable starting point for a city-wide push to overhaul how Berlin stores, tags and purges visual data.

The timing matters. The CDU-led Senate is under pressure to find efficiency savings across municipal budgets while simultaneously funding BVG network upgrades and expanding the city's affordable housing stock. Every euro spent maintaining redundant terabytes of storage is a euro not going toward Wohnungsbau (housing construction). Digital asset management, unglamorous as it sounds, has crept onto the agenda of the Senate Department for Digital Transformation, which last year consolidated several IT oversight functions under a single directorate.

What the Numbers Actually Show

Cloud and on-premises storage costs for Berlin's public institutions have risen sharply since 2021. Rough procurement data reviewed by The Daily Berlin suggests the city spends upward of €4.5 million annually on storage infrastructure across its 12 administrative districts and attached cultural bodies, though the Senate has not published a consolidated figure. The Landesarchiv Berlin, housed on Eichborndamm in Reinickendorf, manages more than 12 million digitised records, many of them photographic. Archivists there have flagged that automated ingestion pipelines, which pull images from multiple departments, regularly import the same file under different metadata tags, inflating storage loads by an internally estimated 28 percent.
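For readers curious how the same file arriving under different tags can be caught, the following is a minimal sketch, not the Landesarchiv's actual pipeline. It assumes the metadata tags live in a catalogue alongside the file rather than inside it, so identical images have identical bytes. The record IDs, department tags and the `group_exact_duplicates` helper are hypothetical.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """SHA-256 of the raw file bytes; catalogue tags play no part."""
    return hashlib.sha256(data).hexdigest()


def group_exact_duplicates(records):
    """records: iterable of (record_id, department_tag, file_bytes).

    Returns lists of record IDs whose file bytes are identical,
    regardless of which department tag they were ingested under.
    """
    by_digest = defaultdict(list)
    for record_id, _tag, data in records:
        by_digest[content_digest(data)].append(record_id)
    return [ids for ids in by_digest.values() if len(ids) > 1]


# Hypothetical ingest batch: two departments submit the same file.
records = [
    ("LA-0001", "housing", b"\xff\xd8 image bytes A"),
    ("LA-0002", "culture", b"\xff\xd8 image bytes A"),
    ("LA-0003", "culture", b"\xff\xd8 image bytes B"),
]

duplicate_groups = group_exact_duplicates(records)
print(duplicate_groups)  # [['LA-0001', 'LA-0002']]
```

A check of this kind only catches byte-for-byte copies. If a department's tags are written into the file itself, the bytes change and the digests no longer match.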

The problem is not unique to the archive. The Berliner Stadtmuseum group, which administers sites including the Märkisches Museum near Köllnischer Park in Mitte, digitised roughly 200,000 collection objects between 2019 and 2024 under a federal co-funding programme. Staff discovered that image exports from different catalogue systems produced near-identical JPEGs at slightly different resolutions. The differences were enough to fool simple checksum-based duplicate detection, which only matches files whose bytes are exactly the same, but not enough to justify keeping both versions. Correcting those records retroactively has taken an estimated 1.4 full-time equivalent positions over 18 months, according to internal project documentation described to this reporter.
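Why a change in resolution defeats a checksum can be shown with a small, self-contained sketch. It uses synthetic grayscale pixel grids rather than real JPEGs, and a basic "average hash" as a stand-in for whatever perceptual method an institution might adopt. The function names and test image are illustrative assumptions, not the museum's tooling.

```python
import hashlib


def byte_hash(pixels):
    """Checksum over the raw pixel values: any change in size alters it."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


def average_hash(pixels, size=8):
    """Basic average hash of a grayscale image (list of rows, values 0-255).

    Assumes width and height are multiples of `size`.
    """
    height, width = len(pixels), len(pixels[0])
    bh, bw = height // size, width // size
    cells = []
    # Shrink to size x size by averaging each block of pixels.
    for by in range(size):
        for bx in range(size):
            total = sum(
                pixels[y][x]
                for y in range(by * bh, (by + 1) * bh)
                for x in range(bx * bw, (bx + 1) * bw)
            )
            cells.append(total / (bh * bw))
    mean = sum(cells) / len(cells)
    # One bit per cell: is it brighter than the overall mean?
    bits = 0
    for cell in cells:
        bits = (bits << 1) | (1 if cell > mean else 0)
    return bits


# Synthetic 16x16 test pattern and a 32x32 copy made by doubling each pixel.
small = [[(x * 16 + y * 3) % 256 for x in range(16)] for y in range(16)]
large = [[small[y // 2][x // 2] for x in range(32)] for y in range(32)]

print(byte_hash(small) == byte_hash(large))        # False
print(average_hash(small) == average_hash(large))  # True
```

Real exports involve interpolation and JPEG recompression, so perceptual hashes of two versions usually differ by a few bits rather than matching exactly. Practical tools therefore compare hashes with a distance threshold instead of testing for equality.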

Start-ups in Berlin's tech ecosystem have been pitching solutions to the problem for at least three years. Companies clustered around the Factory Berlin campus in Mitte and the Görlitzer Park-adjacent creative tech hub in Kreuzberg have developed perceptual hashing and AI-assisted deduplication tools tailored to archival workflows. Several have entered early-stage procurement conversations with the Senate's IT service provider ITDZ Berlin, which manages central infrastructure for the city government. ITDZ Berlin handles data for more than 80,000 public-sector workstations across the capital.

What Comes Next for the City's Data Clean-Up

The Senate Department for Digital Transformation has indicated it plans to issue a formal tender for a city-wide digital asset management platform before the end of the third quarter of 2026. That process, if it runs on schedule, would put a contract in place by spring 2027 — though public procurement at that scale rarely runs on schedule. A pilot deduplication audit is already underway at the Zentral- und Landesbibliothek on Breite Straße in Mitte, targeting its collection of roughly 900,000 digitised newspaper pages, some of which were scanned twice during a 2022 migration from an older content management system.

For institutions sitting on the problem right now, the technical path is relatively straightforward: perceptual hash comparisons can identify visually near-identical images even when resolution, compression or file size differ, and open-source perceptual hashing libraries, which serve a purpose similar to Microsoft's proprietary PhotoDNA, are available without licensing costs. The harder question is governance: deciding which version of a duplicate to keep, who approves deletion, and how metadata is standardised across departments that have spent years building incompatible cataloguing systems.
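A rough sketch of how the technical and governance halves might fit together follows. It assumes each image already has a 64-bit perceptual hash computed elsewhere. The threshold of 4 differing bits, the "keep the highest resolution" rule and all record IDs are illustrative assumptions, not a policy any Berlin institution has announced. The output is a list of proposals for human approval, not an automatic deletion.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    record_id: str
    phash: int   # 64-bit perceptual hash, assumed to be computed elsewhere
    width: int
    height: int


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def propose_deletions(records, max_distance=4):
    """Greedy grouping: a record joins the first group whose first member
    is within `max_distance` bits. Returns (keep_id, [delete_ids]) pairs."""
    groups = []
    for rec in records:
        for group in groups:
            if hamming(group[0].phash, rec.phash) <= max_distance:
                group.append(rec)
                break
        else:
            groups.append([rec])

    proposals = []
    for group in groups:
        if len(group) < 2:
            continue
        # Hypothetical keep rule: retain the version with the most pixels.
        keeper = max(group, key=lambda r: r.width * r.height)
        to_delete = [r.record_id for r in group if r is not keeper]
        proposals.append((keeper.record_id, to_delete))
    return proposals


records = [
    ImageRecord("SM-001", 0xF0F0F0F0F0F0F0F0, 1200, 800),
    ImageRecord("SM-002", 0xF0F0F0F0F0F0F0F1, 2400, 1600),  # 1 bit differs
    ImageRecord("SM-003", 0x0123456789ABCDEF, 1200, 800),   # unrelated image
]

proposals = propose_deletions(records)
print(proposals)  # [('SM-002', ['SM-001'])]
```

The code settles only the mechanical part. Who signs off on each proposal, and which metadata survives when two catalogue entries are merged, remain decisions for the departments involved.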

Berlin's digital housekeeping bill will only grow if the Senate delays. Storage costs compound, and every new digitisation project layered on top of an uncleaned archive makes the eventual reckoning more expensive. The numbers are already large enough to be politically awkward. Getting them into a budget line is the first step toward making someone responsible for bringing them down.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
