The Daily Berlin

Berlin news, every day

News

Berlin's Duplicate Image Problem: The Numbers That Are Costing the City's Digital Infrastructure

A data audit of Berlin's municipal and cultural image databases reveals a sprawling duplication crisis — and the bill is growing.

By Berlin News Desk · Published 4 July 2026, 8:40 pm


More than 340,000 duplicate image files are sitting inside Berlin's municipal digital archive systems, according to internal assessments reviewed by The Daily Berlin — consuming server capacity, slowing public-facing platforms, and drawing fresh scrutiny from the Senate Chancellery's digital governance unit ahead of a July budget review cycle.

The duplication problem is not unique to Berlin, but the city's particular combination of factors makes it acutely expensive here. Since 2021, Berlin has invested heavily in digitalising public records across 12 district administrations, pushing content from paper and legacy servers onto centralised platforms. That migration — part of the broader Berliner E-Government-Gesetz implementation — moved fast. Quality control, by multiple accounts in the technical documentation, did not keep pace.

Where the Redundancy Lives

The bulk of the duplicated assets sit across two major systems. The Zentral- und Landesbibliothek Berlin, which manages digital heritage collections stretching back to material scanned before 2010, holds an estimated 80,000 image pairs flagged as probable duplicates under a 2025 deduplication pilot. Separately, the BIM Berliner Immobilienmanagement GmbH — the state-owned property manager responsible for around 5,000 public buildings across the city — runs a facilities image database used by contractors on projects from Tempelhof to Pankow. That system reportedly contains duplicate entries for roughly 30 percent of its catalogued assets, a figure cited in a Senate Department for Urban Development internal review from February 2026.

The cost is not abstract. Cloud storage pricing in Germany, running on infrastructure contracts typically benchmarked to AWS Frankfurt or Deutsche Telekom's Open Telekom Cloud, ranges between €0.02 and €0.023 per gigabyte per month at public-sector contract rates. For a tranche of 340,000 uncompressed image files averaging 8 megabytes each, that represents approximately 2.7 terabytes of redundant data. At the quoted rates, that works out to a recurring overhead of roughly €650 to €750 a year in raw storage alone, before factoring in indexing, bandwidth, and staff time spent managing misfiled assets.
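The raw-storage arithmetic can be checked with a short back-of-envelope sketch. It uses only the file count, average file size, and price range quoted in the paragraph above; the decimal unit conversion (1 GB = 1,000 MB) and the function names are assumptions made for illustration, and indexing, bandwidth, and staff time are deliberately left out.

```python
# Back-of-envelope raw-storage cost, using only figures quoted in the article.
# Assumption: decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB).

FILES = 340_000          # duplicate image files
AVG_SIZE_MB = 8          # average uncompressed file size in megabytes
RATE_LOW_EUR = 0.02      # EUR per GB per month, lower bound
RATE_HIGH_EUR = 0.023    # EUR per GB per month, upper bound


def redundant_gb(files: int, avg_size_mb: float) -> float:
    """Total redundant volume in gigabytes."""
    return files * avg_size_mb / 1000


def annual_cost_eur(volume_gb: float, rate_per_gb_month: float) -> float:
    """Twelve months of raw storage at a flat per-GB monthly rate."""
    return volume_gb * rate_per_gb_month * 12


volume = redundant_gb(FILES, AVG_SIZE_MB)
low = annual_cost_eur(volume, RATE_LOW_EUR)
high = annual_cost_eur(volume, RATE_HIGH_EUR)

print(f"{volume / 1000:.2f} TB redundant")        # 2.72 TB redundant
print(f"EUR {low:,.0f} to {high:,.0f} per year")  # EUR 653 to 751 per year
```

The sketch covers storage only; the larger cost items the article names are not modelled here because no figures for them are given.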

The human cost is harder to quantify but widely acknowledged in technical circles. Archivists at the Stadtmuseum Berlin, whose Ephraim-Palais site in Mitte houses part of the city's historical photographic collection, have flagged in professional forums that duplicate image entries create cataloguing errors that then propagate into public search results. A researcher querying the online collection risks pulling the same image under three different accession numbers — a problem that erodes trust in the archive's reliability.

What Berlin Is Doing About It

The Senate Department for Digital Affairs and the Senate Chancellery are backing a structured deduplication programme running through the end of 2026. The programme uses perceptual hashing — a technique that identifies visually identical or near-identical images even when file names or metadata differ — across participating agencies. The Landesarchiv Berlin on Eichborndamm in Reinickendorf joined the pilot in March 2026, becoming the third major public institution in the programme after the ZLB and the urban development department's planning image library.
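To make the technique concrete, here is a minimal sketch of one simple perceptual hash, the average hash. It is not the programme's actual tooling, which the available documents do not describe. The function names, the 8x8 grid, and the distance threshold of 5 bits are all illustrative assumptions, and images are represented as plain lists of grayscale values so the example runs without an imaging library.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale pixel values, 0-255


def downsample(img: Gray, size: int = 8) -> Gray:
    """Shrink to size x size by averaging equal blocks.

    Assumes the image is at least size pixels wide and tall.
    """
    h, w = len(img), len(img[0])
    out = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        row = []
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def average_hash(img: Gray, size: int = 8) -> int:
    """One bit per cell: 1 if the cell is brighter than the mean cell."""
    cells = [p for row in downsample(img, size) for p in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for p in cells:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def probable_duplicates(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Flag a pair when the hashes differ in at most max_distance bits.

    The threshold is an illustrative assumption, not a programme setting.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Usage: a horizontal gradient, a brightened copy of it, and an unrelated image.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, v + 10) for v in row] for row in original]
unrelated = [[y * 16 for _ in range(16)] for y in range(16)]

print(probable_duplicates(original, brighter))   # True
print(probable_duplicates(original, unrelated))  # False
```

Because the hash depends on relative brightness rather than exact bytes, the brightened copy matches the original even though a byte-level checksum would treat the two files as different. File names and metadata play no part in the comparison.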

Perceptual hashing tools are not new, but their application to government archives at scale in Germany has lagged behind comparable programmes in cities like Amsterdam and Vienna, where municipal digitisation began earlier and deduplication protocols were baked into procurement requirements from the start. Berlin is essentially catching up — and paying for the gap.

Vendors pitching deduplication software to Berlin's procurement office quote implementation costs for a dataset of Berlin's scale at between €80,000 and €150,000 for initial cleanup, with ongoing annual licensing running between €15,000 and €40,000 depending on automation depth. The Senate is expected to publish tender documents before September 2026.

For district offices, the practical advice from the digital governance unit is already circulating: freeze new uploads to legacy systems where duplication rates exceed 20 percent, enforce mandatory metadata standards before any new scanning contracts are signed, and require contractors to run deduplication checks as a delivery condition. Whether those instructions filter down to the 96 district agencies still operating semi-autonomous image databases across Lichtenberg, Neukölln, Spandau and beyond is the question the July budget session will start to answer.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
