The Daily Berlin

Berlin news, every day


Berlin's Duplicate Image Problem: The Numbers Exposing a Hidden Crisis in the City's Digital Archives

Millions of redundant image files are inflating storage costs for Berlin's public institutions and slowing down the platforms residents rely on daily.

By Berlin News Desk · Published 4 July 2026, 9:23 pm


Photo: Brooks, Robert C. (Robert Clarkson), 1874-1941 / Public domain (Wikimedia Commons)

Berlin's public digital infrastructure is quietly drowning in copies of itself. A review of storage audits filed with the Senatsverwaltung für Digitalisierung und Verwaltungsmodernisierung in the first quarter of 2026 found that duplicate image files account for roughly 34 percent of all data held across the city's 12 Bezirksämter, a figure that translates to approximately 4.7 petabytes of redundant visual content sitting on taxpayer-funded servers.

The timing matters. The CDU-led Berlin Senate committed in February 2026 to a €120 million digitalisation package intended to modernise public services from housing permit applications to BVG travel planning. Administrators and civic tech advocates now warn that without a systematic deduplication push, a significant slice of that investment will simply subsidise storage for files that already exist, often in four or five near-identical versions.

Where the Copies Are Piling Up

The problem is concentrated in a handful of institutions. The Landesarchiv Berlin, based on Eichborndamm in Reinickendorf, holds the city's official photographic record going back to the early twentieth century. Staff digitised roughly 1.2 million analogue images between 2018 and 2024, but internal quality-control logs show that the automated scanning workflow generated an average of 3.1 files per original during batch processing. The cause was a known glitch in the Fujitsu scanning software used at the time, which was not patched until late 2023. Across the full digitisation run, that means an estimated 2.5 million redundant files, or about 2.1 surplus copies for each original, were created and never systematically removed.
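The 2.5 million estimate can be checked against the per-original average. The short calculation below is a sketch under one assumption: that the 3.1 figure counts every file produced per original, including the single copy that should exist. If it instead counted surplus copies only, the total would be noticeably higher. The variable names are illustrative and do not come from the Landesarchiv logs.

```python
# Figures quoted in the article.
ORIGINALS = 1_200_000
FILES_PER_ORIGINAL = 3.1

# Reading A (assumption): 3.1 counts every file, including the one that should exist.
redundant_if_total = ORIGINALS * (FILES_PER_ORIGINAL - 1)

# Reading B (assumption): 3.1 counts surplus copies only.
redundant_if_surplus = ORIGINALS * FILES_PER_ORIGINAL

print(f"Reading A: about {redundant_if_total / 1e6:.1f} million redundant files")
print(f"Reading B: about {redundant_if_surplus / 1e6:.1f} million redundant files")
```

Reading A gives about 2.5 million, in line with the estimate above. Reading B gives about 3.7 million.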

At the Berlin Open Data portal, operated by the Kompetenzzentrum Open Data at Alexanderplatz, dataset managers identified 847 image assets uploaded in duplicate or triplicate between January 2024 and March 2026 alone. Each duplicate costs an average of €0.019 per month in cloud storage under the city's contract with its primary provider — individually trivial, collectively a drain that compounds across a portfolio of roughly 9,000 image-heavy datasets.

The BVG, whose infrastructure maps and station photography are hosted separately from the Senate's main servers, reported in its 2025 annual technology review that 28 percent of images in its asset management system were flagged as duplicates by a deduplication scan run in October 2025. The transit authority said resolving those redundancies freed up 18 terabytes and cut image retrieval time on its internal planning tools by 40 percent — the clearest local proof yet that deduplication delivers measurable operational gains.

What the Numbers Actually Mean for Residents

Storage costs are only part of the equation. Slow image-heavy platforms have a direct effect on public interaction with city services. Usability tests conducted by the Technologiestiftung Berlin in Kreuzberg last autumn found that pages on the city's official service portal berlin.de took an average of 6.3 seconds to load on a standard 4G connection — nearly twice the 3.5-second threshold above which research consistently shows user drop-off rates climb sharply. Duplicate and oversized images accounted for an estimated 60 percent of the page-weight problem in those tests.

The Senate's digitalisation package does earmark €8.5 million specifically for data hygiene and legacy system cleanup, according to budget documents published in April 2026. That allocation includes funding for an AI-assisted deduplication tool being piloted at the Stadtmuseum Berlin's digital collections unit on Poststraße in Mitte. Early results from the pilot, which began in May, show the software correctly identifying and flagging duplicate images with 94 percent accuracy, requiring human sign-off only on the remaining 6 percent — a process that has so far cleared 310,000 files in eight weeks.

The pilot is scheduled to expand to three additional Bezirksämter — Pankow, Tempelhof-Schöneberg and Marzahn-Hellersdorf — by the end of September 2026. Institutions not covered by the rollout have been advised by the Senatsverwaltung to run manual audits using open-source tools such as dupeGuru before the year's end and to document their findings in the central data registry. For residents, the practical payoff should be faster-loading public platforms and, eventually, a leaner digital city administration spending less time managing copies of copies.
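For readers wondering what a basic audit involves, the sketch below shows one common approach: group files by size, then compare SHA-256 hashes within each group. It finds only byte-for-byte identical files. It will not catch rescans, resized copies or re-encoded versions, and it is not a description of how dupeGuru or the Stadtmuseum pilot works. The function names `file_digest` and `find_exact_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Second pass: hash only the files that share a size.
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_exact_duplicates(Path("some/image/folder"))` returns a list of groups, each holding the paths of files with identical content. Nothing is deleted, so deciding which copy to keep stays with a human reviewer.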

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
