The Daily Berlin

Berlin news, every day


Berlin's Digital Archives Are Drowning in Duplicate Images — And the Numbers Tell a Damning Story

A growing crisis of redundant visual data is costing Berlin's public institutions millions of euros and thousands of staff hours, with no coordinated fix yet in sight.

By Berlin News Desk · Published 4 July 2026, 9:16 pm


Photo: Julia Wolff on Pexels

Berlin's public sector holds an estimated 47 million digital image files across its administrative departments, cultural institutions, and urban planning offices, and by the most conservative internal assessments circulating among IT administrators, between 18 and 22 percent of those files are duplicates. That means roughly 8.5 to 10.3 million redundant images consuming server space, degrading search reliability, and draining budgets that were already stretched by the housing crisis and BVG infrastructure expansion.

The problem has accelerated sharply since 2022, when the Senate Chancellery pushed dozens of departments onto a unified cloud-hosting framework following the partial failure of Berlin's on-premise data infrastructure. Migration projects routinely copied entire image libraries without deduplication protocols, layering old redundancies onto new ones. The result is a sprawl that IT teams at organisations including the Stadtentwicklungsamt and the Zentral- und Landesbibliothek Berlin on Breite Straße are now being asked to untangle with tools and staffing levels that predate the cloud transition.

What Duplicate Images Actually Cost

Storage is cheap, the argument goes — until it isn't. Berlin's Senate Department for Finance allocated roughly €14.2 million to public-sector cloud infrastructure in its 2025 budget cycle. Administrators working within that framework have flagged that redundant image data accounts for a disproportionate share of retrieval and indexing overhead costs, though precise figures remain contested internally. What is less contested: staff time lost to duplicate-image confusion — wrong versions published, outdated maps used in planning consultations, press releases illustrated with deprecated graphics — represents a real operational liability.

The Stadtarchiv's digital wing, which maintains visual records dating back to the postwar reconstruction of Mitte and Prenzlauer Berg, ran an internal audit in late 2024 that identified over 340,000 image pairs flagged as probable duplicates within a single decade-long acquisition batch. Resolving them manually, at a conservative estimate of 90 seconds per pair, would require more than 8,500 staff hours. That is roughly four full-time employees working for an entire year on nothing else.
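The staff-hour estimate can be checked with a few lines of arithmetic. The sketch below uses the article's figures of 340,000 pairs and 90 seconds per pair; the 2,080-hour working year (40 hours a week for 52 weeks, with no leave) is an assumption added here for illustration, not a figure from the audit.

```python
# Figures reported in the audit (340,000 is a lower bound: "over 340,000").
PAIRS = 340_000
SECONDS_PER_PAIR = 90

# Assumption for illustration only: a full-time year of 40 h x 52 weeks, no leave.
HOURS_PER_FULL_TIME_YEAR = 40 * 52

total_hours = PAIRS * SECONDS_PER_PAIR / 3600
full_time_years = total_hours / HOURS_PER_FULL_TIME_YEAR

print(f"Manual review: {total_hours:,.0f} staff hours")
print(f"Equivalent full-time years: {full_time_years:.2f}")
```

With realistic allowances for leave and public holidays, the full-time equivalent would come out somewhat higher than four.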

Automated deduplication software exists, and several Berlin-based startups in the Adlershof technology park have been developing or reselling such tools since at least 2021. The challenge is integration. Berlin's institutions run on a patchwork of content management systems — some dating to the mid-2000s — that do not share metadata standards. An image catalogued under one convention at the Humboldt Forum on Museum Island cannot be automatically matched against its twin catalogued under a different convention at a district planning office in Tempelhof-Schöneberg.
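One way to sidestep mismatched metadata, at least for exact copies, is to compare the files themselves rather than their catalogue entries. The following is a minimal sketch of that idea using a SHA-256 content hash; it is not a description of any tool the institutions named here actually use, and the catalogue IDs and byte strings are hypothetical placeholders. It only catches byte-identical files, so a resized or re-encoded copy would slip through.

```python
import hashlib
from collections import defaultdict


def content_key(image_bytes: bytes) -> str:
    """Hash the raw file contents, ignoring any catalogue metadata."""
    return hashlib.sha256(image_bytes).hexdigest()


def group_by_content(records):
    """Return lists of catalogue IDs whose files are byte-identical."""
    groups = defaultdict(list)
    for record in records:
        groups[content_key(record["data"])].append(record["catalog_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records from two systems with different ID conventions.
records = [
    {"catalog_id": "HF-2019-00412", "data": b"same image bytes"},
    {"catalog_id": "TS/Plan/7781", "data": b"same image bytes"},
    {"catalog_id": "TS/Plan/7782", "data": b"other image bytes"},
]

duplicates = group_by_content(records)
print(duplicates)
```

Because the key is derived from the file contents alone, the two differently named records for the same file end up in one group without any shared metadata standard.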

Where the Pressure Is Building

The urgency is sharpening for two specific reasons. First, Berlin's proposed Open Data Masterplan, currently under Senate review, would require public institutions to publish image assets in accessible repositories by 2027. Publishing millions of duplicates would undermine the initiative's credibility from day one and impose indexing costs on whoever maintains the public-facing portal. Second, the BVG's ongoing infrastructure documentation project — photographically recording station renovation work at over 60 U-Bahn stations including Hermannplatz and Alexanderplatz — is generating roughly 12,000 new images per month. Without a deduplication pipeline baked into the workflow from the start, the archive will replicate the same structural problem within three years.

For institutions still working through this manually, the practical calculus is brutal. The Zentral- und Landesbibliothek has reportedly been piloting a perceptual hashing approach, a technique that identifies visually similar images even when file names and metadata differ, with mixed results on historical photograph collections where image quality varies sharply across decades. Perceptual hashing tools are widely available as open-source software, but tuning them to catch genuine duplicates without flagging legitimately distinct images requires specialist input that most public cultural institutions in Berlin do not currently have on staff.
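To give a sense of what perceptual hashing involves, here is a minimal sketch of one simple variant, the average hash, written in plain Python. It is an illustration only and not the library's pilot implementation: the 4x4 pixel grids are toy data, the threshold of 2 is an arbitrary assumption, and a real pipeline would first decode each file and scale it down to a small grayscale grid.

```python
def average_hash(pixels):
    """Return a bit string: '1' where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_probable_duplicate(pixels_a, pixels_b, threshold=2):
    """Hypothetical rule: a small Hamming distance suggests the same picture."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= threshold


# Toy grayscale grids (hypothetical data): a scan, a brighter copy, another image.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 198, 208],
    [11, 21, 202, 212],
]
brighter_copy = [[value + 30 for value in row] for row in original]
different = [
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
]

print(is_probable_duplicate(original, brighter_copy))
print(is_probable_duplicate(original, different))
```

The threshold is where the tuning problem lives. Set it too low and faded or rescanned copies of the same photograph are missed; set it too high and distinct images with similar composition are merged.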

The path forward almost certainly runs through procurement rather than internal development. Berlin's Senate IT directorate has a framework agreement mechanism — the Rahmenvertrag system — that can fast-track software licensing across multiple departments simultaneously. Whether that mechanism gets applied to deduplication tooling before the 2027 Open Data deadline will determine whether Berlin publishes a coherent, searchable image commons or an expensive digital landfill.


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
