The Daily Berlin

Berlin's Digital Archive Drive Hits Snag Over Duplicate Image Problem — Here's Where Things Stand This Week

A city-wide effort to digitise and clean up Berlin's public image repositories has stalled on a surprisingly stubborn technical obstacle: tens of thousands of duplicate photographs clogging the system.

By Berlin News Desk · Published 4 July 2026, 8:45 pm


Berlin's Senate Department for Urban Development and Housing confirmed this week that its ongoing digitisation project — part of the broader Berliner Digitalisierungsoffensive launched in March 2025 — has encountered a significant bottleneck. Duplicate images, some appearing dozens of times across interconnected databases, are blocking the final migration phase of the city's centralised public asset library. The problem affects at least three major municipal databases, including those managed by the Stadtentwicklungsamt and the Landesdenkmalamt Berlin, the body responsible for cataloguing the capital's protected historical structures.

The timing matters. Berlin's SPD-led coalition has staked part of its administrative credibility on making city data more accessible and efficient — promises made repeatedly during debates over housing transparency and the Mietspiegel rent index. A public asset system that cannot reliably serve planners, journalists, and citizens without surfacing the same photograph of, say, a Plattenbau facade in Marzahn four times undermines those pledges in a very visible way.

How the Duplicates Accumulated

The root cause is not glamorous. Over roughly fifteen years, individual departments uploaded images independently, with no shared naming convention and no central deduplication layer. A single aerial photograph of Tempelhofer Feld, for instance, might exist in the urban planning archive, the parks department's own folder structure, and a third legacy system inherited from a 2017 server consolidation. Multiply that across hundreds of landmarks and construction sites, from the Bergmannstraße corridor in Kreuzberg to the Spandauer Vorstadt, and the numbers compound fast.

The Senate's IT partner on the project, the Berliner Datenzentrum (BDZ), estimated internally earlier this year that roughly 12 percent of all image files flagged for migration were duplicates or near-duplicates. Across a library currently holding approximately 340,000 image assets, that translates to more than 40,000 files requiring manual review or automated matching before the new unified system can go live. The BDZ has not made that figure public, but it circulated in budget discussions at the Abgeordnetenhaus in May 2026.

Automated deduplication tools — the kind widely used by commercial platforms — struggle here because many of the images were scanned from different original prints at different resolutions, meaning pixel-by-pixel matching fails. A photograph of the Rotes Rathaus taken in 1998 and scanned at 72 dpi in 2009 will not match the same photograph scanned at 300 dpi in 2015, even though they are functionally identical records. Resolving that requires metadata cross-referencing and, in many cases, a human eye.
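To illustrate why exact matching fails where a resolution-tolerant comparison can succeed, here is a minimal sketch of one simple perceptual hashing technique, the average hash. It is not the tooling used by the BDZ or any other archive, which the reporting does not specify. The synthetic grayscale grids stand in for two scans of the same print, and every function name here is hypothetical.

```python
import hashlib


def make_scan(size):
    """Render the same synthetic scene (a diagonal gradient with one bright
    square) as a size x size grayscale grid. Stands in for a scan of one print."""
    img = []
    for y in range(size):
        row = []
        for x in range(size):
            u, v = x / size, y / size
            value = int(255 * (u + v) / 2)
            if 0.25 <= u < 0.5 and 0.25 <= v < 0.5:
                value = 255  # bright square, present at every resolution
            row.append(value)
        img.append(row)
    return img


def exact_digest(img):
    """Byte-level fingerprint: changes whenever any pixel or the size changes."""
    return hashlib.sha256(bytes(v for row in img for v in row)).hexdigest()


def average_hash(img, grid=8):
    """Average hash: shrink to grid x grid by block averaging, then set one
    bit per cell depending on whether it is brighter than the overall mean."""
    size = len(img)
    cells = []
    for gy in range(grid):
        y0, y1 = gy * size // grid, (gy + 1) * size // grid
        for gx in range(grid):
            x0, x1 = gx * size // grid, (gx + 1) * size // grid
            total = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            cells.append(total / ((y1 - y0) * (x1 - x0)))
    mean = sum(cells) / len(cells)
    bits = 0
    for cell in cells:
        bits = (bits << 1) | (1 if cell > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


low_res = make_scan(72)    # stand-in for the low-resolution scan
high_res = make_scan(300)  # stand-in for the high-resolution scan

print("exact digests equal:", exact_digest(low_res) == exact_digest(high_res))
print("perceptual hash distance:", hamming(average_hash(low_res), average_hash(high_res)))
```

In a sketch like this, two files would be treated as near-duplicate candidates when the Hamming distance falls under a chosen threshold. Real scans add cropping, colour shifts and damage, which is why such matches would still need the metadata cross-checks and human review described above.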

What Comes Next for the Project

The Senate's digital affairs office has brought in additional staff from the Zentraler IT-Dienstleister des Landes Berlin, known as ZIT-BB, to work through the backlog. A revised completion target of 30 September 2026 has been set internally, pushing the original June deadline back by roughly a quarter. That delay has knock-on effects for the Stadtentwicklungsplan 2030 process, which relies on up-to-date visual documentation of neighbourhoods undergoing rezoning discussions, including parts of Lichtenberg and the Wedding-Gesundbrunnen corridor.

For residents and professionals who use the city's open data portal at daten.berlin.de, the practical advice for now is straightforward: if you are pulling images for planning submissions or journalism, cross-check file creation dates and department codes in the metadata tags. The Senate's guidance issued on 27 June recommends prioritising files tagged after January 2024, when a partial housekeeping exercise first cleared some of the oldest duplicates from the Landesdenkmalamt's holdings.
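For readers comfortable with a script, the cross-check above could look roughly like the sketch below. It rests on assumptions: the field names (`file_name`, `dept_code`, `tagged_on`), the department codes and the sample records are all hypothetical and do not reflect the portal's actual metadata schema, and "after January 2024" is read here as 1 February 2024 or later.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ImageRecord:
    file_name: str   # hypothetical field names, not the portal's real schema
    dept_code: str
    tagged_on: date


# Assumption: "tagged after January 2024" means 1 February 2024 or later.
CUTOFF = date(2024, 2, 1)


def prioritise(records):
    """Put records tagged on or after the cutoff first, newest first in each group."""
    return sorted(
        records,
        key=lambda r: (r.tagged_on < CUTOFF, -r.tagged_on.toordinal()),
    )


# Made-up sample records for the same photograph held by different departments.
candidates = [
    ImageRecord("rathaus_scan_a.tif", "DEPT-A", date(2015, 6, 3)),
    ImageRecord("rathaus_scan_b.tif", "DEPT-B", date(2024, 3, 12)),
    ImageRecord("rathaus_scan_c.tif", "DEPT-C", date(2009, 11, 20)),
]

for record in prioritise(candidates):
    flag = "preferred" if record.tagged_on >= CUTOFF else "check for duplicates"
    print(record.file_name, record.dept_code, record.tagged_on.isoformat(), flag)
```

Sorting rather than discarding keeps the older copies visible, which matters when the newest tagged file turns out not to be the best scan.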

Berlin is not alone in facing this problem. Amsterdam's Stadsarchief dealt with a comparable duplicate-image crisis during its 2022 digitisation push and resolved it over eighteen months through a combination of perceptual hashing software and archival staff secondments. Berlin's administrators have been in contact with their Amsterdam counterparts for advice, according to documentation from a Senate committee session held at the Rotes Rathaus on 18 June. The September target is tight. Whether the BDZ has the staffing capacity to hit it will become clearer when quarterly progress figures are released in late July.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
