The Daily Berlin

Berlin news, every day

Berlin's Duplicate Image Problem: The Key Decisions Ahead for the City's Digital Archive

A backlog of duplicate and misattributed photographs in Berlin's public records systems is forcing administrators, archivists, and tech companies to confront choices that will shape how the city documents itself for decades.

By Berlin News Desk · Published 4 July 2026, 8:35 pm


Photo: Congressional Research Service / Public domain (Wikimedia Commons)

Berlin's municipal digital archive is sitting on a problem administrators have long known about but rarely discussed publicly: tens of thousands of duplicate and incorrectly tagged images scattered across government databases, slowing down everything from planning applications in Mitte to housing permit processing in Lichtenberg. The questions now are who cleans it up, who pays, and which tools they use.

The timing matters. The CDU-led Senate is pushing a broader digital modernisation programme across the Senatsverwaltungen, and the city's investment in its smart-city framework, anchored partly through CityLAB Berlin at the former Tempelhof Airport, has put data hygiene squarely on the agenda for 2026. Duplicate image records are not a glamorous subject, but they sit at the centre of a growing argument about whether Berlin's public infrastructure is actually ready for the AI-assisted tools that several departments are already piloting.

The practical stakes are visible at ground level. At the Landesarchiv Berlin on Eichborndamm in Reinickendorf, archivists have been manually reconciling digital photograph records that overlap across multiple acquisition batches — a process that, according to internal project documentation circulated earlier this year, is consuming staff hours at a rate that cannot scale. Meanwhile, the Stadtentwicklungsamt in Friedrichshain-Kreuzberg has flagged that duplicate imagery in its planning portal has contributed to delays in cross-referencing site surveys, an issue the district's building administration has been trying to resolve since at least the third quarter of 2025.

The Technology Question No One Has Fully Answered

Several Berlin-based startups, including companies operating out of the Factory Berlin campus in Mitte, have pitched perceptual hashing and AI deduplication tools to city departments in the past 18 months. Perceptual hashing generates a compact fingerprint for each image from its visual content rather than from the file's bytes or metadata, so that visually similar images yield similar fingerprints. Near-duplicate photographs (taken seconds apart, or slightly cropped) can then be identified and flagged automatically by comparing fingerprints. The tools exist. The procurement and liability frameworks needed to deploy them inside Berlin's public sector do not yet exist in a form that satisfies the city's data protection obligations under the DSGVO, the German name for the EU's General Data Protection Regulation.
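To make the fingerprinting idea concrete, here is a minimal sketch of one common perceptual hash, a difference hash (dHash), written in plain Python. It is an illustration only, not the method of any vendor or city department named in this article. The function names, the representation of an image as rows of grayscale values, and the threshold of 10 differing bits are all assumptions chosen for the example.

```python
from typing import List

Gray = List[List[int]]  # an image as rows of 0-255 grayscale values (assumed input format)


def resize_box(img: Gray, width: int, height: int) -> List[List[float]]:
    """Downscale by averaging the source pixels that fall into each target cell."""
    src_h, src_w = len(img), len(img[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max(y0 + 1, (y + 1) * src_h // height)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max(x0 + 1, (x + 1) * src_w // width)
            cells = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def dhash(img: Gray, size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair in a tiny thumbnail."""
    small = resize_box(img, size + 1, size)  # (size + 1) columns give size comparisons per row
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits  # size * size bits, 64 by default


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 10) -> bool:
    """Flag a pair for review when their fingerprints differ in few bits.

    The threshold is a hypothetical value for this sketch, not a recommended setting.
    """
    return hamming(dhash(a), dhash(b)) <= threshold
```

Because the hash records only whether each region is brighter than its neighbour, a uniformly brightened or re-encoded copy tends to produce the same or a very close fingerprint, while an unrelated image differs in many bits. In practice the threshold would need tuning against real archive material, and flagged pairs would still go to a human for confirmation.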

That is the central decision now looming over city IT planners. A working group under the Senatsverwaltung für Inneres und Digitalisierung is expected to publish guidance on AI-assisted archival tools before the end of the third quarter of 2026. If that guidance greenlights automated deduplication for photographic records, departments could begin procurement by autumn. If it instead requires additional data protection impact assessments, which Article 35 of the DSGVO mandates where processing is likely to pose a high risk to individuals' rights, as can be the case for photographs of identifiable people, the timeline slips into 2027 at the earliest.

The cost difference between those two outcomes is significant. Manual deduplication projects of comparable scale in Hamburg's Staatsarchiv ran to roughly €400,000 over two years, according to a 2024 report from the Konferenz der Leiter der Archivverwaltungen des Bundes und der Länder. Automated approaches, even accounting for setup and staff retraining, have come in at around a third of that cost in pilot programmes elsewhere in Germany.

What Comes Next

The guidance from the Senatsverwaltung für Inneres und Digitalisierung is the document to watch. Its framing will determine whether Berlin adopts a centralised deduplication infrastructure, a single platform serving all Bezirke, or leaves each district to commission its own solution, which risks recreating the fragmentation problem the exercise is meant to solve.

Advocates for centralisation point to the BVG's experience integrating its own operational image database, which serves everything from surveillance footage retention to press photography, as evidence that a unified system with clear governance outperforms a patchwork of local contracts. Critics argue that one-size-fits-all solutions have consistently failed to account for the very different archival needs of, say, the Stadtmuseum Berlin near Märkisches Ufer and a district planning office in Spandau.

For now, archivists are working the problem by hand. The guidance document lands first. Everything else follows from that.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
