The Daily Berlin

Berlin news, every day


Berlin's Digital Archives Are Full of Duplicate Images — and Officials Are Finally Talking About the Fix

From Mitte to Marzahn, city agencies and tech experts are weighing in on a growing data problem buried inside Berlin's public records systems.

By Berlin News Desk · Published 4 July 2026, 8:48 pm


Photo: Irina Nesterenko on Pexels

Berlin's Senate Department for Urban Development confirmed this week that duplicate image files have accumulated across at least three of the city's core digital planning databases, creating redundancies that slow document retrieval and inflate storage costs. The problem, long flagged by archivists and database administrators working inside the system, has now moved up the agenda after an internal audit circulated among Senate departments in late June.

The timing matters. The city is midway through a multimillion-euro digitisation push under the Berliner E-Government-Gesetz, the state's electronic government law, which requires public records to be fully searchable and interoperable across agencies by the end of 2027. Duplicate and mislabelled image files directly undermine that goal, because automated indexing tools misread them, producing false search results and broken links in public-facing portals.

What the Experts Are Saying

Technologists and archivists working with Berlin institutions have been vocal about the scope of the issue. Researchers at the Zuse Institute Berlin — the applied mathematics and computing centre on Takustraße in Dahlem — have studied duplicate-detection algorithms in large public-sector datasets and have previously described deduplication as one of the most cost-effective interventions available to government IT departments before major system migrations. The institute has not commented publicly on the Senate audit specifically.

At Wikimedia Deutschland, based on Tempelhofer Ufer in Tempelhof-Schöneberg, staff archivists who manage Wikimedia Commons image repositories have dealt with duplicate file problems at scale for years. Their publicly documented workflows use perceptual hashing to identify near-identical images even when file names differ. Those workflows are increasingly cited in German municipal IT circles as a practical model for public-sector adoption.
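The article does not say which perceptual hash those workflows use. The Python below is a minimal sketch that assumes a difference hash (dHash), one common variant. It reduces an image to a 9×8 grid of average brightness and records whether each cell is brighter than its right-hand neighbour. The result is a 64-bit fingerprint that ignores file names entirely. To stay dependency-free, the sketch takes an already-decoded grayscale grid; a real pipeline would decode files with an imaging library first. The function names and the 10-bit near-duplicate threshold are illustrative assumptions, not Wikimedia's settings.

```python
from typing import List

# A decoded grayscale image: rows of equal length, pixel values 0-255.
Grid = List[List[float]]


def downscale(pixels: Grid, width: int, height: int) -> Grid:
    """Shrink the image to width x height cells by averaging each block of pixels."""
    src_h, src_w = len(pixels), len(pixels[0])
    out: Grid = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)  # at least one row per cell
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)  # at least one column per cell
            block = [pixels[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair, set when brightness drops."""
    small = downscale(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 10) -> bool:
    """Treat two images as near-identical if their hashes differ in few bits (threshold assumed)."""
    return hamming(dhash(a), dhash(b)) <= max_distance
```

A brightened or re-saved copy keeps the same pattern of brightness differences, so its hash stays within a few bits of the original. A byte-level content hash, by contrast, would treat that copy as an entirely different file.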

The Berlin-based civic tech organisation Technologiestiftung Berlin, which advises city government on digital infrastructure, published a position paper in March 2026 arguing that unstructured image data in planning and housing departments represents one of the largest sources of avoidable administrative overhead in the city's IT budget. The paper did not name a specific cost figure for duplicate file management, but it called for standardised metadata schemas across all Senate departments before the 2027 compliance deadline.
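As described here, the position paper does not publish a schema. The following is a hypothetical illustration of what a shared schema could enforce. It defines one record type with a few assumed fields: department code, capture date, WGS84 coordinates and a SHA-256 content hash. It then rejects records that a cross-department tool could not compare. None of these field names come from the paper.

```python
from dataclasses import dataclass
from datetime import date

HEX_DIGITS = set("0123456789abcdef")


@dataclass(frozen=True)
class ImageMetadata:
    """Hypothetical shared record that every department would fill in for each image."""
    record_id: str
    department: str      # shared department code (hypothetical)
    captured_on: date    # capture date, not upload date
    lat: float           # WGS84 latitude in degrees
    lon: float           # WGS84 longitude in degrees
    sha256: str          # content hash of the file, lowercase hex

    def __post_init__(self) -> None:
        # Reject records that other departments' tools could not compare against.
        if not self.record_id or not self.department:
            raise ValueError("record_id and department are required")
        if not (-90.0 <= self.lat <= 90.0 and -180.0 <= self.lon <= 180.0):
            raise ValueError("coordinates out of range")
        if len(self.sha256) != 64 or not set(self.sha256) <= HEX_DIGITS:
            raise ValueError("sha256 must be 64 lowercase hex characters")
```

Validating at creation time means a malformed record fails when it is entered, rather than surfacing later as a broken link or a missed duplicate.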

The Numbers Behind the Clutter

Scale gives the debate its urgency. Berlin's urban planning portal, the Geoportal Berlin operated by the Senate Department for Urban Development and Housing, hosts more than 400 datasets, many of which contain georeferenced image layers updated on irregular cycles by different departments. When two departments photograph the same construction site on Frankfurter Allee or along the Spree riverfront and upload images under different file-naming conventions, the system has historically had no automated mechanism to flag the overlap.
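The Geoportal's image layers are georeferenced, so one way to flag this kind of overlap without relying on file names is to compare where images were taken. The sketch below is a hypothetical illustration, not a description of any Senate system. It assumes each record carries WGS84 coordinates and computes great-circle (haversine) distances between them. It then flags pairs from different departments that fall within an assumed 50-metre radius. The record fields, file names and approximate coordinates in the usage lines are invented for illustration.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass(frozen=True)
class ImageRecord:
    file_name: str   # department-specific name; deliberately not used for matching
    department: str
    lat: float       # WGS84 degrees
    lon: float       # WGS84 degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_overlaps(records: Sequence[ImageRecord], radius_m: float = 50.0) -> List[Tuple[str, str]]:
    """Pair up images from different departments taken within radius_m of each other."""
    flagged = []
    for a, b in combinations(records, 2):
        if a.department == b.department:
            continue
        if haversine_m(a.lat, a.lon, b.lat, b.lon) <= radius_m:
            flagged.append((a.file_name, b.file_name))
    return flagged


# Illustrative records: two departments shooting roughly the same spot, one distant image.
records = [
    ImageRecord("planning_2025_0412.jpg", "planning", 52.51550, 13.45400),
    ImageRecord("IMG_7731_frankfurter.jpg", "transport", 52.51552, 13.45405),
    ImageRecord("spree_ufer_03.jpg", "planning", 52.52000, 13.40000),
]
print(flag_overlaps(records))  # [('planning_2025_0412.jpg', 'IMG_7731_frankfurter.jpg')]
```

Comparing every pair is quadratic, which is acceptable for a sketch. A production tool working across hundreds of datasets would likely first bucket records by map tile or use a spatial index, then check distances only within nearby buckets. A flagged pair is a candidate for review, not proof of duplication: two departments may photograph the same site for different reasons.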

According to figures published in the 2025 annual report of the Senate Department for Finance, Berlin spent approximately €47 million on IT infrastructure and digital services across Senate departments in the 2024 fiscal year. Officials have not broken out what share of that total goes directly to storage. IT administrators in comparable German city-states, however, have estimated that redundant file storage typically accounts for between 8 and 15 percent of raw storage expenditure before deduplication programmes are implemented; Hamburg published a relevant breakdown in its 2024 digital strategy review.
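Officials have not published the storage share, so the 8 to 15 percent range cannot be turned into a Berlin figure. The numbers do allow a ceiling, though. Even if the entire €47 million had gone to storage, the range would imply about €3.76 million to €7.05 million in redundant storage. Any real storage budget is only a fraction of that total, so the actual cost would be correspondingly smaller. This assumes the range from other city-states carries over to Berlin at all. The function name is illustrative, and it uses integer euros to avoid floating-point rounding.

```python
from typing import Tuple


def redundant_storage_range(storage_spend_eur: int, low_pct: int = 8, high_pct: int = 15) -> Tuple[int, int]:
    """Apply the 8-15 percent redundancy estimate to a storage budget given in whole euros."""
    return storage_spend_eur * low_pct // 100, storage_spend_eur * high_pct // 100


# Treating the whole 2024 IT figure as if it were storage gives a ceiling, not an estimate.
TOTAL_IT_SPEND_2024_EUR = 47_000_000
print(redundant_storage_range(TOTAL_IT_SPEND_2024_EUR))  # (3760000, 7050000)
```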

The Senate's internal audit, according to officials briefed on its conclusions, recommends piloting an automated deduplication tool in the housing database first, targeting image records tied to the city's Wohnlagenkarte rent index mapping, before expanding to planning and transport datasets.
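The audit's recommended tool is not named. A plausible first pass for any such tool, sketched here under that assumption, is to group files by a SHA-256 hash of their bytes. Copies of the same upload then collide regardless of what each department called them. This only catches byte-identical files; re-encoded or resized copies would need a perceptual comparison on top. The function names are hypothetical. As a conservative choice for public records, the sketch reports duplicate groups for human review rather than deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in 1 MiB chunks so large images never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(paths: Iterable[Path]) -> List[List[Path]]:
    """Group files with byte-identical content, whatever their names; report only, delete nothing."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        groups[sha256_of_file(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```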

For Berliners and the startups increasingly building civic-data products on top of city open-data feeds, cleaner image archives mean more reliable APIs and fewer broken data pipelines. Many of those startups are clustered around the Factory Berlin campus in Mitte or in the co-working spaces of Oberbaum City in Friedrichshain. The Senate is expected to present a formal deduplication roadmap to the relevant parliamentary committee before the summer recess ends in mid-August. Organisations wanting to submit technical feedback can do so through the Technologiestiftung Berlin's open consultation process, which closes on 31 July.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
