The Daily Berlin

Berlin news, every day


Berlin's Archives Race to Fix a Digital Mess: Duplicate Image Crisis Hits City Records This Week

A sprawling deduplication effort across Berlin's public digital infrastructure has forced archivists, city agencies and tech contractors to confront years of sloppy data management head-on.

By Berlin News Desk · Published 4 July 2026, 8:58 pm


Photo: Gökberk Keskinkılıç on Pexels

Berlin's Senate Department for Digital Transformation confirmed this week that a coordinated sweep of the city's centralised media repositories had uncovered tens of thousands of duplicate image files clogging government servers. The problem has slowed public-facing portals, inflated storage costs and, in several documented cases, caused outdated photographs to resurface on official websites in place of current ones.

The audit, which began in late June under the city's ongoing Berliner Digitalisierungsprogramm framework, is the most extensive review of the municipal image database since the unified records system was expanded in 2021 to absorb assets from borough-level administrations across all twelve Bezirke.

Why This Week's Findings Matter

The timing is not accidental. Berlin's public-sector IT contracts are up for renegotiation before the end of Q3 2026, and the Senate's technology directorate has been under pressure from the SPD-led coalition to demonstrate measurable efficiency gains before new deals are signed. Storage redundancy is one of the cleaner metrics on which to show progress: bloated repositories cost money, slow page-load times on citizen-facing portals like the Berlin.de service platform, and create legal exposure when outdated images of public officials, redeveloped sites or demolished buildings are displayed as current.

The Berlin.de portal alone serves roughly 1.2 million unique visitors per month, according to figures cited in the Senate's 2025 digital progress report. Even minor image-serving errors on that platform generate a measurable volume of complaint tickets to the city's IT helpdesk, the Zentraler IT-Dienstleister Berlin, known as ZIT Berlin, which manages backend infrastructure for dozens of public agencies.

Two specific sites have emerged as focal points this week. The Rotes Rathaus on Rathausstraße, whose communications team maintains a high-volume image library updated for press releases and event documentation, found more than 400 near-duplicate files during internal review — multiple versions of the same photographs saved under different filenames across different subdirectories. The Stadtbibliothek Berlin's digital collections unit in Mitte flagged a separate but related problem: replacement images uploaded during a 2023 digitisation drive had, in a number of instances, failed to overwrite their predecessors, leaving both versions live in the catalogue simultaneously.

How the Cleanup Is Being Handled

ZIT Berlin is deploying perceptual hashing software — a technique that identifies visually identical or near-identical images regardless of filename or metadata — across approximately 14 terabytes of stored assets. The process is expected to run through mid-July. Files flagged as duplicates are not immediately deleted; they are quarantined in a staging environment for human review, a step insisted upon by the Landesarchiv Berlin, which holds legal responsibility for ensuring no historically significant material is destroyed in automated purges.
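For readers unfamiliar with the technique, the following is a minimal sketch of one common form of perceptual hashing, the average hash. It is not ZIT Berlin's software, whose details have not been published. It assumes each image has already been downscaled to a small grayscale grid; the filenames and the distance threshold are hypothetical. Matching pairs are only reported for review, never deleted, in line with the quarantine step described above.

```python
from typing import Dict, List, Tuple

# A grayscale image already downscaled to a small grid (values 0-255).
# Real pipelines would do the downscaling with an imaging library first.
Grid = List[List[int]]


def average_hash(pixels: Grid) -> int:
    """Return a bit pattern: 1 where a pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_near_duplicates(
    images: Dict[str, Grid], max_distance: int = 4
) -> List[Tuple[str, str]]:
    """Return filename pairs whose hashes differ by at most max_distance bits.

    max_distance is an illustrative threshold, not a published setting.
    Pairs are candidates for human review, not for automatic deletion.
    """
    hashes = {name: average_hash(grid) for name, grid in images.items()}
    names = sorted(hashes)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if hamming(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs
```

Because the hash depends on the picture's brightness pattern rather than its bytes, the same photograph saved under a different filename, or re-exported slightly brighter, still produces a matching or nearly matching hash.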

The Landesarchiv, based on Eichborndamm in Reinickendorf, has assigned three additional staff members to the review process on a temporary basis. The Senate's digital team has set a target of reducing total image storage volume by at least 18 percent by September 1, 2026 — a figure that, if met, would translate to meaningful savings on cloud-hosting fees paid to the city's infrastructure partner.

For Berlin's growing community of civic-tech developers — many of them based around the Factory Berlin campus in Mitte and the co-working hubs along Schönhauser Allee in Prenzlauer Berg — the episode has reignited a longer-running debate about open data standards. Several developers working with the city's open data portal, daten.berlin.de, have pointed out that inconsistent metadata tagging at the point of upload is the root cause of most duplication; without a mandatory taxonomy, the same image gets saved repeatedly under different labels by different departments.
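The developers' point about upload-time tagging can be illustrated with a short sketch. It assumes a hypothetical mandatory taxonomy; the field names and allowed subjects below are invented for illustration and are not drawn from daten.berlin.de. It also adds an exact-content check (a SHA-256 digest) as one possible way to stop the same file being stored twice under different labels.

```python
import hashlib
from typing import Dict

# Hypothetical taxonomy: these names are illustrative assumptions only.
REQUIRED_FIELDS = ("department", "subject", "date_taken")
ALLOWED_SUBJECTS = {"building", "event", "official", "site"}


class UploadRegistry:
    """Accepts an image only if it is fully tagged and not already stored."""

    def __init__(self) -> None:
        # Maps a content digest to the filename that first stored it.
        self._by_digest: Dict[str, str] = {}

    def submit(self, filename: str, data: bytes, metadata: Dict[str, str]) -> str:
        # Reject uploads missing any mandatory field.
        missing = [f for f in REQUIRED_FIELDS if not metadata.get(f)]
        if missing:
            return "rejected: missing " + ", ".join(missing)
        # Reject labels outside the shared vocabulary.
        if metadata["subject"] not in ALLOWED_SUBJECTS:
            return "rejected: unknown subject " + metadata["subject"]
        # Identical bytes under a new name are reported, not stored again.
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return "duplicate of " + existing
        self._by_digest[digest] = filename
        return "accepted"
```

An exact digest only catches byte-identical copies; resized or re-exported versions would still need a perceptual comparison of the kind used in the audit.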

The Senate's digital office has indicated it will publish interim findings from the deduplication audit in a public report before the summer recess. Agencies managing their own image libraries have been advised to freeze new uploads to shared directories until the quarantine review is complete — a precaution that has already delayed several planned updates to the Berlin.de events calendar. Anyone relying on that calendar for July programming should check directly with individual venues for the most current information.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
