The Daily Berlin

Berlin news, every day

Berlin's Digital Archives Are Full of Duplicate Images — and Officials, Experts Say the Fix Is Overdue

From Mitte to Marzahn, public institutions are grappling with bloated image databases, and the people who manage them are finally talking about it.

By Berlin News Desk · Published 4 July 2026, 8:47 pm

Photo: Max Kladitin on Pexels

Berlin's public sector is sitting on hundreds of thousands of duplicate digital images stored across fragmented servers — wasting storage capacity, distorting search results, and costing taxpayers money that administrators say could be redirected to frontline services. City data managers and archivists are now pushing hard for a coordinated response, and the debate is moving from back-office frustration to formal policy discussion at the Senatsverwaltung für Digitalisierung.

The problem has come into sharper focus in 2026 because of Berlin's push to consolidate its digital infrastructure under the Berliner E-Government-Gesetz framework, which requires public agencies to meet updated data quality standards by the end of the year. Duplicate images (the same photograph stored under different file names across multiple departments) represent one of the messiest, least glamorous corners of that cleanup. Archivists at the Landesarchiv Berlin on Eichborndamm in Reinickendorf describe the challenge as one that crept up over two decades of siloed IT procurement.

What Experts and Officials Are Saying

Professionals working at the intersection of public administration and digital asset management broadly agree on the diagnosis. Specialists affiliated with the Zuse Institute Berlin on Takustraße in Dahlem, which handles large-scale scientific data infrastructure, have pointed to perceptual hashing and machine-learning-based deduplication as the most practical tools available at scale. These methods compare image fingerprints rather than file names, catching copies that have been re-exported, resized, or slightly colour-corrected — the kind that simple duplicate-file finders miss entirely.
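To make the idea of an image fingerprint concrete, here is a minimal sketch of one simple perceptual hash, the average hash. It is an illustration only, not the method used by any institution named in this article. The function names, the 8×8 grid size, and the synthetic test images are all assumptions made for the example, and images are represented as plain grids of grayscale values so that no imaging library is needed.

```python
from typing import List

Gray = List[List[float]]  # rows of grayscale values, roughly 0..255


def shrink(pixels: Gray, size: int = 8) -> Gray:
    """Box-average an image down to size x size cells.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for cy in range(size):
        y0, y1 = cy * height // size, (cy + 1) * height // size
        row = []
        for cx in range(size):
            x0, x1 = cx * width // size, (cx + 1) * width // size
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        small.append(row)
    return small


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of fingerprint bits that differ."""
    return bin(a ^ b).count("1")


# Synthetic stand-ins for real photographs (hypothetical data).
original = [[4 * x + 2 * y for x in range(32)] for y in range(32)]

# A half-size, slightly brightened copy of the original.
resized_copy = [
    [
        (original[2 * y][2 * x] + original[2 * y][2 * x + 1]
         + original[2 * y + 1][2 * x] + original[2 * y + 1][2 * x + 1]) / 4 + 10
        for x in range(16)
    ]
    for y in range(16)
]

# A different picture: vertical black and white stripes.
unrelated = [[255 if (x // 4) % 2 else 0 for x in range(32)] for y in range(32)]

print(hamming(average_hash(original), average_hash(resized_copy)))
print(hamming(average_hash(original), average_hash(unrelated)))
```

A small distance between two fingerprints suggests the same picture, a large one suggests different pictures. Where to draw the line is a tuning decision that depends on the collection, and production tools generally use more robust hashes than this one.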

The Berliner Beauftragte für Datenschutz und Informationsfreiheit has also weighed in on the governance side. Any automated image-matching system deployed across public databases must be assessed for privacy compliance, particularly when photographs include identifiable individuals. That concern becomes acute when the images in question come from social services departments, youth welfare files, or refugee registration records held by the LAGeSo, Berlin's state office for health and social affairs.

Berlin's startup ecosystem has not stayed quiet. Several companies based in Kreuzberg's Wrangelkiez and on the EUREF-Campus in Schöneberg have developed commercial deduplication tools already marketed to media houses and e-commerce platforms. Representatives from at least two of those firms have presented to city procurement officials, though no contract has been announced. The pitch is straightforward: algorithms that already handle millions of product images for retail clients can be adapted for public-sector archives at a fraction of bespoke development costs.

The Numbers Behind the Clutter

Scale matters here. The Senat's 2025 digital infrastructure audit — published in March of that year — identified digital asset management as one of three categories with the highest rates of redundant data across Berlin's 12 borough administrations. While the audit did not publish a single headline figure for image duplication specifically, it estimated that unstructured data redundancy across all formats was costing the city between €4 million and €7 million annually in excess storage and administrative overhead. Images represent a significant share of unstructured data volume in most public administrations.

The Bezirksamt Friedrichshain-Kreuzberg launched a pilot deduplication project for its internal media library in January 2026, using open-source tooling. Early results, shared at a February workshop hosted by the CityLAB Berlin on Platz der Luftbrücke in Tempelhof, suggested a reduction of roughly 30 percent in stored image files within three months — though officials cautioned that figures from a single borough cannot be extrapolated cleanly to the whole city.

What happens next depends heavily on whether the Senatsverwaltung für Digitalisierung moves from pilot endorsement to mandatory standard. Officials have indicated a framework decision is expected before the end of the third quarter. Agencies that have not begun their own assessments are being told, in increasingly direct terms, that waiting is no longer a defensible position. For institutions from the Staatsbibliothek zu Berlin on Potsdamer Straße to the smaller borough media offices in Spandau and Lichtenberg, the practical advice from digital governance specialists is consistent: start with an audit of existing assets before any tool is procured, because the duplication problem cannot be solved by software alone if the filing habits that created it remain unchanged.
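For readers wondering what a first-pass audit might look like in practice, the sketch below groups files by a hash of their contents, which catches byte-identical copies stored under different names. It is a minimal illustration under stated assumptions, not a description of any tool used by Berlin agencies: the function names, folder names, and file names are invented for the example, and copies that have been resized or re-exported would not be caught by this approach.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Group files under root by content hash; keep groups of two or more."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


# Demo on a throwaway directory with made-up department and file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dept_a").mkdir()
    (root / "dept_b").mkdir()
    (root / "dept_a" / "rathaus.jpg").write_bytes(b"same image bytes")
    (root / "dept_b" / "IMG_0042.jpg").write_bytes(b"same image bytes")
    (root / "dept_b" / "other.jpg").write_bytes(b"different bytes")
    duplicates = find_exact_duplicates(root)
    duplicate_names = [sorted(p.name for p in paths) for paths in duplicates.values()]

print(duplicate_names)
```

An inventory like this only reports what exists. Deciding which copy to keep, and who owns it, remains an organisational question.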

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
