The Daily Berlin

Berlin news, every day

Berlin's Digital Archives Are Drowning in Duplicate Images — And the Numbers Tell a Costly Story

A surge in redundant visual data is straining city databases, municipal platforms and local media archives, with storage costs climbing and retrieval times slowing across Berlin's public and private sectors.

By Berlin News Desk · Published 4 July 2026, 8:40 pm

Photo: Zois Fotis on Pexels

Berlin's public institutions are sitting on millions of duplicate images, identical or near-identical files stored multiple times across servers, cloud platforms and legacy hard drives, and the administrative bill is growing. A review of municipal digitisation projects across the city found that duplicate image files can account for between 25 and 40 percent of total storage volume in large-scale public archives, according to figures published by the Fraunhofer Institute for Open Communication Systems (FOKUS), which is based on Kaiserin-Augusta-Allee in Charlottenburg.

The issue has landed with particular force in 2026, as Berlin's Senate Department for Digital Transformation (the Senatsverwaltung für Digitalisierung) accelerates its push to centralise municipal data under the Berlin Open Data Strategy. The strategy, which entered its current implementation phase in January 2025, requires agencies from Mitte to Treptow-Köpenick to migrate decades of analogue and semi-digital records onto shared infrastructure. That migration is producing a familiar problem: agencies are uploading files without first checking whether copies already exist elsewhere on the shared infrastructure.

The Scale of the Problem in Figures

Storage is not cheap, even at institutional scale. Enterprise-grade cloud storage for German public bodies typically runs between €0.02 and €0.05 per gigabyte per month under current procurement frameworks. Take a mid-sized Berlin district office holding, say, 80 terabytes of archival image data, a figure consistent with the volume described in a 2024 Digitalisierungsrat Berlin progress report. If 25 to 40 percent of that volume is duplicated, redundant files alone could be generating unnecessary costs of roughly €400 to €1,600 per month, or up to about €19,000 a year. Multiply that across twelve Berlin districts and a dozen major public cultural institutions and the cumulative waste becomes a budget-line problem, not a technical footnote.
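The arithmetic behind a per-office estimate can be worked through in a few lines. The sketch below is illustrative only: it assumes the 25 to 40 percent duplicate share and the per-gigabyte price range quoted in this article, counts one terabyte as 1,000 gigabytes, and uses a hypothetical function name.

```python
def monthly_duplicate_cost(archive_tb: float,
                           duplicate_share: float,
                           eur_per_gb_month: float) -> float:
    """Estimate the monthly cost of storing duplicate files.

    archive_tb       -- total archive size in decimal terabytes (assumption: 1 TB = 1,000 GB)
    duplicate_share  -- fraction of the archive that is duplicated (0.25 means 25 percent)
    eur_per_gb_month -- storage price in euros per gigabyte per month
    """
    gigabytes = archive_tb * 1000
    return gigabytes * duplicate_share * eur_per_gb_month


if __name__ == "__main__":
    # Lower bound: 25 percent duplicates at the cheapest quoted price.
    low = monthly_duplicate_cost(80, 0.25, 0.02)
    # Upper bound: 40 percent duplicates at the most expensive quoted price.
    high = monthly_duplicate_cost(80, 0.40, 0.05)
    print(f"Monthly waste: EUR {low:.0f} to EUR {high:.0f}")
    print(f"Yearly waste:  EUR {low * 12:.0f} to EUR {high * 12:.0f}")
```

On those assumptions, an 80-terabyte archive wastes somewhere between a few hundred and well over a thousand euros a month, which is modest for one office but adds up across two dozen institutions.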

The Stadtmuseum Berlin, which manages collections across sites including the Märkisches Museum at Am Köllnischen Park, acknowledged in its 2024 annual review that its digitisation push had identified significant file duplication within photographic holdings. The Zentral- und Landesbibliothek Berlin, headquartered on Breite Straße in Mitte, has been running deduplication checks as part of its DigiPortA digitisation programme since 2023. Early results from comparable German library digitisation efforts suggest deduplication can recover between 15 and 30 percent of allocated storage capacity.

The problem is not purely governmental. Berlin's startup ecosystem — clustered around Factory Berlin in Mitte and the tech corridors of Prenzlauer Berg — has generated its own wave of image-heavy platforms in e-commerce, property tech and media. Companies using multiple content management systems across development and production environments routinely replicate product imagery and marketing assets without automated deduplication protocols in place. A 2023 survey by Bitkom, Germany's digital industry association, found that nearly half of mid-sized German companies lacked a formal data deduplication policy.

What Comes Next — and What Organisations Should Do Now

The technical fix is well understood: perceptual hashing algorithms can identify visually identical or near-identical images even when file names, metadata or compression differ. Tools such as the open-source pHash library, or commercial solutions already used by media agencies in Hamburg and Munich, can scan large image repositories and flag duplicates for review within hours. The harder problem is governance: deciding who owns the decision to delete, archive or merge files, particularly when images carry legal, journalistic or historical significance.
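To give a sense of how perceptual hashing works, here is a minimal sketch of a difference hash (dHash), a simpler relative of the technique and not the pHash library itself. It assumes each image has already been decoded into rows of 0-255 grayscale values, which a real pipeline would do with an image library. The function names and the 10-bit threshold are illustrative assumptions, not an established standard.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[int]]  # rows of 0-255 grayscale values


def shrink(pixels: Gray, width: int, height: int) -> List[List[float]]:
    """Downscale an image by averaging rectangular blocks of pixels."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max(y0 + 1, (y + 1) * src_h // height)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max(x0 + 1, (x + 1) * src_w // width)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit hash: one bit per 'is left brighter than right?'."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 10) -> bool:
    """Flag two images as likely duplicates (threshold is an assumed cut-off)."""
    return hamming(dhash(a), dhash(b)) <= threshold


if __name__ == "__main__":
    # Synthetic 36x32 test images: a horizontal gradient, a uniformly
    # brighter copy of it, and a mirrored version.
    original = [[x * 5 for x in range(36)] for _ in range(32)]
    brighter = [[value + 3 for value in row] for row in original]
    mirrored = [row[::-1] for row in original]
    print(is_near_duplicate(original, brighter))  # True
    print(is_near_duplicate(original, mirrored))  # False
```

Because the hash records only relative brightness between neighbouring regions, a copy that has been brightened slightly still matches, while a visibly different picture does not. Flagged pairs would still go to a human reviewer, which is where the governance question comes in.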

Berlin's Senate Department for Digital Transformation is expected to publish updated data management guidelines before the end of the third quarter of 2026. Those guidelines will reportedly set explicit file deduplication standards for the first time, requiring agencies to run deduplication audits ahead of any major new data migration. For district offices and cultural institutions that have already migrated data without such checks, a remedial audit cycle is the logical next step, ideally before the next budget planning round, which for most Berlin Bezirke opens in September. The savings, modest district by district, compound quickly at city scale.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
