The Daily Berlin

Berlin news, every day


Berlin's Digital Archives Are Full of Duplicates — and the Numbers Are Staggering

A quiet crisis in how the city stores and manages image data is costing public institutions millions and slowing down everything from housing permits to cultural heritage projects.

By Berlin News Desk · Published 4 July 2026, 9:16 pm


Photo: Mach, Edmund von, 1870-1927 / Public domain (Wikimedia Commons)

Berlin's public sector is sitting on a storage problem measured in terabytes and euros. Across municipal databases, from the Senatsverwaltung für Stadtentwicklung to the Stadtbibliothek Berlin's digital archive on Breite Straße, duplicate image files have quietly colonised hard drives and cloud servers for years — redundant photographs, scanned documents, and planning visuals stored two, three, sometimes four times over. An internal audit framework published by the Technologiestiftung Berlin in early 2025 flagged that duplicate digital assets account for an estimated 30 to 40 percent of storage overhead in mid-sized German public institutions. Berlin, with its sprawling network of Bezirksämter and cultural bodies, sits squarely in that bracket.

The timing matters because Berlin's administration is midway through its Digitalisierungsstrategie 2030, the coalition's flagship push to move planning, permitting, and cultural cataloguing online. That push is expensive: the Senate earmarked roughly 280 million euros for digital infrastructure across the 2024-2026 budget cycle. Every gigabyte wasted on redundant image files is a direct drag on that investment, and project managers across Mitte and Friedrichshain-Kreuzberg have begun raising the issue formally with the city's central IT body, the ITDZ Berlin.

The Scale of the Problem in Berlin's Own Systems

Put concrete numbers on it and the picture sharpens. The ITDZ Berlin manages data infrastructure for more than 80,000 public employees across the city's twelve districts. Industry benchmarks for enterprise environments — drawn from studies by the Fraunhofer-Institut für Offene Kommunikationssysteme, based at Kaiserin-Augusta-Allee in Charlottenburg — suggest that automated deduplication tools typically recover between 20 and 60 percent of used storage capacity in image-heavy databases. For an organisation operating at Berlin's scale, even the lower end of that range translates to hundreds of terabytes and six-figure annual savings in cloud hosting fees alone.

The Stadtmuseum Berlin, which manages photographic collections spanning the city's history from the Märkisches Museum in Mitte to the Ephraim-Palais, began a deduplication pilot in late 2024. The project, run in partnership with the Zuse-Institut Berlin on Takustraße in Dahlem, targeted approximately 1.2 million digitised image files. Early results indicated that around 18 percent of those files were exact or near-exact duplicates, a figure that surprised archivists who had assumed manual cataloguing had kept redundancy low. At roughly 4 megabytes per image, that is about 216,000 files and close to 860 gigabytes of recoverable space from a single institution's collection.
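The storage estimate follows directly from the three figures quoted for the pilot. A minimal sketch of the arithmetic, assuming decimal units (1 GB = 1,000 MB) and treating the 4 MB figure as a flat average; the function and constant names are illustrative only:

```python
# Figures quoted in the article for the Stadtmuseum Berlin pilot.
TOTAL_FILES = 1_200_000   # approximate number of digitised image files
DUPLICATE_SHARE = 0.18    # share reported as exact or near-exact duplicates
AVG_SIZE_MB = 4           # rough average size per image, in megabytes


def recoverable_gigabytes(total_files, duplicate_share, avg_size_mb):
    """Return (duplicate file count, recoverable space in decimal GB)."""
    duplicate_files = round(total_files * duplicate_share)
    recoverable_mb = duplicate_files * avg_size_mb
    return duplicate_files, recoverable_mb / 1000  # assumes 1 GB = 1000 MB


files, gb = recoverable_gigabytes(TOTAL_FILES, DUPLICATE_SHARE, AVG_SIZE_MB)
print(f"{files:,} duplicate files, about {gb:.0f} GB recoverable")
```

Under those assumptions the result is 216,000 files and 864 GB, which matches the "close to 860 gigabytes" figure.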

The housing sector compounds the issue. Berlin's Wohnungsamt offices, which handle rent cap documentation, building permits, and Milieuschutz applications across districts including Neukölln and Pankow, generate thousands of scanned property photographs each month. Under the current workflow, the same facade image can enter the system from a field inspector, a legal clerk, and an automated upload from the applicant portal, with no deduplication check between any of them. A standardised hash-matching protocol, already in use by the Bundesarchiv in Koblenz since 2023, would catch those duplicates at the point of upload rather than years later during a storage audit.
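The article does not describe the Bundesarchiv protocol in detail, so the following is only a sketch of the general idea of hash matching at upload time. It assumes SHA-256 as the content hash and uses a hypothetical in-memory `UploadRegistry`; a real system would persist the index and handle concurrent uploads.

```python
import hashlib


class UploadRegistry:
    """Hypothetical in-memory index of content hashes seen so far."""

    def __init__(self):
        self._seen = {}  # SHA-256 hex digest -> ID of the first stored copy

    def submit(self, file_id, data):
        """Register an upload and return the ID of the stored copy.

        Returns file_id if the bytes are new, otherwise the ID of the
        earlier upload that had exactly the same bytes.
        """
        digest = hashlib.sha256(data).hexdigest()
        return self._seen.setdefault(digest, file_id)


registry = UploadRegistry()
facade = b"bytes of a scanned facade photograph"  # stand-in for real image data

first = registry.submit("inspector-001", facade)
second = registry.submit("clerk-017", facade)         # same bytes, different source
third = registry.submit("portal-342", facade + b"!")  # different bytes

print(first, second, third)
```

In this sketch the second upload resolves to the inspector's copy, while the third is stored as new. An exact hash only catches byte-identical files. A rescanned or recompressed copy of the same facade would pass through, which is the gap perceptual hashing is meant to close.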

What Comes Next for Berlin's Digital Housekeeping

The ITDZ Berlin is expected to publish updated technical standards for image asset management before the end of the third quarter of 2026. Those standards are likely to mandate perceptual hashing across all Senate-connected databases. The technique identifies visually similar images even when file names, metadata, or encoding differ. The Zuse-Institut Berlin pilot is the most closely watched proof of concept for that rollout.
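The article does not say which perceptual hash the standards would specify. As an illustration only, here is a sketch of one simple variant, the average hash, written in plain Python over a grid of grayscale values. The function names and synthetic test images are assumptions for the example. A production system would decode real image files and most likely use an established library.

```python
def average_hash(pixels, hash_size=8):
    """Compute a simple average hash from a 2D grid of grayscale values.

    pixels: list of rows, each a list of ints in 0..255. Both dimensions
    are assumed to be multiples of hash_size.
    """
    rows, cols = len(pixels), len(pixels[0])
    bh, bw = rows // hash_size, cols // hash_size
    # Shrink to hash_size x hash_size by averaging each block of pixels.
    cells = []
    for r in range(hash_size):
        for c in range(hash_size):
            block = [pixels[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic 16x16 images standing in for decoded scans.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]       # left-to-right ramp
brighter = [[min(255, v + 10) for v in row] for row in gradient]  # same picture, lighter
vertical = [[y * 16 for _ in range(16)] for y in range(16)]       # top-to-bottom ramp

near = hamming_distance(average_hash(gradient), average_hash(brighter))
far = hamming_distance(average_hash(gradient), average_hash(vertical))
print(near, far)
```

Here the brightened copy differs from the original in every pixel value, so an exact file hash would treat it as new, yet its average hash is identical (distance 0). The unrelated vertical ramp differs in 32 of 64 bits. Deciding how small a distance counts as a duplicate is a policy choice the article does not address.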

For Berliners dealing day-to-day with slow permit processing or patchy access to the city's online cultural archives, the practical upshot is straightforward: faster retrieval, lower error rates, and municipal IT budgets that stretch further. The Digitalisierungsstrategie 2030 was always going to be judged on delivery speed as much as ambition. Getting the data clean is the unglamorous prerequisite for everything else — and the numbers now make ignoring it politically uncomfortable.

About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
