Berlin's Digital Archive Problem: The Hidden Scale of Duplicate Images Clogging City Systems
New internal audits reveal just how many redundant image files are draining storage budgets and slowing down Berlin's public sector digital infrastructure.

Berlin's public administration is sitting on a data storage crisis that few departments want to talk about openly. Internal reviews conducted across several Senate departments (Senatsverwaltungen) during the first half of 2026 found that duplicate image files, meaning the same photo, scan, or graphic stored multiple times across different servers and platforms, account for an estimated 30 to 40 percent of total unstructured data held by the city's IT systems. That figure, circulating among procurement and IT planning staff, has quietly become a flashpoint in budget discussions at the Rotes Rathaus ahead of the autumn spending review.
The issue matters now for a specific reason: Berlin's Senate is finalising its digitisation roadmap under the Berlin Digital Strategy 2025–2030 framework, which commits the city to consolidating legacy IT infrastructure and reducing per-gigabyte storage costs. With cloud migration contracts due for renegotiation before the end of the year, every duplicated file translates directly into wasted expenditure. Storage costs for public sector cloud infrastructure in Germany currently run at roughly €0.02 to €0.05 per gigabyte per month depending on contract tier — not enormous in isolation, but significant when multiplied across petabyte-scale archives.
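To make that multiplication concrete, the short calculation below applies the quoted price range and the audits' 30 to 40 percent duplicate estimate to a single petabyte. It is a rough sketch only: the one-petabyte archive size and the function name are illustrative assumptions, not figures from the audits, and the duplicate share strictly refers to unstructured data rather than to everything stored.

```python
def monthly_storage_cost_eur(total_gb, price_per_gb_month, duplicate_share):
    """Return (total monthly cost, portion of that cost spent on duplicates)."""
    total = total_gb * price_per_gb_month
    return total, total * duplicate_share


# Hypothetical archive size: 1 PB = 1,000,000 GB (decimal units).
# This is an assumption for illustration, not a number from the city's audits.
ARCHIVE_GB = 1_000_000

# Price range and duplicate share as quoted in the article.
for price in (0.02, 0.05):
    for share in (0.30, 0.40):
        total, wasted = monthly_storage_cost_eur(ARCHIVE_GB, price, share)
        print(
            f"{price:.2f} EUR/GB, {share:.0%} duplicates: "
            f"{wasted:,.0f} EUR of {total:,.0f} EUR per month"
        )
```

Under those assumptions, duplicates would account for somewhere between €6,000 and €20,000 a month for every petabyte stored, or roughly €72,000 to €240,000 a year.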
Two institutions illustrate the scale particularly well. The Landesarchiv Berlin, based on Eichborndamm in Reinickendorf, digitised roughly 1.2 million documents and photographs as part of an ongoing preservation programme that accelerated during the pandemic years. Staff there have flagged that upload workflows across multiple digitisation contractors produced significant duplication, with some image batches ingested two or three times due to inconsistent handover protocols. The archive has been working with the Kompetenzzentrum Digitalisierung Berlin — the city's central digitisation advisory body — to develop a deduplication standard, though that standard has not yet been formally adopted city-wide.
The Berlin city portal, berlin.de, operated by the Senatskanzlei, faces a related but distinct problem. Years of departmental uploads have left the content management backend carrying thousands of image assets where the same press photo or infographic exists under different filenames in different department folders. A 2025 technical assessment prepared for the portal's relaunch, referenced in Abgeordnetenhaus budget committee papers, described the image library as containing substantial redundancy that would need to be addressed before a planned system migration in late 2026.
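The pattern described here, one image stored under different filenames in different folders, is what content hashing is designed to catch. The sketch below is a minimal illustration rather than the method used by the portal or its assessors: it groups files by the SHA-256 digest of their bytes, so filenames and folder locations play no part. The function names and the folder passed in are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root):
    """Group files under root whose contents are byte-for-byte identical.

    Returns a list of lists of paths; each inner list holds two or more
    files with the same digest, whatever their names or folders.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_groups` on a media folder would return each set of identical files as one group. This approach only finds exact copies: a press photo that has been resized, cropped, or re-saved at a different quality has different bytes and would need a similarity-based technique such as perceptual hashing, which is outside this sketch.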
Across the city's Bezirke, the picture is patchier. Mitte and Friedrichshain-Kreuzberg have both invested in district-level content management upgrades over the past two years, but neither has implemented automated duplicate detection as a standard part of their image workflows. Smaller districts with fewer IT staff are in a worse position still.
Quantifying the precise cost to Berlin is difficult without a city-wide audit, which has not been completed. But comparable exercises in other large German cities offer a reference point: Hamburg's data governance programme, launched in 2023 under the Dataport IT service provider, reported that deduplication of unstructured files (images, PDFs, and documents combined) freed up the equivalent of 18 percent of total storage capacity across participating agencies. Berlin's publicly stated IT infrastructure budget for 2025 is approximately €280 million. Depending on how much of that sum goes to storage, even a conservative 10 percent storage saving from deduplication could represent several million euros annually.
The tools to fix the problem are not expensive or exotic. Commercial deduplication software licences from vendors commonly used in German public sector procurement typically run between €15,000 and €80,000 per year depending on data volume, and open-source alternatives have been successfully deployed by several German Länder. The barrier in Berlin is not technology — it is governance: agreeing on which department owns the deduplication mandate, and who pays for the initial audit.
For practical purposes, the Senate's IT coordination office, based on Alexanderstraße in Mitte, is the body most likely to drive any city-wide solution. Procurement planning for the next IT framework contract cycle begins in September 2026. Organisations, archive professionals, and public sector IT managers with data held across Berlin's systems should begin internal audits now, before migration contracts lock in current storage volumes, and with them current inefficiencies, for another three to five years.
Published by The Daily Berlin