Berlin's public digital infrastructure is carrying a hidden weight. Across municipal databases, archive portals and cultural institutions, duplicate image files (identical or near-identical scans uploaded multiple times) are consuming server capacity, distorting search results and driving up storage costs that ultimately land on taxpayers. The problem is not new, but pressure to address it has sharpened in 2026 as the CDU-led Senate pushes a broader digitalisation agenda under the city's Smart City Berlin strategy.
The issue matters now partly because of scale. Berlin's Landesarchiv, based in Reinickendorf on Eichborndamm, manages millions of digitised records, maps and photographs. The Zentral- und Landesbibliothek Berlin, with its main reading rooms in Mitte and Kreuzberg, runs parallel digitisation pipelines. When the same historical photograph gets uploaded independently by two departments using different metadata tags, it does not simply take up double the space — it fragments the public record and makes cross-referencing nearly impossible for researchers. Staff at both institutions have been working under a shared digitisation framework since 2023, but synchronisation between their content management systems has lagged.
What the Experts Are Saying
Digital archivists and information scientists in the city have been vocal in recent months. Specialists at the Humboldt-Universität's Institut für Bibliotheks- und Informationswissenschaft have argued in professional forums that Berlin needs a city-wide deduplication protocol before it expands its digitisation budget further. The core technical argument is straightforward: without a unified hashing system, in which each image file gets a digital fingerprint derived from its contents and that fingerprint is checked against a central registry before upload, redundancy is structurally inevitable. Retrofitting a solution after millions of files have already been ingested is significantly more expensive than building the standard in from the start.
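For readers who want to see what such a check involves, the sketch below shows the general idea in Python. It is an illustration under stated assumptions, not a description of any system Berlin's institutions run: the `Registry` class, its in-memory dictionary, the record identifiers and the choice of SHA-256 are all hypothetical stand-ins for whatever a real central registry would use.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large scans do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class Registry:
    """Hypothetical in-memory stand-in for a shared, city-wide registry."""

    def __init__(self) -> None:
        # Maps a content fingerprint to the record that first registered it.
        self._by_hash: Dict[str, str] = {}

    def check_and_register(self, path: Path, record_id: str) -> Optional[str]:
        """Return the existing record id if the content is already known.

        Otherwise register the file under record_id and return None.
        """
        key = fingerprint(path)
        existing = self._by_hash.get(key)
        if existing is not None:
            return existing
        self._by_hash[key] = record_id
        return None
```

A content hash of this kind only catches byte-for-byte copies. Two scans of the same photograph made at different resolutions would produce different fingerprints, so near-identical files need a separate similarity check.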
The Wikimedia Deutschland office, located on Tempelhofer Ufer in Kreuzberg, has also entered the conversation. The organisation has a long-standing relationship with Berlin's public institutions through its GLAM (Galleries, Libraries, Archives and Museums) outreach programme, which encourages free-licence uploads to Wikimedia Commons. Staff there have pointed out that duplicated source files complicate Commons uploads and sometimes result in the same historical image appearing under contradictory licensing terms, a legal headache that can pull material offline entirely. Wikimedia Deutschland formally flagged the issue in a letter to the Senate Department for Culture and Social Cohesion in early 2026, though the contents of that letter have not been made public.
The Senate's Response — and What Comes Next
The Berlin Senate has acknowledged the problem within broader digitalisation discussions. The current five-year digital investment plan, running through 2028, allocates funds for infrastructure modernisation across public institutions, though the Senate has not broken out a specific budget line for deduplication work. Technology consultants advising the city most often cite a figure of roughly 20 to 30 percent for how much poorly managed digital storage inflates operational costs over a five-year horizon, though those estimates come from industry benchmarks rather than Berlin-specific audits.
Practically speaking, institutions are not waiting for a top-down mandate. The Stadtmuseum Berlin, which oversees collections including the Märkisches Museum near Köllnischer Park in Mitte, began piloting an automated duplicate-detection tool in the second quarter of 2026 as part of a broader collections management upgrade. Early internal results, shared at a digitisation roundtable in May, reportedly identified thousands of redundant files within a single photographic collection — a finding that prompted renewed calls for coordination across the city's other major institutions.
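How the Stadtmuseum's tool works has not been described publicly, so the following is not a reconstruction of it. It is a minimal sketch of one common approach to spotting near-identical images, an "average hash" compared by Hamming distance. It assumes each image has already been reduced to a small grayscale thumbnail by an imaging library (not shown here), and the function names and the `max_distance` threshold are illustrative choices.

```python
from typing import Sequence

# Rows of grayscale values (0-255) from an already downscaled thumbnail.
Grid = Sequence[Sequence[int]]


def average_hash(grid: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    pixels = [value for row in grid for value in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Flag two thumbnails as probable duplicates if their hashes are close.

    The threshold is an assumption and would need tuning on real collections.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance
```

Unlike an exact content hash, this tolerates small differences such as a slight brightness shift between two scans, at the cost of occasional false matches that a person would still need to review.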
For anyone who uses Berlin's public digital archives — researchers at the Freie Universität, journalists pulling historical images, or residents tracing family records — the practical upshot is slow search tools and inconsistent results that will persist until the city commits to unified standards. The next formal policy checkpoint is a Senate committee review of the Smart City Berlin strategy scheduled for September 2026. Advocates are pushing for deduplication standards to be written into procurement contracts for any new digitisation work signed after that date. Whether that happens will depend less on technical consensus, which largely already exists, and more on whether the Senate treats the issue as infrastructure — or as a footnote.