Berlin's public institutions are sitting on a problem years in the making. Across municipal databases — from the Stadtbibliothek Berlin's digital holdings to the Landesarchiv on Eichborndamm in Reinickendorf — tens of thousands of duplicate images have accumulated, the result of fragmented digitisation drives, multiple overlapping scanning contracts, and departments that rarely talked to each other. Now, with the SPD-led Senate having flagged digital infrastructure reform as a budget priority for the 2026–2027 fiscal period, the question is no longer whether Berlin will act. It is how, how fast, and who pays.
The urgency is real. Berlin's open-data portal, daten.berlin.de, hosts image sets from at least a dozen separate city departments, many uploaded without cross-referencing. Duplicates inflate storage costs, slow down search functions, and — critically — create legal exposure when licensing metadata on the original file differs from the copy. For a city increasingly positioning itself as a European tech hub, with clusters of AI and data companies concentrated around Kreuzberg's Oranienstraße corridor and the Prenzlauer Berg startup belt, the inability to maintain clean, deduplicated public archives is an embarrassment with practical consequences.
What the Deduplication Decision Actually Involves
At its core, the city faces three distinct choices. First: which deduplication standard to adopt. Exact hash matching, which compares a cryptographic fingerprint of each file's bytes, catches byte-identical copies but misses near-duplicates created when the same photo is re-saved at a different resolution or with a different colour profile. Perceptual hashing, used by platforms including Getty Images and several European national archives, fingerprints what an image looks like rather than how it is stored, so it catches those variants, but it requires more computational power and a larger procurement contract.
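The difference between the two approaches can be shown in a few lines of Python. What follows is a minimal sketch, not a description of any system Berlin is evaluating: it uses SHA-256 for the exact fingerprint and an "average hash", one of the simplest forms of perceptual hashing, for the visual one. The function names, the 8x8 grid, and the synthetic test image are all assumptions made for illustration, and images are represented as plain lists of grayscale values to keep the example free of external libraries.

```python
import hashlib


def exact_fingerprint(data: bytes) -> str:
    """SHA-256 of the raw bytes: equal only for byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def shrink(pixels, size=8):
    """Downsample a grayscale image (list of rows) to size x size by block averaging.

    Assumes the image is at least `size` pixels in each dimension.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8) -> int:
    """Average hash: one bit per cell, set when the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")


def to_bytes(pixels) -> bytes:
    """Flatten pixel rows into raw bytes, standing in for file contents."""
    return bytes(v for row in pixels for v in row)


# Hypothetical test data: a 32x32 synthetic image and the same image
# re-saved at double resolution (each pixel repeated 2x2).
original = [[(x * y) % 256 for x in range(32)] for y in range(32)]
upscaled = [[original[y // 2][x // 2] for x in range(64)] for y in range(64)]

same_bytes = exact_fingerprint(to_bytes(original)) == exact_fingerprint(to_bytes(upscaled))
distance = hamming(average_hash(original), average_hash(upscaled))
print(same_bytes, distance)
```

In this example the byte-level fingerprints do not match, because the two versions are stored differently, while the average-hash distance is zero, because they look the same. A production system would likely use a more robust perceptual hash and a tuned distance threshold, which is where the extra computing cost comes from.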
Second: who governs the process. The Senatsverwaltung für Inneres und Digitales, which oversees Berlin's IT infrastructure, has been in discussions with the Kompetenzzentrum Geodateninfrastruktur Berlin-Brandenburg — the regional body responsible for geographic and visual data standards — about where responsibility should sit. Splitting governance between a city body and a regional one has complicated previous projects, including the delayed rollout of Berlin's unified geodata platform, which missed its original 2024 deadline.
Third: what happens to the duplicates once identified. Deletion sounds straightforward. It is not. Archivists at the Landesarchiv have long argued that what looks like a duplicate may carry different provenance metadata — a different photographer credit, a different acquisition date — that makes it independently valuable. A blanket delete policy risks destroying historical context. A case-by-case review, however, could take years and require staff the archive does not currently have.
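The archivists' objection can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not a policy anyone has proposed: the record fields, the sample data, and the `triage` function are hypothetical. It groups records that share a fingerprint and separates groups whose provenance metadata is identical from groups where the photographer credit or acquisition date differs, which are the cases the Landesarchiv argues should not be deleted automatically.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    # All fields are hypothetical; real catalogue schemas will differ.
    record_id: str
    fingerprint: str
    photographer: str
    acquisition_date: str


def triage(records):
    """Split duplicate groups into 'identical provenance' and 'needs review'."""
    groups = defaultdict(list)
    for record in records:
        groups[record.fingerprint].append(record)

    identical, needs_review = [], []
    for members in groups.values():
        if len(members) < 2:
            continue  # not a duplicate at all
        provenance = {(m.photographer, m.acquisition_date) for m in members}
        ids = [m.record_id for m in members]
        if len(provenance) == 1:
            identical.append(ids)      # same image, same provenance
        else:
            needs_review.append(ids)   # same image, conflicting provenance
    return identical, needs_review


# Invented sample records for illustration only.
sample = [
    ImageRecord("A1", "fp-001", "Unknown", "1998-03-02"),
    ImageRecord("A2", "fp-001", "Unknown", "1998-03-02"),
    ImageRecord("B1", "fp-002", "Studio X", "2004-06-11"),
    ImageRecord("B2", "fp-002", "Studio Y", "2011-09-30"),
    ImageRecord("C1", "fp-003", "Studio X", "2004-06-11"),
]

identical, needs_review = triage(sample)
print(identical, needs_review)
```

A sorting step like this would not remove the need for human judgement, but it suggests how the pile requiring case-by-case review might be narrowed to the groups where the metadata actually conflicts.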
The Budget Question and the Timeline Ahead
Storage is not cheap. Berlin's Senate approved a digitisation budget of roughly €47 million for 2025–2026, spread across multiple departments, but no dedicated line item has been confirmed for a citywide image deduplication programme. Comparable exercises in Hamburg and Leipzig have cost between €800,000 and €2 million depending on collection size, according to publicly available procurement records from those cities.
The practical calendar matters here. The Senate's IT steering committee is scheduled to meet in September 2026, when digital infrastructure priorities for the following budget cycle will be set. That meeting is effectively the last realistic decision point before any programme could receive funding early enough to begin work in 2027. Miss that window, and the problem carries into a third fiscal year with no resolution.
For the city's institutions, the next eight weeks are the ones that count. The Stadtbibliothek, whose main branch sits on Breite Straße in Mitte, has already begun an internal audit of its image holdings — a process expected to conclude by late August. The Landesarchiv, meanwhile, is preparing a position paper on provenance standards that will feed directly into the September discussions. How those two documents land, and whether the Senatsverwaltung für Inneres und Digitales treats them as the foundation for a unified policy or as competing briefs from rival institutions, will define what Berlin's digital image infrastructure looks like for the next decade.