Berlin's public sector is sitting on a problem it can no longer ignore. Across municipal databases, from the Senatsverwaltung für Stadtentwicklung's planning portal to the BVG's internal communications archive, duplicate and misattributed images have accumulated over years of poorly coordinated digital migration. Now officials, archivists and technology consultants are openly disagreeing about who is responsible — and how fast the cleanup must happen.
The urgency is real. The Berlin Senate's ongoing digitalisation push, anchored in the Digitalisierungsstrategie Berlin 2030 program, has set a hard target of moving most citizen-facing services fully online by the end of 2027. Dirty data — including duplicate images clogging content management systems — is being cited by city IT departments as one of the practical blockers standing between ambition and execution.
A Problem Hiding in Plain Sight
The issue is not abstract. At the Stadtbibliothek Berlin on Breite Straße in Mitte, librarians managing the digital collections system have spent much of the past eighteen months working through a backlog of digitised photographs where the same image appears under multiple catalogue entries, sometimes with conflicting metadata. The problem is mirrored at the Landesarchiv Berlin in Reinickendorf, where archivists have flagged that automated bulk imports conducted between 2021 and 2023 introduced significant duplication across photographic collections.
Technology consultants working with Berlin's public institutions describe the core challenge as a structural one. Most systems were built without deduplication logic, so every time a new content management platform was adopted (and Berlin's public sector has cycled through several), image libraries were simply migrated wholesale, duplicates included. The result, according to documents circulated within the Senate's IT coordination body, is a set of databases in which storage overhead attributable to duplicate files runs well above thirty percent in some departments.
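To make the idea of storage overhead from duplicates concrete, here is a minimal sketch of how byte-identical copies could be counted. It is not the method used by any Berlin institution. The file names, the toy library and the helper functions `group_by_content` and `duplicate_overhead` are hypothetical, and the sketch only catches exact copies, not resized or re-encoded versions of the same picture.

```python
import hashlib
from collections import defaultdict


def group_by_content(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file names by the SHA-256 digest of their bytes."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return dict(groups)


def duplicate_overhead(files: dict[str, bytes]) -> float:
    """Share of total bytes taken up by redundant copies (0.0 to 1.0)."""
    total = sum(len(data) for data in files.values())
    if total == 0:
        return 0.0
    redundant = 0
    for names in group_by_content(files).values():
        # Keep one copy per group; every further copy counts as overhead.
        redundant += len(files[names[0]]) * (len(names) - 1)
    return redundant / total


# Hypothetical toy library: two catalogue entries hold identical bytes.
library = {
    "press/2021/rathaus.jpg": b"A" * 600,
    "migrated/cms2/rathaus_copy.jpg": b"A" * 600,
    "press/2022/bruecke.jpg": b"B" * 800,
}

print(f"{duplicate_overhead(library):.0%} of stored bytes are redundant copies")
```

In this toy case, 600 of 2,000 stored bytes belong to a second copy of the same file. Hashing file contents rather than comparing file names is what allows a copy to be found even when a migration has renamed or re-catalogued it.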
Officials at the Senatskanzlei have pointed to the Digital Service Berlin, the city's in-house tech unit established in 2022 and headquartered on Karl-Marx-Allee in Friedrichshain, as the body best placed to lead a coordinated deduplication effort. The Digital Service has been quietly piloting an AI-assisted image matching tool since early 2026, with initial tests run on a subset of around 40,000 images from the city's press photo archive.
Experts Push Back on a Top-Down Fix
Not everyone agrees that centralising the solution is the right call. Information scientists at the Humboldt-Universität zu Berlin, whose Institut für Bibliotheks- und Informationswissenschaft on Dorotheenstraße has studied public sector data governance in German federal states, argue that automated deduplication carries its own risks — particularly when applied to historical image archives where two nearly identical photographs may represent genuinely distinct documentary records.
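The risk the information scientists describe can be illustrated with a small sketch. It assumes a deliberately simple similarity measure (an average hash over a tiny grayscale grid, compared by Hamming distance), which is a stand-in chosen for illustration and not a description of the Digital Service's tool. The pixel grids, the `classify` helper and the threshold of 2 are all hypothetical.

```python
Grid = list[list[int]]  # grayscale pixel values, 0 to 255


def average_hash(pixels: Grid) -> int:
    """Bit mask with a 1 for every pixel brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def classify(a: Grid, b: Grid, threshold: int = 2) -> str:
    """Label a pair; near matches go to a person, not to deletion."""
    if a == b:
        return "exact duplicate"
    if hamming(average_hash(a), average_hash(b)) <= threshold:
        return "needs human review"
    return "distinct"


# Hypothetical 4x4 frames: frame_b differs from frame_a in one pixel,
# as two exposures taken moments apart might.
frame_a = [[10, 10, 200, 200] for _ in range(4)]
frame_b = [row[:] for row in frame_a]
frame_b[0][1] = 220
unrelated = [[200, 200, 10, 10] for _ in range(4)]

print(classify(frame_a, frame_a))
print(classify(frame_a, frame_b))
print(classify(frame_a, unrelated))
```

The middle case is the contested one. A tool that deleted everything under the similarity threshold would treat the two frames as one record, whereas routing such pairs to an archivist keeps the decision about documentary value with a person.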
The debate has drawn in the cultural sector too. The Stiftung Stadtmuseum Berlin, which oversees collections at the Märkisches Museum near Köllnischer Park, has raised concerns about any city-wide deduplication policy that does not include explicit carve-outs for heritage institutions. Museum officials have made clear, in written submissions to the Senate, that the evidentiary standards for deleting a file from a cultural archive must be higher than those applied to, say, a press office communications folder.
Costs are also part of the conversation. Storage pricing for enterprise-grade municipal cloud infrastructure has risen substantially since 2023. Reducing duplicate image volume by even twenty percent across the Senate's main departments could translate into meaningful annual savings on storage contracts, though the Senate has not published a precise figure.
For now, the Digital Service Berlin is expected to present a formal deduplication framework to the Senate's digital coordination committee before the end of the third quarter of 2026. Heritage and library bodies have been invited to submit responses by August 15. Whether the final policy draws a clean line between bureaucratic efficiency and archival integrity will depend heavily on the coming weeks of negotiation, and on whether the institutions involved can agree on a shared definition of what, exactly, counts as a duplicate worth deleting.