Berlin's publicly funded digital image archives contain tens of thousands of duplicate files, a problem that archivists, municipal IT managers and open-data advocates say has gone unaddressed long enough. The issue came to a head this spring when the Stadtmuseum Berlin, which manages collections across multiple sites including the Märkisches Museum on Köllnischer Park, acknowledged during an internal review that its digitisation drive — accelerated during the 2020–2022 pandemic closures — had produced significant redundancy across its servers. Administrators are now facing pressure to act before the next budget cycle closes in September 2026.
The problem is not cosmetic. Duplicate image files clog storage infrastructure, slow database queries, and create version-control headaches that make it harder for researchers, journalists and the public to trust that what they are viewing is the authoritative record. Berlin's cultural institutions have been digitising at pace since the Senate's 2019 Digitalstrategie set targets for public access to heritage materials, and the speed of that push — welcome in principle — left little room for deduplication protocols to keep up.
What Experts and Officials Are Saying
Specialists working in the city's cultural-data sector are not mincing words. Staff at the Zentralinstitut für Kunstgeschichte, which coordinates art-historical databases used by Berlin institutions among others, have pointed to the absence of a shared metadata standard as the root cause. Without a common identifier attached to each image at the point of capture, the same photograph of, say, a 19th-century Kreuzberg tenement façade can be saved dozens of times under different filenames across different departments with no automated system flagging the overlap.
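One way to picture such a common identifier is a fingerprint derived from the file's own bytes, so that two copies saved under different names still resolve to the same ID. The sketch below is illustrative only: the choice of SHA-256 and the names `content_id` and `find_exact_duplicates` are assumptions, not a scheme any of the institutions mentioned has described. It catches byte-identical copies only and would miss a re-saved or resized version of the same photograph.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def content_id(data: bytes) -> str:
    """Return a hex SHA-256 digest of the file bytes (hypothetical identifier scheme)."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group filenames whose contents are byte-for-byte identical.

    `files` maps a filename to its raw bytes; in practice the bytes
    would be read from disk or from an archive's storage layer.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[content_id(data)].append(name)
    # Keep only identifiers shared by more than one filename.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Assigned at the point of capture and stored alongside the image, an identifier of this kind would let an automated check flag the overlap regardless of which department saved the file or what it was called.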
The Wikimedia Deutschland office on Tempelhofer Ufer has been vocal on the civic dimension. The organisation, which facilitates the upload of public-domain Berlin heritage images to Wikimedia Commons, says its volunteer editors routinely encounter duplicate uploads from institutional partners, a workflow problem that wastes contributor time and muddies provenance records. It has previously called for institutions to adopt the International Image Interoperability Framework, known as IIIF, as a baseline standard before contributing to shared repositories.
At the Senate Department for Culture and Social Cohesion (the Senatsverwaltung für Kultur und gesellschaftlichen Zusammenhalt), officials acknowledged the issue in general terms at budget hearings in May 2026 but did not commit to a specific remediation timeline. The department's digital unit is understood to be scoping a pilot deduplication project, though no tender has yet been published in the official procurement journal, the Berliner Ausschreibungsblatt.
Costs and the Push for a Technical Fix
Storage is not free. Sector analysts estimate that cloud and hybrid storage contracts for Berlin's public cultural institutions run to several million euros annually, though the Senate has not published a consolidated figure. Database administrators in similar European municipal contexts informally put duplicate files at 15 to 20 percent of total image-archive volume, and even the low end of that range suggests meaningful savings are available. The city of Amsterdam reduced its Stadsarchief storage overhead by roughly 18 percent after a deduplication exercise completed in 2023, a benchmark Berlin's IT planners are aware of.
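The arithmetic behind that estimate can be made explicit. The sketch below assumes, for simplicity, that storage costs scale in direct proportion to archive volume, which real contracts with tiered pricing may not do. The function name `savings_range` is hypothetical, and because the Senate has published no consolidated figure, any cost passed in is a placeholder rather than a reported number.

```python
from typing import Tuple


def savings_range(annual_cost_eur: int,
                  low_pct: int = 15,
                  high_pct: int = 20) -> Tuple[int, int]:
    """Estimate annual savings in euros if duplicates were removed.

    Assumes cost is proportional to stored volume (a simplification).
    The 15 and 20 percent defaults are the informal range cited in the article.
    Integer arithmetic avoids floating-point rounding on euro amounts.
    """
    low = annual_cost_eur * low_pct // 100
    high = annual_cost_eur * high_pct // 100
    return low, high
```

For a purely illustrative annual bill of 3 million euros, `savings_range(3_000_000)` gives a range of 450,000 to 600,000 euros a year.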
The Freie Universität Berlin's computer science faculty has offered to partner with the Stadtmuseum on a perceptual-hashing pilot — a technique that identifies visually identical or near-identical images even when file names and metadata differ. A working group is expected to meet at the FU's Dahlem campus before the summer recess ends in mid-August 2026. The approach is already in use at the British Library and the Bibliothèque nationale de France, giving Berlin administrators a proven template rather than a speculative one.
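For readers unfamiliar with the technique, the sketch below shows one of the simplest members of the perceptual-hashing family, commonly called an average hash. It is not the method the FU pilot or the libraries named above are known to use, which the available information does not specify. The function names are hypothetical, and the image is assumed to be already decoded into rows of 0 to 255 grayscale values whose dimensions are multiples of the hash size; a production system would rely on an imaging library for decoding and resizing.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale pixel values, 0-255


def shrink(img: Gray, size: int = 8) -> List[float]:
    """Downsample to size x size by averaging equal blocks, returned row by row.

    Assumes the image height and width are multiples of `size`.
    """
    block_h, block_w = len(img) // size, len(img[0]) // size
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [img[y][x]
                     for y in range(by * block_h, (by + 1) * block_h)
                     for x in range(bx * block_w, (bx + 1) * block_w)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(img: Gray, size: int = 8) -> int:
    """Return a size*size-bit hash: one bit per cell, set if the cell is above the mean."""
    cells = shrink(img, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")
```

Two files are treated as likely duplicates when the Hamming distance between their hashes falls below a chosen threshold. Because each bit records only whether a region is brighter than the image's own average, a uniformly brightened copy tends to produce the same hash, while a visually different image produces a distant one. The threshold itself is a tuning decision that would have to be tested against the archive in question.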
For institutions and researchers who rely on these archives, the practical advice from specialists is straightforward: flag duplicates when you find them, document the original source metadata, and push your institution to adopt a persistent identifier system before the next digitisation contract is signed. The Senate's next digital-strategy review is scheduled for late 2026 — and pressure from the Abgeordnetenhaus budget committee suggests that this time, deduplication will need to be more than a footnote.