Berlin's public archives are sitting on a problem years in the making. Across the Landesarchiv Berlin on Eichborndamm in Reinickendorf, the Stadtmuseum Berlin network, and the photographic holdings of the Senatsverwaltung für Kultur, digital cataloguers have identified hundreds of thousands of duplicate image files — multiple scans of the same negative, near-identical JPEGs uploaded during successive digitisation drives, and overlapping donations from private collectors. The question of what to do with them is no longer administrative housekeeping. It has become a live policy debate with real budget consequences.
Why now? The Berlin Senate approved a digital infrastructure review in March 2026, setting a September deadline for institutions to submit deduplication strategies as a condition of continued funding under the Digitales Berlin 2030 programme. Any archive that cannot demonstrate a credible plan risks losing its share of the programme's €14 million tranche, earmarked specifically for long-term digital preservation. That deadline is ten weeks away.
Two Camps, Two Approaches
The institutions are not aligned. The Landesarchiv has piloted a perceptual-hashing tool — software that generates a fingerprint for each image and flags near-matches — on roughly 40,000 files from its Cold War-era West Berlin collection. Archivists say the tool correctly identified duplicates at a high rate in testing, but flagged a significant number of false positives: images that looked identical to an algorithm but carried different provenance metadata, different dates of acquisition, or subtle differences in print quality that a conservator would consider historically significant. Deleting the wrong version of a photograph is not like deleting a spreadsheet. It cannot be undone.
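To make the mechanism concrete, here is a minimal sketch of how a perceptual hash can flag near-matches while still routing provenance conflicts to a person. It is an illustration under stated assumptions, not the Landesarchiv's tool: it uses a simple "average hash" on grayscale pixel grids, and the `Scan` record, its `provenance` field, and the `max_distance` threshold are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


def average_hash(pixels, hash_size=8):
    """Return a (hash_size * hash_size)-bit fingerprint of a grayscale image.

    `pixels` is a list of rows of 0-255 brightness values. Width and height
    are assumed to be multiples of hash_size (an assumption of this sketch).
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // hash_size, width // hash_size
    # Shrink the image to hash_size x hash_size by averaging each block.
    cells = []
    for by in range(hash_size):
        for bx in range(hash_size):
            total = sum(
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            )
            cells.append(total / (block_h * block_w))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


@dataclass
class Scan:
    file_id: str
    pixels: list
    provenance: str  # hypothetical field, e.g. an accession or donor note


def flag_candidates(scans, max_distance=5):
    """Pair up visually similar scans; never auto-approve a metadata conflict."""
    hashes = {s.file_id: average_hash(s.pixels) for s in scans}
    results = []
    for a, b in combinations(scans, 2):
        if hamming_distance(hashes[a.file_id], hashes[b.file_id]) <= max_distance:
            same_record = a.provenance == b.provenance
            verdict = "likely duplicate" if same_record else "needs human review"
            results.append((a.file_id, b.file_id, verdict))
    return results
```

In this sketch, two scans that look the same to the algorithm but carry different provenance notes are never marked for removal. They are only queued for a conservator, which is the kind of false positive the archivists describe. Nothing here deletes a file.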
The Stadtmuseum, which oversees collections across sites including the Ephraim-Palais in Mitte and the Märkisches Museum near Köllnischer Park, has taken a more cautious line. Its cataloguing team is working through a manual review protocol, cross-referencing duplicates against the Europeana aggregator database before any file is flagged for removal. The process is slower and more expensive in labour hours, but archivists there argue it is the only method that preserves contextual integrity. The two approaches reflect a genuine professional disagreement, not just a resource gap.
Berlin's startup sector has not stayed out of this. At least three companies based in the Prenzlauer Berg tech corridor, including firms operating out of the Factory Berlin campus on Rheinsberger Straße, have pitched AI-assisted deduplication services to city institutions in the past eighteen months. The commercial offer is faster and cheaper upfront, but contract terms typically assign responsibility for deletion decisions to the institution itself, not the vendor. Who is liable when the wrong file is deleted remains unresolved in Berlin's current public procurement guidelines.
What the Senate Must Decide
Three decisions are now converging on a tight timeline. First, the Senatsverwaltung für Kultur must issue technical guidance before the end of July specifying which deduplication methods qualify under the Digitales Berlin 2030 framework — guidance that will effectively pick a winner between the algorithmic and manual camps. Second, individual institutions must decide whether to pool their deduplication work through a shared service model or proceed independently, a choice with significant cost implications given that shared infrastructure could reduce per-file processing costs. Third, the Berlin city council's cultural affairs committee, which holds its next scheduled session on 14 July at the Rotes Rathaus, is expected to hear testimony on whether the September deadline should be extended given the complexity of the false-positive problem.
Advocates for the archival community have pushed back on the deadline publicly, arguing that a rushed deduplication programme carried out under funding pressure is precisely the scenario most likely to produce irreversible losses. Their concern is not hypothetical: a comparable digitisation project at a major European municipal archive — Hamburg's Staatsarchiv — resulted in the permanent deletion of roughly 1,200 photographic files in 2019 after an automated deduplication tool was applied without sufficient human review, according to a post-incident report published by the institution at the time.
For Berliners who care about how their city's visual history survives into the next century, the coming weeks matter. The Senate's July guidance document is the hinge point. If it mandates a minimum human-review layer for any AI-assisted deduplication — as archivists at both the Landesarchiv and Stadtmuseum have recommended — the slower, safer approach wins by default. If it leaves method selection to individual institutions chasing a funding deadline, speed will likely win instead. The files at risk have been waiting decades. They can wait a little longer to be deleted correctly.