Berlin's Senate Department for Culture and Social Cohesion confirmed earlier this year that the city's flagship digitisation programme, running across the Zentral- und Landesbibliothek Berlin on Breite Straße and the Stadtmuseum Berlin network, had identified tens of thousands of duplicate image files accumulated over more than a decade of scanning campaigns. The problem is not cosmetic. Duplicate entries clog storage servers, distort search results, and quietly drain budgets that Berlin's CDU-SPD coalition has been under pressure to justify since the 2025 austerity round.
The issue has sharpened across Europe's major cultural capitals over the past eighteen months as institutions that rushed digitisation during the pandemic years now confront the downstream mess. Grants from the European Commission's Horizon Europe framework, which funded rapid scanning of holdings between 2020 and 2023, came with speed incentives rather than deduplication requirements. The result, across cities from Amsterdam to Warsaw, is digital archives stuffed with near-identical scans of the same photograph, poster, or map — sometimes filed under three different catalogue identifiers.
What Berlin Is Actually Doing
The ZLB launched a structured deduplication project in January 2026, contracting the Fraunhofer Institute for Telecommunications — which operates a major applied-research facility in Berlin-Charlottenburg on Einsteinufer — to run perceptual hashing algorithms across approximately 1.4 million image assets. Perceptual hashing compares images by visual fingerprint rather than file name, catching duplicates that were re-scanned, re-cropped, or saved in different formats. The contract value has not been made public, but comparable Fraunhofer institutional projects of this scope have typically run between €180,000 and €400,000, according to published tender records from similar German federal-state digitisation initiatives.
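To make the idea of a visual fingerprint concrete, here is a minimal sketch of an average hash, one of the simplest perceptual hashes. It is an illustration only: the article does not say which algorithm the Fraunhofer team uses, and the function names, the 8x8 grid size, and the synthetic test images are all assumptions made for this example.

```python
from typing import List

Grid = List[List[float]]  # grayscale image as rows of pixel values (0-255)


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downscale to size x size by averaging rectangular blocks of pixels."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)  # at least one source row
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)  # at least one source column
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values suggest the same picture."""
    return bin(a ^ b).count("1")


# Synthetic demo data, not real archive scans.
scan = [[x * 8 for x in range(32)] for _ in range(32)]  # horizontal gradient
rescan = [[scan[y // 2][x // 2] for x in range(64)] for y in range(64)]  # same image at 2x size
other = [[y * 8 for _ in range(32)] for y in range(32)]  # vertical gradient

print(hamming(average_hash(scan), average_hash(rescan)))  # 0
print(hamming(average_hash(scan), average_hash(other)))   # 32
```

The re-scanned copy has a different resolution and would have a different file checksum, yet its fingerprint is identical, while the unrelated image differs in half of its 64 bits. In practice a project would choose a distance threshold below which two files are flagged as likely duplicates; that threshold is a tuning decision and is not specified in the article.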
The Stadtmuseum Berlin, whose holdings span sites including the Ephraim-Palais in the Nikolaiviertel and the Märkisches Museum near Märkisches Ufer, is running a parallel but less automated process. Curators there are cross-referencing image metadata manually for a pilot collection of around 90,000 items relating to pre-war Berlin street photography — a slower method, but one that the institution argues is more reliable for historically ambiguous material where two nearly identical photographs may in fact document different moments.
How Other Cities Stack Up
Berlin's dual-track approach — automated for bulk holdings, manual for sensitive or historically complex collections — puts it roughly in line with Amsterdam's Stadsarchief, which began a similar programme in 2024 using open-source software developed by the Dutch Digital Heritage Network. Amsterdam reportedly cleared around 200,000 duplicate records from its public image portal in the first six months of that project, according to the network's published progress report from March 2025.
London's Victoria and Albert Museum and the British Library have both spoken publicly about deduplication challenges, though neither has announced a city-wide coordinated programme comparable to what Berlin and Amsterdam are running. Vienna's Wienbibliothek im Rathaus completed a deduplication sweep of its digital photograph collection in 2023, reducing its publicly searchable image database by roughly 12 percent, a figure the library published in its 2023 annual report. By that benchmark, Berlin's ZLB project, if it achieves a similar rate across 1.4 million assets, could retire roughly 168,000 redundant files before the end of 2026.
Warsaw's National Digital Archive, Narodowe Archiwum Cyfrowe, has taken a different path entirely, embedding deduplication checks directly into its ingestion pipeline so that duplicates are blocked at upload rather than cleaned up retrospectively. Archivists familiar with the field consider that upstream model the gold standard, though it requires institutional discipline and software infrastructure that many older European collections built before roughly 2018 simply do not have.
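A rough sketch of what such an upload-time check could look like follows. It is not a description of the Narodowe Archiwum Cyfrowe's actual software: the class name `IngestGate`, the `submit` method, the 5-bit distance threshold, and the assumption that each incoming image arrives with a precomputed 64-bit perceptual fingerprint are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IngestGate:
    """Hypothetical upload gate: refuses images that closely match a stored one."""

    max_distance: int = 5  # assumed threshold, in differing fingerprint bits
    known: Dict[str, int] = field(default_factory=dict)  # catalogue id -> fingerprint

    def find_match(self, fingerprint: int) -> Optional[str]:
        """Return the catalogue id of a near-identical stored image, if any."""
        for catalogue_id, stored in self.known.items():
            if bin(stored ^ fingerprint).count("1") <= self.max_distance:
                return catalogue_id
        return None

    def submit(self, catalogue_id: str, fingerprint: int) -> Optional[str]:
        """Store the image and return None, or return the id it collides with."""
        match = self.find_match(fingerprint)
        if match is None:
            self.known[catalogue_id] = fingerprint
        return match


gate = IngestGate()
first = gate.submit("A-001", 0b1111)       # accepted: nothing stored yet
second = gate.submit("B-417", 0b1110)      # blocked: 1 bit away from A-001
third = gate.submit("C-020", 0xFFFF0000)   # accepted: 20 bits away from A-001
```

Returning the colliding catalogue identifier, rather than silently dropping the upload, leaves room for a curator to review cases where two near-identical photographs document different moments. The linear scan over stored fingerprints is only suitable for illustration; a collection of millions of items would need an index designed for nearest-match lookups.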
For Berliners who use the ZLB's digital portal or the Stadtmuseum's online collections for research — whether tracing family history in the Scheunenviertel or pulling historical images for a Mitte planning application — the practical upshot is that search results should become meaningfully cleaner by autumn 2026, when the Fraunhofer contract is due to deliver its first full deduplication pass. The Senate Department has indicated it plans to publish a progress report before the end of the third quarter. Whether the city then moves toward Warsaw's upstream model, or settles for periodic retrospective sweeps, will depend heavily on how much of next year's culture budget survives the coalition's next spending review.