Berlin's Landesarchiv, the city-state's central public records authority on Eichborndamm in Reinickendorf, is midway through a three-year digitisation overhaul that has exposed a problem familiar to archivists from Vienna to Seoul: duplicate images are quietly consuming server capacity, distorting search results and, in some cases, causing bureaucratic errors when the wrong version of a document photograph gets pulled from the database. The scale of the problem here became clear in early 2026, when an internal audit found that roughly one in five scanned images across the Landesarchiv's digital holdings was a duplicate or near-duplicate file.
The timing matters. Berlin's CDU-SPD Senate coalition committed in its 2023 coalition agreement to making all core public administrative records digitally accessible by 2028. That deadline is now under pressure. Duplicate files slow indexing, inflate storage contracts (the city renewed a data-infrastructure deal with a Frankfurt-based provider in March 2026) and create compliance headaches under federal data-protection rules. When housing authorities in Mitte or Neukölln pull property records to adjudicate rent-cap disputes, getting back several versions of the same cadastral photograph is not a minor inconvenience. It can delay decisions that directly affect tenants.
What Berlin Is Actually Doing
The Landesarchiv began piloting a perceptual-hashing deduplication system in January 2026, applied first to the photographic holdings of the Stadtmuseum Berlin, which manages collections across sites including the Märkisches Museum near Märkisches Ufer. Perceptual hashing assigns a fingerprint to each image based on visual content rather than file metadata, allowing near-identical scans — taken on different days or at slightly different resolutions — to be flagged and consolidated. The pilot covered roughly 80,000 images in its first phase. A second phase, expected to launch in September 2026, will extend the process to building-permit photography held by the Senatsverwaltung für Stadtentwicklung, Bauen und Wohnen.
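Neither the Landesarchiv nor the Senat has said which perceptual-hashing algorithm the pilot uses. Purely as an illustration, here is a minimal sketch of one widely used variant, the difference hash (dHash). It assumes images arrive as grayscale pixel grids, and the function names, the 8×8 hash size and the sample images are hypothetical. Each image is shrunk to a 9×8 thumbnail by block averaging, and each bit of the fingerprint records whether a pixel is brighter than its right-hand neighbour. Because of this, a rescan at a different resolution or brightness produces the same or a nearly identical 64-bit fingerprint.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of 0-255 values.
Image = List[List[int]]


def _block_mean(pixels: Image, out_w: int, out_h: int) -> List[List[float]]:
    """Shrink an image to out_w x out_h by averaging each source block."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(out_h):
        r0 = r * h // out_h
        r1 = max(r0 + 1, (r + 1) * h // out_h)  # at least one source row
        row = []
        for c in range(out_w):
            c0 = c * w // out_w
            c1 = max(c0 + 1, (c + 1) * w // out_w)  # at least one source column
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Image, hash_size: int = 8) -> int:
    """Difference hash: one bit per left/right comparison in a tiny thumbnail."""
    small = _block_mean(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical sample images: an original scan, a half-resolution brighter
# rescan of the same picture, and a mirrored (genuinely different) picture.
gradient = [[x * 3 for x in range(64)] for _ in range(48)]
rescan = [[x * 3 + 10 for x in range(0, 64, 2)] for _ in range(24)]
mirrored = [row[::-1] for row in gradient]

print(hamming(dhash(gradient), dhash(rescan)))    # 0: flagged as a near-duplicate
print(hamming(dhash(gradient), dhash(mirrored)))  # 64: every bit differs
```

Two fingerprints are compared by counting differing bits (the Hamming distance), and a small distance suggests a near-duplicate. A real pipeline would first decode the image files with an imaging library, a step this sketch leaves out.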
The Berliner Senat has not published a standalone budget line for the deduplication work; it sits inside the broader digitisation envelope. Open-data advocates at the Berlin-based nonprofit Wikimedia Deutschland, which collaborates with public institutions on digital access projects, have publicly called for greater transparency about which datasets are being cleaned and when the deduplicated archives will be fully searchable. The organisation operates from offices on Tempelhofer Ufer and has a formal cooperation agreement with several German state archives.
How Berlin Compares to Amsterdam, Vienna and Seoul
Amsterdam's Stadsarchief completed a comparable deduplication exercise across its photographic collections in 2023, reportedly shrinking its image index by around 14 percent and noticeably cutting retrieval times for members of the public using the archive's online portal. Vienna's Wiener Stadt- und Landesarchiv has taken a more cautious, manual-review approach, arguing that automated systems risk wrongly flagging historically significant variant prints as duplicates. Archivists in Berlin share that concern but have chosen to manage it through a human-review queue rather than by pausing automation altogether.
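The Landesarchiv has not published how its human-review queue decides which matches an archivist must check. One plausible arrangement, shown here under assumed thresholds, puts two cut-offs on the Hamming distance between 64-bit image fingerprints. Very close pairs are consolidated automatically, a middle band goes to a person, and anything farther apart is treated as distinct. The threshold values, image identifiers and function names are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# (first image id, second image id, Hamming distance)
Pair = Tuple[str, str, int]


def hamming(a: int, b: int) -> int:
    """Count the bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def triage(
    hashes: Dict[str, int],
    auto_max: int = 2,     # hypothetical: consolidate without review
    review_max: int = 10,  # hypothetical: above this, treat as distinct
) -> Tuple[List[Pair], List[Pair]]:
    """Split image pairs into an auto-consolidate list and a human-review queue."""
    auto: List[Pair] = []
    review: List[Pair] = []
    # Brute-force comparison of every pair; fine for a sketch, not for scale.
    for a, b in combinations(sorted(hashes), 2):
        d = hamming(hashes[a], hashes[b])
        if d <= auto_max:
            auto.append((a, b, d))
        elif d <= review_max:
            review.append((a, b, d))
        # Pairs with d > review_max are dropped as distinct images.
    return auto, review


# Hypothetical fingerprints for four scans.
hashes = {
    "foto_a": 0b0000,
    "foto_a_rescan": 0b0001,   # one bit away from foto_a
    "foto_a_variant": 0b1111,  # a variant print: a few bits away
    "foto_b": (1 << 64) - 1,   # an unrelated picture
}

auto, review = triage(hashes)
print(auto)    # [('foto_a', 'foto_a_rescan', 1)]
print(review)  # [('foto_a', 'foto_a_variant', 4), ('foto_a_rescan', 'foto_a_variant', 3)]
```

The middle band is where a variant print of the kind Vienna's archivists worry about would land for human judgement. Comparing every pair also scales quadratically: the 80,000 images in the pilot's first phase already yield about 3.2 billion pairs, so a production system would more likely use an index suited to Hamming distance, such as a BK-tree.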
Seoul presents the sharpest contrast. Beginning in 2022, the Seoul Metropolitan Government integrated AI-assisted deduplication directly into its Smart Seoul Data of Things platform, covering not just static image archives but also live feeds from municipal inspection cameras. The scale is different: a city of roughly 9.7 million people generates a far larger daily administrative image load than Berlin, with its 3.7 million residents. The underlying tooling, however, is similar, and a 2025 city government report put the accuracy rate of Seoul's deduplication algorithm at 97.3 percent. Berlin has not yet published equivalent accuracy benchmarks for its pilot.
For Berliners, the practical consequences will show up first in two places. One is the ease of searching digitised property and planning records online, which directly affects anyone navigating the city's complicated housing market. The other is the reliability of the Landesarchiv's public search portal, which is used heavily by researchers, journalists and community groups in neighbourhoods such as Kreuzberg and Wedding who are tracing the built history of their streets. The Senat's digitisation team is expected to present an interim progress report to the Abgeordnetenhaus before the parliamentary summer recess ends in mid-August 2026. Until then, archivists are still working through the queue, one flagged image at a time.