Berlin's sprawling network of public digital archives has a chronic clutter problem, and this week three organisations moved to tackle it head-on. On Tuesday the Landesarchiv Berlin, Wikimedia Deutschland, and the city's IT service provider ITDZ Berlin announced the next phase of a coordinated initiative to find and replace duplicate images. Their goal is a working automated detection system, live across pilot databases, before August 31.
The timing is not arbitrary. Germany's federal digitisation framework, the Onlinezugangsgesetz (OZG), has set hard interoperability deadlines for state-level repositories. Berlin's public image libraries, which span everything from urban planning photography to cultural heritage scans, have been flagged in internal audits for significant redundancy. That redundancy slows retrieval and inflates storage costs on the city's shared cloud infrastructure.
What the Pilot Covers — and Where It's Running
The immediate focus is two collections: the Stadtmuseum Berlin's digitised photograph archive, headquartered on Klosterstraße in Mitte, and the planning department's visual documentation held by the Senatsverwaltung für Stadtentwicklung, Bauen und Wohnen (Senate Department for Urban Development, Building and Housing). Both repositories have grown rapidly since 2020. Pandemic-era digitisation grants accelerated scanning projects, and their output was never fully deduplicated before upload.
ITDZ Berlin is deploying a perceptual hashing system. It derives a compact fingerprint from each image's visual content, so near-identical duplicates can be detected even when their file names or metadata differ. This goes well beyond keyword or file-name matching. Take two photographs of the Oberbaumbrücke shot seconds apart and uploaded by different departments under different file names: until now they would be catalogued and stored as separate assets. The new pipeline flags such pairs for human review before one is retired or cross-linked.
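ITDZ Berlin has not said which perceptual hashing algorithm its pipeline uses. As an illustration only, the sketch below implements difference hashing (dHash), one widely used perceptual hash. It works on a grayscale image stored as a plain list of pixel rows. The 64-bit hash size, the Hamming-distance threshold of 10 and all function names are assumptions chosen for the example. None of them describe the city's actual system.

```python
# Illustrative difference hash (dHash); not ITDZ Berlin's actual pipeline.
from typing import List

Image = List[List[float]]  # grayscale intensities, one list per pixel row


def downsample(img: Image, width: int, height: int) -> Image:
    """Shrink an image to width x height by averaging rectangular blocks."""
    rows, cols = len(img), len(img[0])
    out: Image = []
    for y in range(height):
        y0 = y * rows // height
        y1 = max((y + 1) * rows // height, y0 + 1)  # keep every block non-empty
        row = []
        for x in range(width):
            x0 = x * cols // width
            x1 = max((x + 1) * cols // width, x0 + 1)
            block = [img[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Image, hash_size: int = 8) -> int:
    """Return a (hash_size * hash_size)-bit fingerprint of horizontal brightness changes."""
    small = downsample(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            # One bit per adjacent pair: 1 if brightness increases left to right.
            bits = (bits << 1) | int(right > left)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(a: Image, b: Image, threshold: int = 10) -> bool:
    # Hypothetical threshold: a small distance means "send to human review".
    return hamming(dhash(a), dhash(b)) <= threshold


# Usage: a brightened copy keeps the same fingerprint; a mirrored image does not.
gradient = [[x * 8 + y for x in range(32)] for y in range(32)]
brighter = [[p + 20 for p in row] for row in gradient]
mirrored = [row[::-1] for row in gradient]
print(hamming(dhash(gradient), dhash(brighter)))  # 0
print(hamming(dhash(gradient), dhash(mirrored)))  # 64
```

The fingerprint records relative brightness between neighbouring regions, not exact pixel values. As a result, uniform brightness shifts, recompression or resizing tend to leave it largely unchanged. That is the property that lets near-identical shots stored under different file names be paired up for review.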
Wikimedia Deutschland's involvement centres on Wikimedia Commons, the open media repository that Berlin cultural institutions increasingly use as a distribution layer. Duplicate images have been uploaded there under separate licences or with slightly different crops. According to project documentation published this week on the Wikimedia Deutschland project portal, these duplicates have caused attribution confusion in at least a dozen Berlin-related Wikipedia articles over the past 18 months.
Why It Matters for the City's Broader Tech Ambitions
Berlin positions itself as Germany's startup and govtech capital. The Senate's Digital Strategy 2030 document, published in late 2024, commits the city to a clean, interoperable data layer by the end of the decade. Duplicate and orphaned image files are a relatively unglamorous corner of that ambition, but they carry real costs. Storage on government cloud contracts in Germany typically runs between 0.02 and 0.05 euros per gigabyte per month at scale — and the Landesarchiv alone holds more than 400 terabytes of digitised visual material, a figure the archive itself has cited in public budget submissions.
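Those figures can be turned into a rough monthly bill. The sketch below assumes decimal units (1 terabyte = 1,000 gigabytes) and applies the quoted price range uniformly to the whole volume. Because the archive reports "more than" 400 terabytes, that figure is treated as a lower bound. Actual contract terms are not given here, and the function name is hypothetical.

```python
# Back-of-envelope storage cost; assumes 1 TB = 1,000 GB and flat per-GB pricing.
GB_PER_TB = 1_000
ARCHIVE_TB = 400                  # lower bound: "more than 400 terabytes"
RATE_LOW, RATE_HIGH = 0.02, 0.05  # euros per gigabyte per month


def monthly_cost_eur(terabytes: float, rate_per_gb: float) -> float:
    """Monthly storage cost in euros for a volume billed at a flat per-GB rate."""
    return terabytes * GB_PER_TB * rate_per_gb


low = monthly_cost_eur(ARCHIVE_TB, RATE_LOW)
high = monthly_cost_eur(ARCHIVE_TB, RATE_HIGH)
print(f"Monthly: {low:,.0f} to {high:,.0f} EUR")            # Monthly: 8,000 to 20,000 EUR
print(f"Annual:  {12 * low:,.0f} to {12 * high:,.0f} EUR")  # Annual:  96,000 to 240,000 EUR
# At these rates, every terabyte of duplicates removed saves 20 to 50 EUR per month.
print(f"Per TB:  {monthly_cost_eur(1, RATE_LOW):,.0f} to "
      f"{monthly_cost_eur(1, RATE_HIGH):,.0f} EUR/month")   # Per TB:  20 to 50 EUR/month
```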
There is also a user-facing dimension. Consider residents searching housing records for Mierendorffplatz in Charlottenburg, or researchers pulling Kreuzberg street documentation for planning disputes. Both run into cluttered search results when the same image surfaces several times under different metadata tags. The initiative aims to reduce that friction before the OZG compliance window closes at the end of 2026.
The ITDZ pilot will process an initial batch of roughly 80,000 images across the two test collections. Any file flagged as a probable duplicate is quarantined — not deleted — and routed to a review queue staffed by archivists from each institution. Deletion requires sign-off from the originating department. The conservative approach reflects lessons from a 2023 Frankfurt pilot, cited in the ITDZ project brief, where aggressive automated removal led to the permanent loss of several unique scans that had been miscategorised as duplicates.
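The quarantine-first rule can be expressed as a small state machine. The sketch below is a hypothetical model, not ITDZ Berlin's software. Its class and method names are invented, and it assumes that sign-off from the originating department can be checked by matching a department name. A flagged file moves to quarantine instead of being deleted. Only the department that created it can approve removal, and a reviewer can instead release it back to active use.

```python
# Hypothetical model of a quarantine-first review workflow; names are illustrative.
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    ACTIVE = "active"
    QUARANTINED = "quarantined"
    DELETED = "deleted"


@dataclass
class ImageRecord:
    image_id: str
    originating_department: str
    status: Status = Status.ACTIVE


@dataclass
class ReviewQueue:
    # Each entry pairs a quarantined record with the ID of the file it may duplicate.
    pending: List[Tuple[ImageRecord, str]] = field(default_factory=list)

    def flag(self, record: ImageRecord, duplicate_of: str) -> None:
        """Quarantine a probable duplicate; nothing is deleted at this stage."""
        if record.status is not Status.ACTIVE:
            raise ValueError(f"{record.image_id} is not active")
        record.status = Status.QUARANTINED
        self.pending.append((record, duplicate_of))

    def _remove(self, record: ImageRecord) -> None:
        self.pending = [(r, d) for r, d in self.pending if r is not record]

    def release(self, record: ImageRecord) -> None:
        """Reviewer judged the file unique: restore it to active use."""
        if record.status is not Status.QUARANTINED:
            raise ValueError(f"{record.image_id} is not quarantined")
        record.status = Status.ACTIVE
        self._remove(record)

    def approve_deletion(self, record: ImageRecord, department: str) -> None:
        """Delete only with sign-off from the department that created the file."""
        if record.status is not Status.QUARANTINED:
            raise ValueError(f"{record.image_id} is not quarantined")
        if department != record.originating_department:
            raise PermissionError(f"{department} cannot approve deleting {record.image_id}")
        record.status = Status.DELETED
        self._remove(record)


# Usage with made-up IDs and department names.
queue = ReviewQueue()
scan = ImageRecord("img-0002", "planning")
queue.flag(scan, duplicate_of="img-0001")
try:
    queue.approve_deletion(scan, department="museum")
except PermissionError as err:
    print(err)  # museum cannot approve deleting img-0002
queue.approve_deletion(scan, department="planning")
print(scan.status, len(queue.pending))  # Status.DELETED 0
```

The design keeps every flagged file recoverable until a human with authority over it decides otherwise. That is the safeguard the Frankfurt experience argues for.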
For Berliners and researchers who regularly pull images from city repositories, the practical advice this week is straightforward. If you work with downloads from the Stadtmuseum portal or the Senate planning database, hold on to your local copies and note where each one came from through August. Files that currently appear in both systems may be consolidated, renamed, or redirected while the pilot runs. The Landesarchiv has posted a technical notice on its website recommending that institutional users record persistent URLs rather than file names for any ongoing research projects.