Berlin's Landesarchiv and several Mitte-based tech firms moved this week to formalise a coordinated workflow for identifying and replacing duplicate images across the city's shared digital heritage platforms, a project that has been quietly building since a cross-institutional working group first convened in October 2025. The immediate trigger: a quality audit completed on June 30 found that roughly 34 percent of image assets held across the Berlin Open Data Portal had at least one functional duplicate, which degrades search results and inflates storage costs across municipal systems.
The timing matters. Berlin's Senate Department for Culture and Social Cohesion committed in its 2026 budget to digitising approximately 1.2 million physical documents and photographs by the end of next year, with a stated goal of making them freely accessible via the stadtmuseum.de infrastructure. Pouring new material into a system already clogged with redundant files would undermine the entire programme before it gains momentum. That is the argument city archivists have been making to the SPD-led coalition for months, and it appears to have finally landed.
What the Audit Actually Found
The June 30 audit, conducted by the IT-Dienstleistungszentrum Berlin (ITDZ Berlin), the city's central IT authority, examined image metadata across three primary repositories: the Berlin Open Data Portal at daten.berlin.de, the Stadtmuseum's digital collection, and the photographic archive maintained by the Akademie der Künste on Hanseatenweg in Tiergarten. The 34 percent duplication rate translated to an estimated 180 terabytes of redundant data, a figure the ITDZ Berlin report flagged as a direct cost driver given that the city pays for cloud storage on a tiered commercial contract reviewed each September.
Duplicates accumulate in predictable ways. When institutions migrate collections from older content management systems, images get re-ingested without cross-referencing existing records. The Stadtmuseum alone completed two such migrations between 2019 and 2023. Each time, thousands of already-catalogued photographs from districts like Prenzlauer Berg and Kreuzberg were uploaded again under new identifiers, creating parallel records that confuse both researchers and automated cataloguing tools.
This week, ITDZ Berlin and the Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute (HHI), which operates a facility on Einsteinufer in Charlottenburg, agreed a shared technical protocol for perceptual hash comparison, a method that identifies visually identical or near-identical images even when file names and metadata differ. The protocol will be piloted on the Akademie der Künste holdings starting July 14, with results reviewed by a joint committee in late August.
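To give a sense of how perceptual hash comparison works in principle, here is a minimal sketch of one common variant, the difference hash. The protocol's actual algorithm has not been published, so the choice of difference hashing, the function names and the tiny pixel grids below are all assumptions made for illustration. Each image is reduced to a small grayscale grid, every pair of neighbouring pixels contributes one bit, and two hashes are compared by counting the bits that differ.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale pixel values, 0-255


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4x5 thumbnails; a real pipeline would first downscale
# and grayscale each image before hashing it.
original = [
    [10, 20, 30, 40, 50],
    [50, 40, 30, 20, 10],
    [10, 50, 10, 50, 10],
    [90, 80, 85, 70, 75],
]
# The same picture re-ingested slightly brighter under a new identifier.
brighter = [[min(255, p + 12) for p in row] for row in original]
# An unrelated picture.
other = [
    [50, 40, 30, 20, 10],
    [10, 20, 30, 40, 50],
    [50, 10, 50, 10, 50],
    [70, 85, 80, 90, 75],
]

d_same = hamming(dhash(original), dhash(brighter))   # small distance: likely duplicate
d_other = hamming(dhash(original), dhash(other))     # large distance: different image
```

Because the hash records only whether each pixel is brighter than its neighbour, a uniform brightness shift leaves it unchanged, which is why a re-scanned or re-exported copy can still match when file names and metadata do not.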
What Comes Next for Researchers and the Public
For anyone who uses Berlin's public digital archives — historians working on Weimar-era Neukölln, journalists pulling photographs from the Cold War division of the city, or students accessing resources through the Humboldt-Universität library system — the practical effect of a clean, deduplicated image database is faster, more reliable search. Right now, a single search query on the Open Data Portal can surface the same photograph four or five times under different catalogue numbers, forcing researchers to manually verify which record is canonical.
The city plans to publish a public-facing dashboard by September 1 showing deduplication progress across participating institutions, modelled loosely on the transparency reporting already used for BVG's infrastructure investment tracker. Institutions that join the shared protocol will be able to flag images for human review rather than automatic deletion, an important safeguard given that some apparent duplicates are actually distinct versions — a 1960 press photograph reprinted with different cropping, for example, can have genuine archival value in both forms.
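The flag-for-review safeguard can be pictured as a queue that records candidate pairs without ever deleting anything. The sketch below is illustrative only: the record identifiers, the `ReviewItem` structure and the distance threshold of 4 are hypothetical and do not come from the shared protocol.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReviewItem:
    record_a: str
    record_b: str
    distance: int                      # hash distance between the two images
    status: str = "needs_human_review"  # nothing is deleted automatically


def queue_for_review(
    pairs: List[Tuple[str, str, int]], max_distance: int = 4
) -> List[ReviewItem]:
    """Keep only pairs close enough to be plausible duplicates (assumed threshold)."""
    return [ReviewItem(a, b, d) for a, b, d in pairs if d <= max_distance]


# Hypothetical catalogue numbers and distances.
candidates = [
    ("SM-0001", "SM-9120", 0),   # identical re-upload
    ("AdK-0417", "AdK-0418", 3), # e.g. the same press photo, cropped differently
    ("SM-0002", "AdK-0500", 21), # unrelated images
]
queue = queue_for_review(candidates)
```

In this sketch even an exact match lands in the queue rather than being removed, so an archivist decides whether a differently cropped reprint is kept as a distinct record.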
For now, the Fraunhofer pilot on the Einsteinufer campus is the one to watch. If the perceptual hash system performs as modelled — identifying duplicates with fewer than two percent false positives — the Senate Department has indicated it would extend the programme city-wide before the end of Q3 2026. That would put Berlin ahead of Hamburg and Munich, both of which have launched similar initiatives but have not yet moved past internal planning phases.
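How the two percent target would be checked has not been spelled out. One plausible reading, assumed here, is the share of flagged pairs that human reviewers reject as not being true duplicates. The function names and the pilot figures below are hypothetical.

```python
def false_positive_share(flagged: int, rejected_on_review: int) -> float:
    """Share of flagged pairs that reviewers judged not to be duplicates."""
    if flagged <= 0:
        raise ValueError("flagged must be positive")
    if not 0 <= rejected_on_review <= flagged:
        raise ValueError("rejected_on_review must be between 0 and flagged")
    return rejected_on_review / flagged


def meets_target(flagged: int, rejected_on_review: int, target: float = 0.02) -> bool:
    """True when the false positive share is strictly below the target."""
    return false_positive_share(flagged, rejected_on_review) < target


# Hypothetical pilot outcomes, not figures from the audit or the pilot.
passes = meets_target(flagged=5000, rejected_on_review=80)    # 1.6 percent
fails = meets_target(flagged=5000, rejected_on_review=100)    # exactly 2 percent
```

Under this reading, 80 rejections out of 5,000 flagged pairs would clear the bar, while 100 would not, since the stated threshold is fewer than two percent.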