Berlin's public institutions are estimated to store tens of millions of digital image files on their servers, and a significant share of those files are exact or near-exact duplicates. That finding is driving a quiet but consequential reckoning inside the Senate Department for Culture and Social Cohesion, which oversees digitisation programs for city museums, archives, and libraries.
The issue has come into sharper focus because the Senate's five-year digitisation roadmap, launched in 2023 under the Berliner Digitalisierungsstrategie, is now at its midpoint. Budget reviews due this autumn will force administrators to account for storage inefficiencies that have accumulated since institutions began mass-scanning their physical collections in the early 2010s.
How Bad Is the Duplication Problem?
Storage audits of comparable European municipal archives have found that duplicates make up 18 to 40 percent of total image inventories, depending on how aggressively deduplication tools have been applied. Berlin's Landesarchiv, on Eichborndamm in Reinickendorf, holds more than 1.2 million digitised photographic items. Even at the low end of that range, the institution could be carrying more than 200,000 redundant files, each consuming server space paid for from a public budget.
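Worked through for the Landesarchiv, the cited range gives the following bounds. Because the archive holds more than 1.2 million items, each result is a floor for its rate:

```latex
\begin{aligned}
1{,}200{,}000 \times 0.18 &= 216{,}000 \\
1{,}200{,}000 \times 0.40 &= 480{,}000
\end{aligned}
```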
Object storage on commercial cloud infrastructure costs roughly €20 to €25 per terabyte per month for European public-sector clients, based on published pricing from major providers active in Germany. A high-resolution archival scan typically takes up 50 to 150 megabytes. At those figures, 100,000 duplicate files occupy roughly 5 to 15 terabytes, a recurring cost of about €1,200 to €4,500 a year: modest in isolation, significant when multiplied across a dozen major institutions.
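The arithmetic behind those figures can be sketched as a small Python function. It is an illustration only: the inputs are the price and file-size ranges quoted above, 1 TB is taken as 1,000,000 MB, and the function name is hypothetical rather than part of any provider's pricing tool.

```python
def annual_duplicate_cost_eur(
    duplicate_files: int,
    avg_file_mb: float,
    eur_per_tb_month: float,
) -> float:
    """Estimate the yearly cost of storing redundant files.

    Hypothetical helper: rates and sizes are the article's quoted ranges,
    and decimal units are assumed (1 TB = 1,000,000 MB).
    """
    terabytes = duplicate_files * avg_file_mb / 1_000_000
    return terabytes * eur_per_tb_month * 12  # 12 monthly bills per year


if __name__ == "__main__":
    # Cheapest case: small files at the lower rate.
    low = annual_duplicate_cost_eur(100_000, 50, 20)
    # Most expensive case: large files at the higher rate.
    high = annual_duplicate_cost_eur(100_000, 150, 25)
    print(f"{low:,.0f} to {high:,.0f} EUR per year")
```

Running it prints the bounds for 100,000 duplicates: about €1,200 at the cheap, small-file end and €4,500 at the expensive, large-file end.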
The Staatsbibliothek zu Berlin, whose main reading rooms sit on Potsdamer Straße in Tiergarten, runs one of the largest digitised periodical and map collections in German-speaking Europe. Librarians there have been piloting deduplication software since late 2024, combining file hashes with perceptual-similarity algorithms to flag redundant scans created when analogue collections were digitised more than once by separate project teams working without a unified metadata standard.
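The article does not say which tools the Staatsbibliothek is piloting. As a generic illustration only, the two techniques it names can be sketched in Python: a SHA-256 digest catches byte-identical files, while a perceptual hash (here the simple "average hash", one of several such methods) catches scans that look alike but differ in their bytes. The function names and the distance threshold are assumptions, and downscaling each scan to an 8x8 grayscale thumbnail is assumed to happen elsewhere, for example in an imaging library.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicate_groups(paths: list[Path]) -> list[list[Path]]:
    """Group files whose bytes are identical (same SHA-256 digest)."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in paths:
        groups[sha256_of_file(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]


def average_hash(gray: list[list[int]]) -> int:
    """64-bit average hash of an 8x8 grayscale thumbnail (values 0-255).

    Each bit is 1 if that pixel is brighter than the thumbnail's mean.
    """
    pixels = [v for row in gray for v in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for v in pixels:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, threshold: int = 5) -> bool:
    # Illustrative threshold; a real project would tune it on sample scans.
    return hamming(a, b) <= threshold
```

Exact hashing is cheap and effectively never produces false matches, but it misses a second scan of the same print. The perceptual hash tolerates small changes in exposure or compression at the cost of occasional false positives, so flagged pairs would typically be reviewed by staff before anything is deleted.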
Why the Problem Grew — and What It Costs
The root cause is institutional fragmentation. Berlin's cultural digitisation work has historically been split across individual institutional budgets, meaning the Stadtmuseum Berlin on Poststraße in Mitte, the Berlinische Galerie in Kreuzberg, and smaller borough-level archives each built its own storage infrastructure and scanning workflows. Without a shared image registry, the same photograph, say a 1960s street scene from Prenzlauer Berg, could be scanned independently by three separate institutions, each holding a full-resolution master and multiple derivative copies.
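In its simplest form, a shared registry of that kind could look like the hypothetical Python sketch below: each institution registers a perceptual hash for every new master scan and learns whether a visually similar scan already exists elsewhere. The class names, fields, and threshold are illustrative assumptions, and the 64-bit hashes are assumed to be computed upstream.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    institution: str
    object_id: str  # hypothetical local identifier
    phash: int      # 64-bit perceptual hash, computed upstream (assumed)


@dataclass
class SharedImageRegistry:
    """Hypothetical city-wide registry that flags near-identical scans
    before an institution stores another master copy."""

    max_distance: int = 5  # illustrative Hamming-distance threshold
    entries: list[RegistryEntry] = field(default_factory=list)

    def find_matches(self, phash: int) -> list[RegistryEntry]:
        """Return existing entries whose hash is within max_distance bits."""
        return [
            e for e in self.entries
            if bin(e.phash ^ phash).count("1") <= self.max_distance
        ]

    def register(self, institution: str, object_id: str, phash: int) -> list[RegistryEntry]:
        """Record a new scan and return any near-matches already registered."""
        matches = self.find_matches(phash)
        self.entries.append(RegistryEntry(institution, object_id, phash))
        return matches
```

The linear scan in `find_matches` keeps the sketch short; a registry holding tens of millions of entries would need an index suited to Hamming-distance search, such as a BK-tree.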
Based on comparable programs in Hamburg and Vienna, a unified deduplication pass across even the four largest participating institutions could recover several hundred terabytes of storage. Hamburg's Kulturbehörde reported in 2024 that a deduplication exercise across its museum servers freed roughly 340 terabytes. Berlin's collections are larger by most measures.
The financial stakes extend beyond raw storage. Digitisation grants from the federal Kultur Digital program, administered through the Kulturstaatsministerin, are partly assessed on the quality and uniqueness of the digital inventories submitted. Duplicate images inflate apparent collection size, which can complicate both grant applications and compliance with reporting requirements.
The practical upshot for Berliners is slower and sometimes less reliable searches on public portals such as the Deutsche Digitale Bibliothek, where duplicate records for the same object clutter the results and frustrate researchers. Staff at reading rooms in Friedrichshain and Charlottenburg report spending measurable time each week manually resolving duplicate catalogue entries for users.
The Senate Department is expected to publish procurement criteria for a city-wide image deduplication platform before the end of the third quarter of 2026. Institutions that begin internal audits now, cataloguing file counts, creation dates, and storage locations before an external system is imposed, will be better positioned to meet the new standards and, potentially, to recover budget that can be redirected toward acquiring new material rather than storing the same photograph twice.
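An internal audit of that kind can start with a simple inventory script. The Python sketch below is one possible approach, not a prescribed format: it walks a directory tree, writes the path, size, and modification time of each image file to a CSV file, and returns file counts per directory. The file-extension list and function name are assumptions, and modification time stands in for creation date, which filesystems do not expose portably.

```python
import csv
import os
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

# Assumed set of archival image formats; adjust to the local collection.
IMAGE_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".jp2"}


def audit_images(root: Path, out_csv: Path) -> Counter:
    """Write one CSV row per image file under root; return counts per directory.

    Modification time (st_mtime) is used as a proxy for creation date.
    """
    counts: Counter = Counter()
    with out_csv.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "directory", "size_bytes", "modified_utc"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = Path(dirpath) / name
                if path.suffix.lower() not in IMAGE_SUFFIXES:
                    continue  # skip sidecar files, notes, and other non-images
                stat = path.stat()
                modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
                writer.writerow([str(path), dirpath, stat.st_size, modified.isoformat()])
                counts[dirpath] += 1
    return counts
```

For example, `audit_images(Path("/mnt/archive"), Path("audit.csv"))`, with a hypothetical mount point, produces a spreadsheet-ready inventory that could later be matched against hash results.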