Berlin's Digital Archives Waste Millions Storing Duplicate Images Across Agencies
From municipal databases to BVG's sprawling media library, redundant image files are quietly consuming server capacity and budget across Berlin's public sector.

Berlin's public institutions are sitting on a growing mountain of duplicate digital images — and the scale of the problem, measured in terabytes and procurement euros, is only now coming into focus. Across city departments, transit operators, and publicly funded cultural archives, the same photograph or graphic asset is often stored dozens of times under different filenames, inflating storage costs and slowing down digital workflows that administrators have spent years trying to modernise.
The timing matters. Berlin's SPD-led Senate has committed to a broad digitisation push across municipal services, with the Berliner Digitalagentur, the city's official digital transformation body based in Mitte, coordinating infrastructure spending through 2027. As that programme accelerates, the unsexy problem of duplicate file management is surfacing as a genuine line item — not just a housekeeping annoyance.
Studies of large public-sector digital asset management systems in comparable European cities have found duplicate image rates ranging from 18 to 35 percent of total stored files, depending on how aggressively deduplication tools are applied. Berlin's own municipal data infrastructure spans multiple legacy systems that do not communicate cleanly with one another, making the problem structurally worse than in cities that built unified content management platforms from scratch.
BVG, the public transport operator whose media and communications teams maintain tens of thousands of image assets — everything from U-Bahn line photography to accessibility campaign graphics — acknowledged in its 2025 annual report that digital asset rationalisation was among its operational efficiency targets for the current fiscal year. The operator runs a content library shared across departments at its headquarters near Potsdamer Platz. Without automated deduplication, teams routinely re-upload images that already exist in the system under earlier project folders, each copy consuming additional server allocation billed at commercial cloud rates.
At the Stadtmuseum Berlin, which administers digitised collections across multiple sites including the Märkisches Museum on Köllnischer Park, archivists have been working since January 2025 on a catalogue rationalisation project partly funded under the federal Digitale Kultur programme. Internally, the challenge of duplicate scans — the same historical photograph digitised at different resolutions by different teams over different grant cycles — is a known drag on the project's progress. Each unresolved duplicate requires a human decision about which version is canonical before metadata can be locked down and published to public-facing portals.
Storage is not free. Enterprise-grade cloud storage used by Berlin public bodies typically runs between €0.02 and €0.04 per gigabyte per month under standard procurement frameworks, according to publicly available pricing for major providers' EU data-sovereignty offerings. A municipal department holding 10 terabytes of image assets with a 25 percent duplication rate is paying for roughly 2.5 terabytes of files it does not need. At those rates, that works out to about €50 to €100 a month, or €600 to €1,200 a year: a modest figure per department, but one that compounds across the dozens of agencies, Bezirksämter, and publicly funded cultural institutions that make up Berlin's administrative ecosystem.
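For readers who want to plug in their own department's numbers, the arithmetic above can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not an official costing tool: the function name and parameters are hypothetical, and it assumes 1 terabyte = 1,000 gigabytes and a flat per-gigabyte rate.

```python
def duplicate_storage_cost(total_tb, duplicate_rate, eur_per_gb_month, gb_per_tb=1000):
    """Estimate what a department pays to store duplicate files.

    Assumptions (not from any provider's price list): decimal terabytes
    and a single flat monthly rate per gigabyte.
    Returns (wasted_gb, monthly_eur, yearly_eur).
    """
    wasted_gb = total_tb * gb_per_tb * duplicate_rate
    monthly_eur = wasted_gb * eur_per_gb_month
    return wasted_gb, monthly_eur, monthly_eur * 12


if __name__ == "__main__":
    # The article's example: 10 TB of images, 25 percent duplicates,
    # priced at the low and high ends of the quoted range.
    for rate in (0.02, 0.04):
        wasted, monthly, yearly = duplicate_storage_cost(10, 0.25, rate)
        print(f"{wasted:.0f} GB wasted at EUR {rate}/GB: "
              f"EUR {monthly:.2f}/month, EUR {yearly:.2f}/year")
```

The per-department total is small, which is the point: the cost only becomes visible once it is summed across every agency holding its own copy of the same files.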
The Berliner Digitalagentur has circulated internal guidance encouraging departments to adopt deduplication standards before migrating legacy data into new shared infrastructure. The guidance points to hash-based comparison tools as the recommended technical approach: software that computes a fingerprint from the contents of each image file and flags matches regardless of filename. Adoption, however, remains uneven across the city's 12 Bezirke, with wealthier districts like Charlottenburg-Wilmersdorf generally further along than eastern districts still working through older server architectures.
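What hash-based comparison looks like in practice can be shown with a minimal Python sketch. It is an illustration of the general technique, not the tool the Digitalagentur's guidance names: the function names are hypothetical, and SHA-256 from Python's standard library is assumed as the fingerprint.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's bytes, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_groups(paths):
    """Group files whose contents are byte-for-byte identical.

    Filenames are ignored; only the content fingerprint is compared.
    Returns a list of groups, each a list of two or more paths.
    """
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[file_digest(path)].append(str(path))
    return [group for group in by_digest.values() if len(group) > 1]
```

A department could point this at a folder with, for example, `find_duplicate_groups(pathlib.Path("media_library").rglob("*.jpg"))`, where `media_library` is a placeholder directory name. One caveat: a content hash of this kind only catches exact copies. The same photograph scanned at a different resolution produces a different fingerprint and would still need a human decision or a different matching method.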
For Berlin's growing startup and tech sector, concentrated in the Mitte and Friedrichshain-Kreuzberg corridors, the municipal drag matters because city procurement and partnership data flows into those companies through shared APIs and open data portals. Duplicate and inconsistent image metadata in public datasets degrades the quality of products built on top of them — a practical complaint that developers at Berlin's tech community hubs have raised in public consultations on open data quality.
The practical next step for any Berlin department is straightforward: run a deduplication audit before the next storage contract renewal (contracts typically renew annually), and build hash comparison into upload workflows rather than treating it as a cleanup task. The Berliner Digitalagentur has set a soft deadline of Q1 2027 for departments participating in the unified infrastructure rollout to meet minimum deduplication standards. That window is closing faster than many administrators appear to realise.
About this article
Published by The Daily Berlin