Berlin's public digital infrastructure is sitting on a problem that nobody put on a ballot: thousands of duplicate photographs, scans and image files clogging the storage systems of municipal agencies, costing real money and slowing the retrieval of civic records that residents and journalists alike depend on. Archivists, data engineers and city administrators have begun speaking more openly about the issue in recent months, and their message is consistent — the longer it goes unaddressed, the more expensive the fix becomes.
The timing matters. Berlin's CDU-led Senate, governing in coalition with the SPD, is under pressure to trim operational spending while also delivering on a digital modernisation agenda first outlined in the Berlin Digital Strategy of 2023. Storage inefficiency sits awkwardly in the middle of that tension. When every saved euro is a talking point and every tech investment is scrutinised, paying twice, or ten times, to store the same image file is hard to defend publicly.
What the Experts Are Saying
Staff at the Landesarchiv Berlin, which holds millions of digitised historical records including photograph collections from the Weimar era, have noted in internal discussions that automated deduplication tools are not yet standard practice across all of the archive's ingest pipelines. The Landesarchiv, based on Eichborndamm in Reinickendorf, ingests files from dozens of city departments and partner institutions. Without consistent metadata tagging and hash-based duplicate detection, the same scan can arrive multiple times and be stored as a new file each time.
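A hash-based check of the kind the archive staff describe can be sketched in a few lines of Python. This is a minimal illustration, not the Landesarchiv's actual pipeline. It assumes incoming files arrive as raw bytes and uses SHA-256 as the fingerprint. The `IngestIndex` class and the file names are hypothetical. Because only byte-identical files share a digest, this approach catches exact re-submissions of the same scan, not re-scans or resized copies.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class IngestIndex:
    """Hypothetical in-memory index mapping content hashes to a stored file ID.

    A real archive would persist this in a database; a dict keeps the sketch small.
    """
    by_hash: dict[str, str] = field(default_factory=dict)

    def ingest(self, file_id: str, data: bytes) -> tuple[str, bool]:
        """Return (stored_id, is_duplicate) for an incoming file.

        Identical bytes always yield the same SHA-256 digest, so a second
        arrival of the same scan is linked to the first copy instead of
        being stored as a new file.
        """
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.by_hash:
            return self.by_hash[digest], True
        self.by_hash[digest] = file_id
        return file_id, False


index = IngestIndex()
print(index.ingest("dept-a/scan_0001.tif", b"placeholder scan bytes"))
# → ('dept-a/scan_0001.tif', False)
print(index.ingest("dept-b/IMG_final.tif", b"placeholder scan bytes"))
# → ('dept-a/scan_0001.tif', True)
```

The second department's upload is recognised and pointed at the existing copy rather than stored again.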
Technology consultants working with Berlin's city government, including firms contracted through the Senate Department for Digital Development and Work, point to a broader pattern across German federal states. A 2024 report from the Fraunhofer Institute for Open Communication Systems (FOKUS), which is headquartered in Charlottenburg on Kaiserin-Augusta-Allee, estimated that German public-sector organisations collectively waste between 15 and 20 percent of their cloud and on-premises storage capacity on redundant files, with image formats among the most frequently duplicated. Applied to the storage portion of Berlin's documented municipal IT expenditure, even the lower 15 percent figure suggests hundreds of thousands of euros in avoidable annual costs.
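The arithmetic behind that estimate is simple enough to spell out. The sketch below assumes storage cost scales linearly with capacity used, so a 15 to 20 percent redundancy share maps onto the same share of storage spend. The spend figure in the example is hypothetical and chosen only for illustration; it is not Berlin's actual budget.

```python
def avoidable_storage_cost(annual_storage_spend_eur: float,
                           waste_low: float = 0.15,
                           waste_high: float = 0.20) -> tuple[float, float]:
    """Return the (low, high) range of annual spend tied up in redundant files.

    Assumes cost is proportional to capacity, so the redundancy share of
    capacity equals the redundancy share of spend.
    """
    return (annual_storage_spend_eur * waste_low,
            annual_storage_spend_eur * waste_high)


# Hypothetical annual storage spend, for illustration only.
low, high = avoidable_storage_cost(2_000_000)
print(f"EUR {low:,.0f} to EUR {high:,.0f} per year")
# → EUR 300,000 to EUR 400,000 per year
```

Working backwards, the lower 15 percent bound passes EUR 100,000 a year once annual storage spend exceeds roughly EUR 667,000, which is the scale at which "hundreds of thousands" becomes plausible.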
Data officers at the Berlin Open Data portal, daten.berlin.de, whose metadata also feeds into the national GovData portal, say the problem becomes visible when citizens or developers download datasets: the same photograph appears under different file names, tagged to different departments, with conflicting licensing metadata. That creates legal as well as technical headaches. Image rights management becomes nearly impossible when the same file has been registered by three different agencies with three different provenance records.
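Conflicting licences of this kind can be surfaced mechanically once each catalogue entry carries a content hash. The following is a minimal sketch, assuming a simplified record with a precomputed SHA-256 and a single licence string; the `CatalogueRecord` shape, the function name and the sample values are hypothetical, not the portal's real schema.

```python
from collections import defaultdict
from typing import NamedTuple


class CatalogueRecord(NamedTuple):
    """Hypothetical record shape; real portal metadata has many more fields."""
    file_name: str
    agency: str
    sha256: str
    licence: str


def find_licence_conflicts(records: list[CatalogueRecord]) -> dict[str, set[str]]:
    """Return content hashes whose copies carry more than one distinct
    licence, each mapped to the set of licences found."""
    licences_by_hash: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        licences_by_hash[rec.sha256].add(rec.licence)
    return {h: lic for h, lic in licences_by_hash.items() if len(lic) > 1}


# Shortened placeholder hashes, for illustration only.
records = [
    CatalogueRecord("kiez_1925.jpg", "Agency A", "ab12", "CC BY 4.0"),
    CatalogueRecord("IMG_0042.jpg", "Agency B", "ab12", "dl-de/by-2-0"),
    CatalogueRecord("stadtplan.png", "Agency C", "cd34", "CC0"),
]
print(find_licence_conflicts(records))
```

Only the photograph registered under two different licences is reported; files that are duplicated but consistently licensed are left out, since they are a storage issue rather than a rights issue.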
Pressure From Prenzlauer Berg to the Senate
Civic tech advocates, several of them affiliated with the Code for Germany network whose Berlin chapter meets regularly in Mitte's co-working spaces, have been pushing the issue onto the political agenda for at least two years. They argue that the fix is not especially complicated in technical terms: perceptual hashing algorithms can flag near-duplicate images automatically, and open-source tools such as those used by the Wikimedia Foundation already handle this at scale. The obstacle, they say, is institutional will and cross-departmental coordination, both of which require political backing that has been slow to arrive.
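Perceptual hashing differs from the exact hashing used for byte-identical files: visually similar images produce similar hashes, so a re-scan or brightened copy can still be matched. The sketch below implements average hashing (aHash), one of the simplest perceptual hashes. It is not the Wikimedia Foundation's code, and production tools often use more robust variants such as DCT-based hashes. It assumes the image has already been decoded to a grid of 0 to 255 grayscale values (a real pipeline would use an image library for that step); the function names are illustrative.

```python
def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Compute a size*size-bit average hash (aHash) of a grayscale image.

    `pixels` is a grid of 0-255 brightness values, assumed to be already
    decoded and at least `size` pixels in each dimension. The grid is shrunk
    to size x size by averaging blocks, then each cell becomes a 1 bit if it
    is brighter than the mean of all cells, else a 0 bit.
    """
    height, width = len(pixels), len(pixels[0])
    cells: list[float] = []
    for row in range(size):
        y0, y1 = row * height // size, (row + 1) * height // size
        for col in range(size):
            x0, x1 = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic 32x32 images: a left-to-right gradient, a slightly brighter copy
# (as a re-scan might produce) and an inverted, genuinely different image.
original = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[min(255, v + 10) for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

print(hamming_distance(average_hash(original), average_hash(brighter)))  # → 0
print(hamming_distance(average_hash(original), average_hash(inverted)))  # → 64
```

A small Hamming distance suggests two files show the same picture even though their bytes differ. The cut-off, for instance a handful of bits out of 64, is a judgement call that would need tuning against a real collection.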
For residents in Prenzlauer Berg or Kreuzberg who use Berlin's digital services — checking planning applications, accessing historical neighbourhood photographs, downloading city maps — the practical effect is sluggish load times and search results cluttered with near-identical images. BVG, the public transport operator, faced a version of this problem in 2022 when consolidating its internal asset management system ahead of a fleet documentation update; the deduplication process took three months longer than projected, according to publicly available project summaries.
What happens next depends largely on whether the Senate department responsible for digital policy includes mandatory deduplication standards in the next iteration of its IT infrastructure guidelines, expected before the end of 2026. Archivists and civic technologists alike are watching that process closely. In the meantime, institutions handling large image collections are advised to audit their storage pipelines now: even a basic hash comparison across existing files can reveal the scale of the problem before any formal policy lands.
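A basic hash-comparison audit of existing files might look like the following Python sketch. It is an illustration under stated assumptions, not official guidance: it assumes the files sit on a filesystem the script can read, uses SHA-256, and the function names are hypothetical. As a shortcut, files are first grouped by size, because files of different sizes cannot be identical, so only same-size candidates need hashing. It finds byte-identical copies only; near-duplicates require a perceptual method.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large scans never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_duplicates(root: str) -> tuple[dict[str, list[str]], int]:
    """Return (groups of identical files keyed by SHA-256, reclaimable bytes).

    Reclaimable bytes counts every copy beyond the first in each group.
    Symbolic links are skipped so the same file is not counted twice.
    """
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups: dict[str, list[str]] = {}
    reclaimable = 0
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a byte-identical twin
        by_hash: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        for digest, same in by_hash.items():
            if len(same) > 1:
                groups[digest] = sorted(same)
                reclaimable += size * (len(same) - 1)
    return groups, reclaimable
```

Calling `audit_duplicates` on an archive directory returns each set of identical files alongside a single figure for the space their extra copies occupy, which is the kind of number an institution could bring to a budget discussion before any policy arrives.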