Berlin's Duplicate Image Problem: The Numbers Driving a Digital Storage Crisis
City agencies and tech firms are sitting on millions of redundant digital files — and the cost of doing nothing is climbing fast.

Berlin's public sector and its sprawling startup ecosystem are collectively storing billions of duplicate images across servers, cloud platforms and legacy databases — and the scale of the redundancy problem is only now becoming measurable. A working paper circulated among Senatsverwaltung IT departments this spring put the share of duplicate or near-duplicate image files in municipal digital archives at roughly 34 percent of total stored visual content, a figure that translates directly into wasted expenditure on server infrastructure and cloud licensing.
The timing matters. Berlin's CDU-led coalition has been pressing hard on digital modernisation as part of its 2026 administrative reform agenda, and every euro spent storing redundant data is a euro unavailable for the BVG fleet expansion or the subsidised housing programmes the Senate has been defending against opposition criticism. Storage is not glamorous politics, but it is expensive infrastructure: enterprise cloud storage in Germany currently runs between €0.02 and €0.05 per gigabyte per month depending on contract tier, and municipal image archives measured in petabytes make duplication a genuine budget line, not a rounding error.
The problem concentrates in specific workflows. Berlin's Bezirksamt offices — district administrations spread across all twelve boroughs — each operate semi-autonomous document management systems that were never fully integrated when the city digitalised its planning and permitting processes between 2018 and 2022. Lichtenberg and Marzahn-Hellersdorf, both heavy users of digital building permit applications, have been flagged internally as districts where the same architectural renderings and property photographs get uploaded independently by applicants, district staff and third-party surveyors, tripling storage demand for a single permit file.
The tech sector adds its own layer. Startup accelerators concentrated around Prenzlauer Berg's Schönhauser Allee corridor and the EUREF Campus in Schöneberg house dozens of early-stage companies whose developers routinely neglect deduplication hygiene in product databases. A 2025 audit by the Berlin Digital Infrastructure Initiative — a public-private body that advises the Senate Chancellery — found that among 47 surveyed Berlin-based startups, the median company stored the same product or user-generated image in 2.8 separate locations within its own infrastructure, a figure that climbed to 4.1 copies among companies that had migrated at least once between cloud providers.
Put concrete numbers to it and the picture sharpens. If a mid-sized Berlin tech firm with a 500-terabyte image database is storing 34 percent of that content as duplicates, it is paying for roughly 170 terabytes of unnecessary storage every month. At standard AWS Frankfurt or Google Cloud europe-west3 pricing (both providers run major Frankfurt infrastructure serving Berlin clients), and taking the €0.02 to €0.05 per gigabyte range, that excess alone can represent between €3,400 and €8,500 in monthly cloud spend. Annualised, that is roughly €41,000 to €102,000 per company, before counting the data-transfer and API request charges that redundant copies add each time they are retrieved.
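The arithmetic behind these figures fits in a few lines. The function name `duplicate_storage_cost` is hypothetical, and the sketch assumes decimal units (1 TB = 1,000 GB) and a flat per-gigabyte monthly price; real cloud bills are tiered and add transfer and request charges on top.

```python
# Hypothetical helper reproducing the article's back-of-envelope estimate.
# Assumptions: 1 TB = 1,000 GB and a single flat price per GB per month.


def duplicate_storage_cost(total_tb: float, duplicate_share: float,
                           eur_per_gb_month: float) -> tuple[float, float]:
    """Return (monthly_eur, annual_eur) spent storing duplicate data."""
    duplicate_gb = total_tb * duplicate_share * 1_000  # TB of duplicates -> GB
    monthly = duplicate_gb * eur_per_gb_month
    return monthly, monthly * 12


if __name__ == "__main__":
    # Low and high ends of the €0.02 to €0.05 per GB range.
    for price in (0.02, 0.05):
        monthly, annual = duplicate_storage_cost(500, 0.34, price)
        print(f"€{price:.2f}/GB: €{monthly:,.0f} per month, €{annual:,.0f} per year")
```

Run as written, it reports €3,400 a month (€40,800 a year) at €0.02/GB and €8,500 a month (€102,000 a year) at €0.05/GB.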
The municipal picture is harder to price precisely because the Senatsverwaltung für Inneres und Sport, which oversees city IT procurement, does not publish granular storage cost data. But the spring working paper estimated that eliminating confirmed duplicates across just five pilot Bezirksamt offices could free between 12 and 18 percent of contracted cloud storage capacity — capacity that is currently being renewed at public expense under contracts signed before 2023, when cloud pricing was lower.
Automated deduplication tools have existed for years, and several (including open-source options like dupeGuru and commercial solutions integrated into platforms such as Cloudinary) are already in use at some Berlin institutions. The Zentralbibliothek at Tempelhof-Schöneberg, for instance, began a systematic deduplication pass on its digitised photograph collection in late 2024 as part of a broader metadata standardisation project.
For both public agencies and private firms, the practical path forward starts with an inventory. Before any deduplication tool is deployed, IT teams need a hash-based audit, which generates a fingerprint for each stored file, to establish exactly how many byte-identical duplicates exist. Near-duplicates that differ only in compression or resolution produce different hashes, so separating them from genuinely distinct images takes a second, similarity-based pass such as perceptual hashing. The Senate's digital reform office has indicated that a city-wide framework recommendation is expected before the end of Q3 2026. Companies that wait for that guidance risk paying preventable storage bills through at least the end of the fiscal year.
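The inventory step can be sketched as a short script. This is a minimal illustration, not a city-endorsed tool: it assumes the archive is reachable as a local or mounted file system, and the function names `sha256_of_file` and `find_exact_duplicates` are hypothetical. It reports only byte-identical copies, not near-duplicates.

```python
import hashlib
import os
import sys
from collections import defaultdict


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: str) -> dict[str, list[str]]:
    """Hypothetical audit: map each SHA-256 digest to all files under root
    with that content, keeping only digests shared by two or more files."""
    # Files of different sizes cannot be identical, so group by size first
    # and hash only the files that share a size with at least one other.
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_hash: dict[str, list[str]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[sha256_of_file(path)].append(path)

    return {h: sorted(p) for h, p in by_hash.items() if len(p) > 1}


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    groups = find_exact_duplicates(root)
    redundant = sum(len(paths) - 1 for paths in groups.values())
    # Bytes that could be freed by keeping one copy from each group.
    wasted = sum(os.path.getsize(paths[0]) * (len(paths) - 1)
                 for paths in groups.values())
    print(f"{len(groups)} duplicate groups, {redundant} redundant files, "
          f"{wasted:,} bytes reclaimable")
```

Grouping by size before hashing is a design choice: most files in a mixed archive have a unique size, so they never need to be read in full.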
About this article
Published by The Daily Berlin