Across Berlin's network of public archives, libraries, and municipal databases, a single photograph can exist in dozens of identical or near-identical copies — different file names, same pixels. It is a problem that sounds mundane until you look at the scale. According to a procurement review conducted by the Senatsverwaltung für Digitalisierung und Verwaltungsmodernisierung and published in March 2026, duplicate image files account for an estimated 34 percent of total storage consumption across the city's shared government infrastructure.
That figure translates directly into money. The city currently pays for roughly 4.2 petabytes of active cloud and on-premises storage across its administrative units. At Berlin's contracted rate with its primary data centre partner in Adlershof, the technology park in Treptow-Köpenick that houses several of the city's server operations, redundant data is costing the Senate an estimated €1.4 million per year in unnecessary storage overhead. The Senatsverwaltung has not disputed that estimate in any public statement.
The problem did not materialise overnight. Berlin spent much of the period between 2018 and 2024 rapidly digitising physical records under the E-Government-Gesetz Berlin, the city's digital governance law, pushing tens of thousands of scanned documents and photographs into centralised systems without robust deduplication protocols in place. Departments uploaded independently. Version control was inconsistent. A photograph from a 2019 construction permit application in Mitte might now sit in seven separate folders across three different administrative platforms, each copy treated by the system as a unique asset.
Where the Redundancy Concentrates
The worst-affected areas are not obscure back-office functions. The Landesarchiv Berlin, based on Eichborndamm in Reinickendorf, holds digitised historical photographs dating back to the early twentieth century. Internal assessments shared with city councillors in April 2026 indicated that duplicate detection software flagged more than 280,000 image pairs within the archive's public-facing digital collection alone: files that are either pixel-identical or differ only in resolution or metadata tagging. The Stadtbibliothek network, which operates across 77 branch locations including the Amerika-Gedenkbibliothek on Blücherplatz in Kreuzberg, faces a comparable challenge in its shared image catalogue used for event promotion and collection illustration.
The Berlin startup ecosystem has taken note. Several companies operating out of the Factory Berlin campus on Rheinsberger Straße in Mitte have developed perceptual hashing tools aimed specifically at the public-sector deduplication market. Such software derives a compact fingerprint from an image's visual content rather than its file name, so that resized or re-tagged copies of the same picture yield matching or near-matching fingerprints. One such company, Pictana GmbH, registered in Berlin-Mitte in January 2025, has already signed a pilot contract with a German federal agency, though not yet with the Berlin Senate.
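For readers curious how a content-based fingerprint can match copies that differ in resolution, the following is a minimal sketch of one simple perceptual hashing variant, the average hash. It is an illustration only and not the method used by any company or agency named here. The function names, the 8×8 grid size, and the synthetic test images are assumptions made for the example; a real tool would first decode image files into grayscale pixels.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values in the range 0-255


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint of a grayscale image (hypothetical helper)."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(size):
        # Pixel rows that fall into this grid cell (always at least one).
        y0 = row * height // size
        y1 = max((row + 1) * height // size, y0 + 1)
        for col in range(size):
            x0 = col * width // size
            x1 = max((col + 1) * width // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Synthetic 32x32 image: a horizontal gradient from dark to bright.
original = [[8 * x for x in range(32)] for _ in range(32)]
# The same picture at double resolution (each pixel repeated 2x2).
upscaled = [[original[y // 2][x // 2] for x in range(64)] for y in range(64)]
# A visually different picture: the gradient inverted.
inverted = [[255 - p for p in row] for row in original]

print(hamming_distance(average_hash(original), average_hash(upscaled)))
print(hamming_distance(average_hash(original), average_hash(inverted)))
```

A small distance between two fingerprints suggests the same picture; a large one suggests different pictures. Where to draw the threshold is a judgment call, which is one reason flagged pairs still need human review.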
What Deduplication Actually Requires
Fixing the problem is not simply a matter of running a script. Archivists argue, with justification, that two photographs that appear identical to an algorithm may carry different provenance metadata, licensing terms, or contextual annotations that make both copies independently valuable. The Landesarchiv, for instance, must comply with the Bundesarchivgesetz as well as Berlin's own Archivgesetz, both of which impose retention obligations that complicate straightforward deletion.
The Senate's digital office has proposed a phased deduplication programme, with a first audit of active administrative systems scheduled to conclude by December 2026. The projected cost of the full clean-up — covering automated scanning, human review for flagged edge cases, and system migration — is budgeted at €3.2 million over two years, a figure that appears in the 2026 Nachtragshaushalt approved in May. Proponents argue the investment pays for itself within three years through reduced storage costs and faster search performance across city platforms.
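The payback claim can be checked against the figures reported in this article. The short calculation below is a back-of-envelope sketch. It assumes, for simplicity, that the full €1.4 million in annual overhead is recovered from the first year and that storage prices stay flat; neither assumption comes from the Senate's own projections, and the variable names are illustrative.

```python
# Figures as reported in the article.
total_storage_pb = 4.2            # active storage, in petabytes
duplicate_share = 0.34            # estimated share taken up by duplicate images
annual_overhead_eur = 1_400_000   # estimated yearly cost of redundant data
cleanup_cost_eur = 3_200_000      # budgeted cost of the two-year programme

# Volume of storage attributed to duplicates.
redundant_pb = total_storage_pb * duplicate_share

# Simplifying assumption: the whole overhead is saved every year.
payback_years = cleanup_cost_eur / annual_overhead_eur

print(f"Redundant storage: about {redundant_pb:.2f} PB")
print(f"Payback period: about {payback_years:.1f} years")
```

Under those assumptions the programme would recoup its cost in a little over two years, which is consistent with the three-year claim. A phased rollout that recovers savings more slowly would lengthen that period.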
For anyone dealing with Berlin's public records systems in the meantime — journalists submitting freedom-of-information requests, researchers using the Landesarchiv's online portal, or developers building on open municipal data — the practical advice is straightforward: expect search results to surface multiple versions of the same image, treat metadata as unreliable for dating or sourcing purposes, and file specific requests by document reference number rather than keyword where possible. The Senate's digital portal on berlin.de lists reference contacts for each archive by department. The duplication problem is real, it is expensive, and the city has now at least put a number on it.