Tens of thousands of duplicate digital images are clogging the storage systems of Berlin's public cultural institutions, driving up licensing costs and slowing down archivists who are already stretched thin. The problem has moved from an internal IT headache to a genuine policy conversation, with officials at the Senatsverwaltung für Kultur und Gesellschaftlichen Zusammenhalt now pushing member institutions to adopt standardised deduplication protocols before the end of 2026.
The timing is not accidental. Berlin's coalition government, led by the SPD, committed in its 2025 governing agreement to accelerating the digitisation of public cultural assets. That push has pumped new images into already overloaded systems faster than archivists can audit them. The result: redundant files that occupy expensive server space, create confusion for researchers pulling from public portals, and — in several documented cases — lead institutions to license replacement images from commercial stock providers when usable originals already exist somewhere deeper in the same database.
What Experts and Officials Are Saying
The Stadtmuseum Berlin, which manages photography collections across multiple sites, including the Märkisches Museum on Köllnischer Park and the Ephraim-Palais in the Nikolaiviertel, has been among the institutions most vocal in acknowledging the problem. Its digital collections team described the challenge in internal documents shared at a May 2026 meeting of a working group convened by the Landesarchiv Berlin on Eichborndamm in Reinickendorf. Those documents, reviewed by The Daily Berlin, describe a collection environment where the same image can appear under three or four different file names, catalogued independently by different departments over a decade of uncoordinated digitisation efforts.
Researchers at the Technische Universität Berlin's computer science faculty have been advising several Bezirk-level archives on perceptual hashing — a technique that identifies visually identical or near-identical images even when file names, formats, or metadata differ. The approach is already used commercially by platforms handling millions of uploads per day, but its application to public cultural heritage archives in Germany has lagged. A pilot programme at the Bezirksamt Mitte, launched in March 2026 and covering roughly 140,000 catalogue entries, identified a duplication rate of approximately 18 percent — meaning nearly one in five stored images was a functional copy of something already in the system.
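For readers curious about how perceptual hashing works in practice, the sketch below shows one of its simplest variants, an "average hash", in Python. It is an illustration only and not the method used in the Bezirksamt Mitte pilot or recommended by the TU Berlin researchers, neither of which has been published in detail. The function names, the 8x8 grid size and the distance threshold of 5 are all assumptions chosen for the example, and images are represented as plain grids of grayscale values rather than real files.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), stored as rows of values


def downsample(pixels: Grid, size: int = 8) -> List[List[float]]:
    """Shrink an image to size x size cells by averaging each block of pixels."""
    h, w = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)  # every cell covers at least one row
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    cells = [v for row in downsample(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Treat two images as duplicates if their fingerprints differ in few bits.

    The threshold of 5 is an illustrative assumption, not a recommended standard.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Synthetic stand-ins for scanned photographs (hypothetical data):
original = [[(x + y) * 4 for x in range(32)] for y in range(32)]  # 32x32 gradient
rescaled = [[(x + y) * 2 for x in range(64)] for y in range(64)]  # same picture at 64x64
inverted = [[255 - v for v in row] for row in original]           # a different picture

print(is_near_duplicate(original, rescaled))
print(is_near_duplicate(original, inverted))
```

Because the fingerprint is built from coarse brightness patterns rather than file contents, a copy saved at a different resolution or under a different file name still produces a matching or nearly matching hash, which is the property that makes the technique useful for catalogues where metadata cannot be trusted. A production system would more likely decode real image files with an established library and use a more robust hash variant.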
That 18 percent figure, while not yet independently verified across all Berlin institutions, has become the working reference point in policy discussions. Officials at the Senatsverwaltung have cited it when arguing for a centralised image asset management system, the procurement process for which is expected to begin in the fourth quarter of 2026. Estimated first-year licensing and implementation costs for such a system are in the range of €400,000 to €600,000, according to budget outlines seen by this newspaper, though final figures depend on how many institutions ultimately participate.
The Practical Stakes for Berlin's Cultural Sector
The duplication problem is not purely administrative. When institutions cannot quickly locate a usable image from their own holdings, the default response is often to purchase a replacement from commercial providers such as Getty Images or Alamy. For smaller Bezirk archives operating on tight budgets — Neukölln's Stadtarchiv, for instance, works with an annual digitisation budget well under €100,000 — repeated licensing fees for images they theoretically already own represent a real drain.
The Berliner Morgenpost reported in June 2026 that at least three public institutions had independently licensed the same historical photograph of Potsdamer Platz within a 14-month period, unaware that two usable versions were already held in the Landesarchiv's digital stacks. Officials have not disputed that account.
The working group's next meeting is scheduled for September 2026 at the Landesarchiv. Institutions have been asked to submit self-assessments of their duplication exposure by August 15. Archivists advising the process say the September session is where any shared procurement decision will effectively be made — meaning the summer months are when institutions still have a window to flag problems, request resources, or push for technical standards that reflect the realities of their specific collections.