Berlin's public archive system is in the middle of a mass cleanup. As of this week, the Landesarchiv Berlin — the state archive headquartered on Eichborndamm in Reinickendorf — confirmed it is actively running automated deduplication tools across roughly 2.3 million digitised image files, a project that has been in preparation since early 2025 but reached operational scale only in late June 2026.
The push matters now because Berlin's digital infrastructure budget is under pressure. The CDU-led Senate has been trimming discretionary spending lines ahead of the autumn fiscal review, and IT departments across the Senatsverwaltungen have been told to demonstrate efficiency savings. Redundant image storage, where the same photograph appears under multiple catalogue entries, sometimes with conflicting metadata, has long been flagged as wasteful, but it took a formal internal audit completed in May to put a number on the problem. According to the Landesarchiv's published project brief, duplicate or near-duplicate images account for an estimated 18 percent of total storage consumption in its primary digital repository.
What the Audit Found, and Where It Went Wrong
The issue is not unique to the Landesarchiv. The Stadtmuseum Berlin, which manages collections across sites including the Ephraim-Palais in Mitte and the Märkisches Museum near Märkisches Ufer, has been grappling with the same structural problem since it migrated to a centralised asset management platform in 2023. Duplicate entries multiplied during that migration, when images were imported from at least four legacy systems, each with different naming conventions. Staff at the Stadtmuseum have been manually reviewing flagged duplicates since March, a process the institution described in a June newsletter as ongoing.
The deduplication software now being deployed across Landesarchiv servers uses perceptual hashing — a technique that identifies visually identical or near-identical images even when file names, formats or resolutions differ. This is significant because many of the duplicates are not exact byte-for-byte copies. They are scans made at different resolutions, or the same historical photograph cropped slightly differently for different catalogue entries. A standard file-comparison tool would miss them entirely. The Landesarchiv's project brief estimates that roughly 400,000 individual image files will be flagged for human review by the end of July.
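To illustrate how perceptual hashing can match scans of different resolutions, the following is a minimal sketch of one common variant, the average hash. The project brief does not name the algorithm or software in use, so the function names, the 8x8 grid size and the pure-Python pixel-grid input are all assumptions made for illustration, not a description of the Landesarchiv's tool.

```python
from typing import List

Grid = List[List[float]]  # grayscale pixel values, one inner list per row


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging rectangular blocks of pixels."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)  # at least one source row
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)  # at least one source column
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit hash: one bit per cell, set if the cell is brighter than the mean."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")
```

Two files would be treated as candidate duplicates when the Hamming distance between their hashes falls below a chosen threshold. That threshold is a tuning decision, and the source does not say what value, if any, the Landesarchiv uses. Because the hash depends only on coarse brightness patterns, a rescaled scan of the same photograph produces the same or a nearly identical hash, whereas a byte-for-byte comparison would treat the two files as unrelated.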
Cold-tier cloud storage for enterprise-grade archival systems costs approximately €0.04 to €0.08 per gigabyte per month, and Berlin's public archives collectively hold tens of terabytes of image data. Even modest deduplication gains translate into meaningful recurring savings. The project brief cites a target of reducing total image storage volume by at least 12 percent by the end of Q3 2026.
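The arithmetic behind that claim can be sketched as follows. The 50-terabyte volume is a purely hypothetical figure chosen to fall within "tens of terabytes"; the archives have not published an exact total. The price range and the 12 percent target come from the figures above, and the function name is an illustrative assumption.

```python
def monthly_savings_eur(volume_tb: float, reduction: float, price_per_gb: float) -> float:
    """Estimate monthly savings from freeing a fraction of stored data.

    volume_tb    -- total stored volume in terabytes (1 TB taken as 1,000 GB)
    reduction    -- fraction of volume removed, e.g. 0.12 for 12 percent
    price_per_gb -- storage price in euros per gigabyte per month
    """
    gb_freed = volume_tb * 1000 * reduction
    return gb_freed * price_per_gb


# Hypothetical 50 TB repository, 12 percent reduction, both ends of the price range
low = monthly_savings_eur(50, 0.12, 0.04)
high = monthly_savings_eur(50, 0.12, 0.08)
print(f"Estimated monthly savings: EUR {low:.0f} to EUR {high:.0f}")
```

Under those assumptions the saving works out to roughly €240 to €480 a month, and it scales linearly with the actual volume held.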
What Happens Next for Researchers and the Public
For historians, journalists and members of the public who use Berlin's online image portals — including the searchable interface at berlin.de and the Deutsche Digitale Bibliothek, which aggregates holdings from institutions across Germany — the short-term effect is disruption. Some catalogue entries have already been temporarily taken offline while metadata is reconciled. The Landesarchiv has posted a notice on its website advising researchers that search results for certain keyword categories, particularly images tagged to pre-1945 Berlin districts including Spandau and Tempelhof, may be incomplete until the review is finished.
The longer-term payoff, if the project stays on schedule, is a cleaner, faster search experience. Deduplication should also reduce the risk of the same photograph being published under conflicting rights designations — a legal headache that surfaced twice in 2024 when images licensed as public domain through one catalogue entry were simultaneously listed as rights-restricted under a duplicate entry.
Researchers with time-sensitive requests are being advised to contact the Landesarchiv reading room on Eichborndamm directly by telephone rather than relying on the online catalogue until at least the end of August. The Stadtmuseum Berlin has said its own cleanup should be complete by mid-September, in time for the autumn exhibition season. Whether the two institutions will ultimately share a unified, deduplicated image infrastructure remains an open question that the Senate's cultural affairs directorate has not yet publicly answered.