The Daily Berlin

Berlin news, every day

Berlin's Digital Archives Wage War on Duplicate Images — And This Week Brought a Breakthrough

City institutions and startups are rolling out new tools to clean up Berlin's bloated digital image libraries, with real consequences for housing listings, public records, and cultural heritage.

By Berlin News Desk · Published 4 July 2026, 8:45 pm

Photo: Vinay Reddy Sama on Pexels

Berlin's chronic problem with duplicate digital images — clogging housing portals, municipal databases, and cultural archives alike — moved closer to a practical fix this week, as two separate initiatives announced new deployment timelines. The issue, unglamorous but expensive, has been costing the city's institutions storage budget and, in the rental market, actively misleading prospective tenants.

The timing matters. With Berlin's housing shortage still acute and the SPD-led Senate under pressure to improve transparency in the rental sector, duplicate images on platforms like ImmobilienScout24 have drawn fresh scrutiny. A single apartment in Neukölln or Prenzlauer Berg can appear with dozens of near-identical photographs or, worse, with images recycled from earlier tenancies, making it almost impossible for renters to assess what they are actually viewing. City consumer advocates have flagged the practice repeatedly, arguing that it inflates the apparent availability of listings.

What Changed This Week

On Tuesday, the Zuse Institute Berlin — the research centre on Takustraße in Dahlem — confirmed it had completed a pilot of its perceptual hashing pipeline across the Landesarchiv Berlin's digitised photograph collection. The system, which generates a compact numerical fingerprint for each image and compares it against every other entry, identified roughly 12 percent of a 400,000-image test corpus as duplicates or near-duplicates. The Landesarchiv, which holds records stretching back to the 19th century, has been digitising materials since the early 2000s; without deduplication, successive scanning batches routinely produced redundant copies. Zuse Institute researchers are now preparing a full rollout across the archive's estimated 2.3 million digitised items.
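The fingerprint-and-compare approach described above can be sketched in a few lines of Python. This is an illustration only: the article does not say which hashing algorithm the Zuse Institute pipeline uses, so the sketch assumes a simple "difference hash", and the function names, the sample pixel grids, and the distance threshold of 4 are all hypothetical. It also assumes each image has already been decoded and shrunk to a small grayscale grid, which a real pipeline would do with an imaging library.

```python
from itertools import combinations

def dhash(pixels):
    """Difference hash: one bit per pair of horizontally adjacent pixels.

    `pixels` is a small grid (list of rows) of grayscale values, assumed to be
    an already downscaled image. A bit is 1 when the left pixel is brighter.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def near_duplicates(hashes, max_distance=4):
    """Compare every fingerprint against every other one (quadratic in size).

    `max_distance` is an assumed threshold, not a value from the pilot.
    """
    return [
        (name_a, name_b)
        for (name_a, hash_a), (name_b, hash_b) in combinations(hashes.items(), 2)
        if hamming(hash_a, hash_b) <= max_distance
    ]

# Hypothetical sample data: a scan, a slightly brighter rescan, and an unrelated image.
scan = [[10, 20, 30, 40], [40, 30, 20, 10], [5, 50, 5, 50]]
rescan = [[value + 3 for value in row] for row in scan]
other = [[40, 30, 20, 10], [10, 20, 30, 40], [50, 5, 50, 5]]

hashes = {"scan": dhash(scan), "rescan": dhash(rescan), "other": dhash(other)}
pairs = near_duplicates(hashes)
print(pairs)
```

Because the hash records only whether each pixel is brighter than its neighbour, a uniform brightness shift between two scanning batches leaves the fingerprint unchanged, so the scan and rescan are paired while the unrelated image is not. Comparing every entry against every other grows quadratically with collection size, so a full-archive run would likely need some form of indexing rather than this brute-force loop.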

Separately, the Berlin-based startup Pictave — operating out of a shared office on Oranienstraße in Kreuzberg — shipped an update to its commercial deduplication API on Wednesday. The company, founded in 2023, targets real estate portals and media companies. This week's update added support for German-language metadata stripping, which removes EXIF data that had previously allowed slightly re-exported copies of the same image to evade detection. Pictave's system is already integrated with at least one regional housing platform, according to the company's public changelog, though it has not named the client.
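Why stripping metadata helps can be shown with a small sketch. It assumes a detector that fingerprints the raw file bytes, which is an assumption: Pictave's internals are not public in this article, and a fingerprint computed from decoded pixels would not be affected by EXIF at all. The helper names and the synthetic byte strings are hypothetical, and the parser is deliberately simplified (it handles only length-prefixed segments before the image data and drops JPEG APP1 segments, where EXIF is stored).

```python
import hashlib
import struct

def strip_app1(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte stream without APP1 (EXIF/XMP) segments."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("malformed segment marker")
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of scan: everything after this is image data, copy it as-is.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker != 0xE1:
            out += jpeg[pos:pos + 2 + length]
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)

def make_segment(marker: int, payload: bytes) -> bytes:
    """Build one length-prefixed JPEG segment (test helper, hypothetical)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload

# Two synthetic "files" with identical image data but different EXIF payloads.
image_part = make_segment(0xDB, b"\x00" * 4) + b"\xff\xda" + b"scan-data" + b"\xff\xd9"
copy_a = b"\xff\xd8" + make_segment(0xE1, b"Exif\x00\x00camera-A") + image_part
copy_b = b"\xff\xd8" + make_segment(0xE1, b"Exif\x00\x00re-exported") + image_part

raw_match = hashlib.sha256(copy_a).digest() == hashlib.sha256(copy_b).digest()
stripped_match = (
    hashlib.sha256(strip_app1(copy_a)).digest()
    == hashlib.sha256(strip_app1(copy_b)).digest()
)
print(raw_match, stripped_match)
```

The two copies hash differently as raw files but identically once the APP1 segment is removed. This only covers re-exports that leave the image data untouched; a copy that has been recompressed would still need a perceptual comparison.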

Why Storage Costs Are Finally Forcing Action

Cloud storage is not free, and Berlin's public institutions are feeling it. The Senate's IT service provider ITDZ Berlin, which manages infrastructure for most city agencies, reported in its 2025 annual review that unstructured data, a category that includes image files, had grown by more than 30 percent year-on-year across managed government systems. If sustained, that growth rate would raise procurement costs significantly at the next contract renewal cycle, expected in late 2027.

For private platforms, the economics are sharper still. Housing portals typically pay per-gigabyte fees on object storage, and a portal carrying hundreds of thousands of Berlin listings accumulates image data fast. Deduplication software, even at a per-image processing cost of fractions of a cent, pays back quickly at scale. The Pictave API is priced at €0.004 per image checked, according to its published rate card dated June 2026 — a figure low enough to make even aggressive retroactive cleaning of legacy databases financially defensible.
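As a rough illustration of what that rate means for a one-off cleanup, the arithmetic is simply the per-image price multiplied by the number of images checked. The €0.004 rate comes from the rate card cited above; the library sizes and the function name are hypothetical, chosen only to show scale.

```python
from decimal import Decimal

# Published per-image rate cited in the article.
RATE_PER_IMAGE_EUR = Decimal("0.004")

def one_off_check_cost(image_count: int) -> Decimal:
    """Cost in euros of checking every image in a library exactly once."""
    return RATE_PER_IMAGE_EUR * image_count

# Hypothetical library sizes, not figures from any named portal.
for count in (100_000, 1_000_000):
    print(f"{count:>9,} images -> EUR {one_off_check_cost(count):,.2f}")
```

On these assumed sizes, a single pass costs €400 for 100,000 images and €4,000 for one million. Whether that pays back depends on storage prices and duplicate rates, which the rate card does not cover.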

The Landesarchiv project also has a heritage dimension that the purely commercial cases lack. Duplicate records in a public archive are not just a storage nuisance; they introduce noise into provenance chains, which matters when historians or legal researchers need to verify an image's origin and date. A photograph of a Mitte street scene filed twice under different acquisition numbers, with slightly different metadata, creates ambiguity that can take archivists hours to untangle manually.

What happens next depends on how quickly the two efforts scale. The Zuse Institute team is aiming for a Landesarchiv-wide scan completed by the end of the third quarter. Pictave, meanwhile, is reportedly in conversations with at least one German media group — though no deal has been announced. For Berlin renters, the practical benefit is still months away at minimum: platform-side deduplication only helps if the portals choose to act on the flagged images rather than simply storing the results. Consumer groups including the Berliner Mieterverein have been pushing for mandatory image authenticity standards in rental listings; whether the Senate will legislate on that front before the next state election remains an open question.


About this article

Published by The Daily Berlin

This article was produced by The Daily Berlin editorial desk and covers news in Berlin. See our editorial standards for how we use AI.
