Thousands of duplicate images are clogging the digital storage systems of Johannesburg's major public institutions, from the City of Johannesburg Metropolitan Municipality's planning directorate in Braamfontein to the photographic vaults of the Apartheid Museum at Gold Reef City. The problem is not new, but pressure to fix it is building, and the choices made before the end of 2026 will determine whether those archives become a usable public resource or an expensive, unsearchable mess.
The issue has sharpened because several city agencies are midway through storage infrastructure contracts. The City of Johannesburg's Information and Communications Technology department, which manages centralised digital repositories for multiple directorates, is due to review its cloud storage agreements in the third quarter of 2026. That review window is the clearest opportunity yet to implement deduplication protocols before new storage is commissioned, probably at higher rates: commercial cloud storage costs in South Africa have risen alongside the weakening rand, with enterprise-tier pricing from local providers such as Vodacom Business and BCX running at roughly R1,200 to R2,500 per terabyte per month, depending on redundancy tier.
Why Duplicates Accumulate, and Why They're Hard to Kill
The root cause is institutional. Staff at the Joburg Roads Agency in Westgate upload images to central servers. So does the Johannesburg Development Agency, whose project photographers document construction across Soweto and the inner city. Historically, neither team checked whether a version of a given image already existed. Staff churn, itself significant given Joburg's documented vacancy rates across municipal departments, means institutional memory about file-naming conventions evaporates quickly. As a result, a single construction site on Eloff Street Extension might accumulate dozens of near-identical progress photographs tagged under four different project codes.
Cultural institutions face the same headache at a different scale. The Soweto-based Credo Mutwa Cultural Village and organisations scanning heritage materials with South African Heritage Resources Agency digitisation grants have flagged the problem internally: grant funding pays for scanning and uploading, but rarely for the unglamorous work of deduplication and metadata standardisation afterwards. When grant cycles end, institutions are left with bloated, poorly indexed collections.
The standard technical fix is automated deduplication software: tools that compute a hash value for each image and flag matching or near-matching files for human review. An exact hash catches only byte-identical copies, so archival tools typically also use perceptual hashes, which stay similar when an image is resized, recompressed or lightly edited. Products used by South African media houses and universities include solutions from companies such as Datadobi and hashing technology such as Microsoft's PhotoDNA, which is proprietary rather than open-source, adapted for archival rather than enforcement use. But software alone is not the answer. Someone has to decide which version of a duplicate is authoritative, who signs off on deletions, and what happens when an image exists in one department's system as a draft and in another's as a published record.
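The hash-and-review approach described above can be sketched in a few lines of Python. This is a minimal illustration, not how Datadobi or PhotoDNA work internally. It assumes each image has already been decoded, converted to greyscale and shrunk to a 9-by-8 grid of brightness values by an imaging library (that step is not shown). SHA-256 catches byte-identical copies; a "difference hash" (dHash), a widely used public perceptual-hashing technique, catches near-identical ones. The function names and the 10-bit distance threshold are hypothetical choices for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# Assumption: each image is pre-shrunk to 8 rows x 9 columns of greyscale values (0-255).
Grid = list[list[int]]


def exact_fingerprint(data: bytes) -> str:
    """SHA-256 of the raw file bytes: matches only byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in sorted(files.items()):
        groups[exact_fingerprint(data)].append(name)
    return [names for names in groups.values() if len(names) > 1]


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per left-vs-right pixel comparison (8 x 8 = 64 bits)."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def flag_near_duplicates(hashes: dict[str, int], threshold: int = 10) -> list[tuple[str, str, int]]:
    """Return candidate pairs for a person to review; nothing is deleted here."""
    flagged = []
    for (name_a, h_a), (name_b, h_b) in combinations(sorted(hashes.items()), 2):
        distance = hamming(h_a, h_b)
        if distance <= threshold:
            flagged.append((name_a, name_b, distance))
    return flagged
```

The sketch deliberately stops at flagging. A slightly brightened copy of a photograph produces the same dHash as the original and is reported as a candidate pair, but deciding which copy is authoritative remains an institutional decision, not a technical one.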
The Decisions That Cannot Be Delayed
Three choices will define what comes next. First, institutions need a retention policy with legal standing, something most Joburg agencies currently lack for digital images specifically, as opposed to documents. The National Archives and Records Service of South Africa published framework guidelines in 2021, but uptake at municipal level has been inconsistent. Second, the city's ICT department needs to decide whether deduplication happens before or after the third-quarter storage contract review. Doing it before could substantially reduce the volume of data that needs migrating, cutting costs. Doing it after means paying to migrate redundant files and then cleaning them up, a more expensive sequence. Third, cultural institutions applying for the next round of digitisation grants from the Department of Sport, Arts and Culture (DSAC) in the 2026/27 financial year need to include deduplication as a budget line item, or they will repeat the same cycle.
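The second choice is ultimately arithmetic, and a short Python sketch makes the trade-off concrete. It estimates the avoidable spend if duplicates are migrated onto new storage and removed only later. Apart from the R1,200 to R2,500 per-terabyte monthly range cited earlier, every input (archive size, duplicate share, months until clean-up, and an optional per-terabyte migration fee) is a hypothetical placeholder, not a figure from the City's contracts.

```python
# Illustrative cost model for the "deduplicate before or after migration" decision.
# All inputs are hypothetical except the per-terabyte price range quoted in the article.


def monthly_storage_cost(terabytes: float, rand_per_tb_month: float) -> float:
    """Monthly storage bill in rand for a given volume."""
    return terabytes * rand_per_tb_month


def cost_of_cleaning_up_later(
    total_tb: float,
    duplicate_share: float,
    rand_per_tb_month: float,
    months_until_cleanup: int,
    migration_rand_per_tb: float = 0.0,  # hypothetical one-off migration fee
) -> float:
    """Extra rand spent migrating and then storing duplicates until they are removed."""
    duplicate_tb = total_tb * duplicate_share
    storage = monthly_storage_cost(duplicate_tb, rand_per_tb_month) * months_until_cleanup
    migration = duplicate_tb * migration_rand_per_tb
    return storage + migration


for rate in (1200, 2500):
    extra = cost_of_cleaning_up_later(100, 0.30, rate, 6)
    print(f"At R{rate}/TB/month: R{extra:,.0f} spent on duplicates")
```

For example, with a 100 TB archive of which 30% is duplicated and six months before clean-up, the duplicates alone would cost R216,000 at the low end of the price range and R450,000 at the high end, before any migration fees.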
For ordinary Joburgers, the practical stakes are real. Journalists, researchers, urban planners working in Sandton's financial district, and historians documenting Alexandra township's transformation all rely on searchable public image databases. A 2024 audit by the Wits School of Governance, referenced in its urban data governance curriculum materials, found that failures to retrieve records from municipal digital archives were a significant barrier to public accountability reporting. Cleaning up duplicates is not a technical footnote. It is the precondition for those archives to function at all, and the clock on the storage contract review is already running.