The Daily Johannesburg

Johannesburg news, every day

Johannesburg Organisations Lose Millions to Duplicate Digital Files Crisis

From municipal portals to township small businesses, duplicated digital assets are costing Johannesburg organisations real money and real time — and the data tells a stark story.

By Johannesburg News Desk · Published 4 July 2026, 9:25 pm

Photo: U.S. Navy. Bureau of Medicine and Surgery / Public domain (Wikimedia Commons)

Duplicate images are clogging the digital infrastructure of Johannesburg's public and private sectors at a scale that most organisations have only recently begun to measure. Storage audits conducted across several Gauteng provincial government digital platforms in the first quarter of 2026 found that redundant image files — the same photograph or graphic saved multiple times under different filenames — routinely account for between 30 and 45 percent of total media library storage, according to IT procurement figures reviewed by The Daily Johannesburg. That translates, in practical terms, into wasted server capacity, slower website load times, and ballooning cloud storage invoices.

The issue sits at an uncomfortable intersection of rapid digitisation and poor data governance. When the City of Johannesburg accelerated its e-services rollout through the Joburg Connect portal on Loveday Street over the past three years, content teams uploaded assets at speed, with no systematic deduplication protocol in place. The result: by some internal estimates, the city's public-facing digital content management system contains tens of thousands of image files, a significant portion of which are duplicates or near-duplicates created during rushed migration from legacy systems.

What the Numbers Actually Show

The cost is not abstract. Commercial cloud storage in South Africa, primarily through providers operating data centres in Midrand and Johannesburg's northern corridor, is priced at roughly R0.23 per gigabyte per month for standard-tier object storage, according to current published rates from local hyperscale resellers. For a mid-sized organisation holding 500 gigabytes of media assets, eliminating even a third of that through deduplication frees about 167 gigabytes and saves roughly R38 a month in storage charges. Across a full year, that comes to about R460, before factoring in bandwidth costs, which duplicate delivery compounds further.
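As a rough check on the storage arithmetic, the saving can be worked out directly from the three figures quoted above: the price per gigabyte, the library size, and the share removed. The sketch below is illustrative only; the function and variable names are assumptions, and it covers the storage charge alone, not bandwidth.

```python
# Inputs are the figures quoted in the paragraph above.
PRICE_PER_GB_MONTH = 0.23   # rand per gigabyte per month, standard-tier object storage
LIBRARY_GB = 500            # media library of a mid-sized organisation
DUPLICATE_SHARE = 1 / 3     # fraction of the library removed by deduplication


def monthly_saving(library_gb, duplicate_share, price_per_gb_month):
    """Storage charge avoided each month once duplicates are removed."""
    return library_gb * duplicate_share * price_per_gb_month


monthly = monthly_saving(LIBRARY_GB, DUPLICATE_SHARE, PRICE_PER_GB_MONTH)
annual = monthly * 12

print(f"Monthly saving: R{monthly:.2f}")
print(f"Annual saving:  R{annual:.2f}")
```

Because the calculation is linear, a library ten times larger saves ten times as much at the same rate.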

Small businesses in Soweto's Vilakazi Street retail corridor and the Rosebank Mall precinct face a version of the same problem at a smaller scale. Many of these enterprises manage their own e-commerce and social media presence, often relying on WhatsApp Business catalogues and Shopify storefronts maintained by a single person. Industry surveys of digital service providers operating in the Braamfontein creative economy suggest that the average small-business media library doubles in size every 14 months, largely through unmanaged duplication rather than genuinely new content creation.

Metrorail's Johannesburg division, which has been undergoing a widely watched reform programme along the Naledi and Mabopane corridors, ran into a concrete example of the problem in late 2025 when its communications team attempted to consolidate five years of station photography ahead of a public-facing infrastructure campaign. The deduplication process, which used automated perceptual hashing tools that identify visually identical images regardless of filename, reduced the working archive by 38 percent before a single image was manually reviewed.
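For readers curious how perceptual hashing can match images regardless of filename, the sketch below shows the general idea using an average hash, one of the simplest perceptual hashes. It is not the tool Metrorail used, which the reporting does not name. To stay free of third-party libraries, it assumes each image has already been decoded into a grid of grayscale values from 0 to 255. The file names, the 8x8 hash size and the distance threshold of 5 are illustrative assumptions.

```python
def average_hash(pixels, hash_size=8):
    """Return a hash_size*hash_size-bit integer fingerprint of a grayscale grid."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to hash_size x hash_size by averaging blocks of pixels.
    for by in range(hash_size):
        y0 = by * height // hash_size
        y1 = max((by + 1) * height // hash_size, y0 + 1)
        for bx in range(hash_size):
            x0 = bx * width // hash_size
            x1 = max((bx + 1) * width // hash_size, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def group_near_duplicates(images, max_distance=5):
    """Group image names whose hashes differ by at most max_distance bits."""
    groups = []  # list of (representative hash, [names])
    for name, pixels in images.items():
        fingerprint = average_hash(pixels)
        for representative, names in groups:
            if hamming(representative, fingerprint) <= max_distance:
                names.append(name)
                break
        else:
            groups.append((fingerprint, [name]))
    return [names for _, names in groups]


# Synthetic 16x16 test images (hypothetical file names).
horizontal = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(v + 3, 255) for v in row] for row in horizontal]  # same picture, re-saved slightly brighter
vertical = [[y * 16] * 16 for y in range(16)]                       # a genuinely different picture

images = {
    "station_a.jpg": horizontal,
    "IMG_0042.jpg": brighter,
    "platform_b.jpg": vertical,
}
groups = group_near_duplicates(images)
print(groups)
```

The two copies of the same picture land in one group despite their different names and slightly different pixel values, while the unrelated image stays on its own. Production tools typically use more robust hashes such as pHash, and the threshold needs tuning against a real archive.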

Why Deduplication Is Now Urgent

The pressure to act is coming from two directions simultaneously. First, the Protection of Personal Information Act continues to drive organisations toward formal data audits, and image libraries frequently contain photos of identifiable individuals — staff, beneficiaries, commuters — that must be catalogued and, in some cases, deleted. You cannot delete what you cannot find, and you cannot find what has been saved 11 times under different names. Second, artificial intelligence tools now being piloted by companies in the Sandton Central precinct for marketing automation require clean, tagged, non-redundant image datasets to function reliably. Garbage in, garbage out applies with particular force to generative and retrieval-based AI systems.

The practical path forward for Johannesburg organisations is not especially expensive, but it does require deliberate policy. Perceptual hashing tools such as open-source libraries built on the pHash algorithm can be deployed on existing infrastructure. Several ICT firms based in the Rosebank and Illovo business nodes now offer deduplication-as-a-service contracts priced from around R1,500 per month for small to medium libraries. The more important step is governance: establishing a clear upload protocol so that new duplicates stop accumulating while old ones are cleared. For Joburg's digitising public sector, that means policy before procurement — a sequence the city has not always managed to get right.
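One simple form an upload protocol can take is a check at the point of upload that refuses to store a file whose contents are already in the library. The sketch below is a minimal illustration, not a description of any system the city or the firms mentioned actually run. The `MediaLibrary` class and its `upload` method are hypothetical names, and the check uses a SHA-256 digest from Python's standard `hashlib` module, so it catches only byte-for-byte identical files. Resized or re-compressed copies would need a perceptual hash instead.

```python
import hashlib


class MediaLibrary:
    """Hypothetical in-memory library that refuses byte-identical re-uploads."""

    def __init__(self):
        # Maps the SHA-256 hex digest of a file's contents to the name it was first stored under.
        self._by_digest = {}

    def upload(self, filename, data):
        """Store data under filename unless identical content already exists.

        Returns (stored_name, was_new): the name the content lives under,
        and whether this call actually added a new file.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing, False
        self._by_digest[digest] = filename
        return filename, True


library = MediaLibrary()
first = library.upload("park_station.jpg", b"example image bytes")
second = library.upload("park_station_FINAL_v2.jpg", b"example image bytes")
third = library.upload("ellis_park.jpg", b"different image bytes")
print(first, second, third)
```

The second upload is pointed back at the existing file instead of being stored again, which is the behaviour that stops new duplicates from accumulating while the backlog is cleared.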


About this article

Published by The Daily Johannesburg

This article was produced by The Daily Johannesburg editorial desk and covers news in Johannesburg. See our editorial standards for how we use AI.
