Every week, thousands of duplicate image files accumulate silently across Johannesburg's public and private digital systems — bloating servers, inflating cloud bills, and quietly undermining efforts to modernise the city's creaking infrastructure. It is a problem most organisations discover only when the storage invoice arrives.
The timing matters. Johannesburg is in the middle of a push to digitise records across multiple public bodies, including the City of Johannesburg Metropolitan Municipality's property database and the Passenger Rail Agency of South Africa's reformed Metrorail network. Both programmes depend on clean, well-organised image libraries (engineering photographs, identity documents, station inspection records), and both are vulnerable to the kind of data duplication that goes unmanaged for months at a time.
The Numbers Behind the Problem
Globally, research firm Gartner estimated in 2024 that duplicate and redundant data accounts for between 25 and 40 percent of total enterprise storage consumption. Applied to a mid-sized South African municipality running cloud infrastructure at roughly R18 to R22 per gigabyte per month on local providers such as Dimension Data or MTN Business Cloud, the arithmetic turns punishing fast. A department storing 10 terabytes of images, not unusual for a city the size of Johannesburg, could be paying for 2.5 to 4 terabytes of pure redundancy every billing cycle.
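The arithmetic can be sketched in a few lines of Python. This is a rough illustration only: it uses the 25 to 40 percent duplicate share and the R18 to R22 per gigabyte rate quoted above, and it assumes 1 terabyte equals 1,000 gigabytes. The function name redundant_cost_range is hypothetical.

```python
def redundant_cost_range(total_gb, dup_share_range, rand_per_gb_range):
    """Return (low, high) monthly spend in rand on duplicate data.

    Low pairs the smallest duplicate share with the cheapest rate;
    high pairs the largest share with the most expensive rate.
    """
    low = total_gb * dup_share_range[0] * rand_per_gb_range[0]
    high = total_gb * dup_share_range[1] * rand_per_gb_range[1]
    return low, high


# Assumption: 10 TB treated as 10,000 GB (decimal units).
low, high = redundant_cost_range(10_000, (0.25, 0.40), (18, 22))
print(f"R{low:,.0f} to R{high:,.0f} per month")
```

On those figures the redundant share alone would cost somewhere between about R45,000 and R88,000 a month. Real invoices depend on tiered pricing and contract terms that this sketch ignores.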
Small business owners feel it differently. In the Fordsburg commercial strip along Mint Road, traders who moved their product catalogues online during and after the Covid lockdowns often uploaded the same stock photographs multiple times across platforms like Takealot, their own WooCommerce sites, and WhatsApp Business catalogues. A single clothing wholesaler can end up holding the same product image in six or seven slightly different file sizes and names, none of them flagged as duplicates by basic hosting software.
The Sandton financial district presents a different dimension of the same issue. Financial services firms operating out of towers along West Street and Alice Lane routinely handle scanned compliance documents (FICA submissions, proof-of-address files, identity copies) that arrive in duplicate from multiple client-facing channels. The Financial Intelligence Centre's digital compliance requirements, updated in 2023, increased the volume of mandatory document uploads but did not mandate deduplication standards at the point of ingestion. That gap has left back-office storage teams managing redundant files manually.
What Johannesburg Organisations Are Doing About It
Automated deduplication software has been commercially available for years, but uptake among Johannesburg's public sector bodies has been slow. Tools such as dupeGuru, VisiPics, and enterprise-grade solutions from vendors including Veritas and Commvault can scan image libraries and flag or remove duplicates based on hash-matching, a process that compares the underlying data of each file rather than just the filename. Exact hash matching catches only byte-identical copies; resized or re-compressed versions of the same picture need similarity matching, which dupeGuru and VisiPics also support. A hash-match scan across a 1-terabyte library can typically complete in under two hours on standard server hardware.
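For readers curious about what hash-matching involves, here is a minimal sketch using only Python's standard library. It is not how any of the named products work internally. It simply shows the general idea of grouping files by a digest of their contents, and the function names file_digest and find_exact_duplicates are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root):
    """Group files under root whose contents are byte-for-byte identical.

    Filenames are ignored; only the file data is compared.
    Returns a list of groups, each holding two or more paths.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Example call (the folder name is a placeholder):
# for group in find_exact_duplicates("product_photos"):
#     print([str(p) for p in group])
```

A scan like this only reports matches. Deciding which copy to keep, and whether anything links to the others, is still a human decision.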
The City of Johannesburg's Group Information and Communications Technology directorate has piloted cloud rationalisation exercises in previous financial years, targeting the Pikitup waste management records and the Joburg Water billing archive. Neither pilot has published formal deduplication savings figures, but cloud cost reduction featured in the 2025/26 municipal budget speech as a stated efficiency target.
For individual businesses, the practical fix is less complicated than it sounds. Running a free tool like dupeGuru across a local drive takes under an hour for most small traders. Setting file-naming conventions before uploading — using product codes rather than camera-assigned filenames like IMG_4872 — prevents most duplication from occurring in the first place. Google Drive's built-in storage analyser, accessible through any browser, flags obvious duplicates at no additional cost.
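A naming convention of that kind can be enforced with a short script. The sketch below is one possible approach, not a standard. The pattern for spotting camera-assigned names, the product-code format, and the helper names is_camera_name and catalogue_name are all assumptions made for illustration.

```python
import re
from pathlib import PurePath

# Assumption: common camera and phone prefixes followed by a number.
CAMERA_NAME = re.compile(r"^(IMG|DSC|DSCN|PXL)[_-]?\d+", re.IGNORECASE)


def is_camera_name(filename):
    """True if the filename looks camera-assigned, such as IMG_4872.JPG."""
    return CAMERA_NAME.match(PurePath(filename).stem) is not None


def catalogue_name(product_code, shot_number, original_filename):
    """Build a filename from a product code and a shot number.

    The original extension is kept (lower-cased). Characters other than
    letters and digits in the product code are collapsed into hyphens.
    """
    suffix = PurePath(original_filename).suffix.lower()
    code = re.sub(r"[^A-Za-z0-9]+", "-", product_code).strip("-").upper()
    return f"{code}_{shot_number:02d}{suffix}"


print(is_camera_name("IMG_4872.JPG"))
print(catalogue_name("shirt 104/blue", 1, "IMG_4872.JPG"))
```

Because every upload of the same product shot then carries the same predictable name, a second copy is easy to spot before it reaches the hosting platform.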
The broader lesson for Johannesburg's digitising economy is straightforward: storage is not free, and the cost of ignoring duplicate data compounds month by month. With load shedding pressures having already forced many businesses onto cloud infrastructure — removing the option of simply buying another physical hard drive — disciplined image management is no longer optional housekeeping. It is a line item that will keep showing up on the invoice until someone runs the scan.