Molecular epidemiology of Mycobacterium tuberculosis across three distinct geographic sites in South Africa

J Infect Dis. 2025 Jun 17:jiaf326. doi: 10.1093/infdis/jiaf326. Online ahead of print.

Abstract

Background: Whole-genome sequencing (WGS) data can generate insights into Mycobacterium tuberculosis (Mtb) transmission. We used WGS and linked epidemiologic data from a recent randomized trial to characterize Mtb relatedness across three geographically distinct South African sites.

Methods: We sequenced culture isolates from participants with culture-positive TB in the Kharituwe study, which evaluated household contact investigation strategies in one urban and two rural sites. We adapted a previous bioinformatic pipeline to clean, extract, and filter Mtb reads, perform reference alignment, calculate single nucleotide polymorphism (SNP) distances between isolates, and group isolates into clusters linked by recent transmission, using three SNP-distance cutoffs. Sequence data were linked to individual-level data on demographics and risk factors. We analyzed clustering across and within study sites and used log-binomial regression to assess characteristics associated with clustering.
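
The distance and clustering steps described above can be illustrated with a minimal sketch. It is not the study's pipeline: it assumes isolates are already represented as equal-length aligned sequences, that ambiguous positions (N or gap) are ignored pairwise, and that clusters are formed by single linkage (two isolates join the same cluster if they are connected by a chain of pairs at or below the cutoff). The function names snp_distance and cluster_isolates and the toy sequences are hypothetical.

```python
from itertools import combinations

# Assumption: positions with an ambiguous call in either isolate are skipped.
AMBIGUOUS = {"N", "-"}


def snp_distance(a, b):
    """Count positions where two aligned sequences differ (ambiguous calls ignored)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x not in AMBIGUOUS and y not in AMBIGUOUS
    )


def cluster_isolates(seqs, cutoff):
    """Single-linkage clustering: link isolates whose SNP distance is <= cutoff.

    seqs maps isolate name -> aligned sequence. Returns a list of sets,
    keeping only clusters with at least two isolates.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) >= 2]


# Toy data (hypothetical, not from the study).
toy = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGTACAA",
    "D": "TTTTTTTT",
}
clusters = cluster_isolates(toy, cutoff=1)
```

With single linkage, isolates A and C fall in the same cluster at a cutoff of 1 even though they differ by 2 SNPs, because each is within 1 SNP of B. This chaining is one reason larger cutoffs can produce large clusters; other linkage rules would behave differently.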

Results: At a cutoff of 12 SNPs, 213 of the 714 sequenced isolates that passed quality control filters were clustered. Although only 3 of 45 pairs included participants from different sites, most clusters with ≥4 participants included participants from at least 2 sites. Expanding to a 20-SNP cutoff revealed a large cluster containing 10% of isolates, with urban/rural representation mirroring that of all isolates (61% urban, 39% rural). Participants from the urban site, TB household contacts, and participants reporting a history of incarceration were more likely to be in a cluster.

Conclusions: Observed clustering and strain diversity across sites indicate the presence of multiple ongoing and geographically dispersed outbreaks in this setting.

Keywords: Genomics; South Africa; Spatial; Transmission; Tuberculosis.