

Reference-free clustering as an epidemiological tool for Mycobacterium tuberculosis strain typing

A C Chilengue(1) D Whiley(1) M R Sananes(1) C J Meehan(1,2)

1:Nottingham Trent University; 2:Institute of Tropical Medicine, Antwerp, Belgium

Whole-genome sequencing (WGS) analysis of M. tuberculosis with a 5-SNP cut-off is a robust tool for surveillance and for detecting recent transmission events. This approach has shown many advantages in clinical and epidemiological studies. However, it requires significant computational resources, which makes it challenging to perform in many low-resource, high-incidence settings, where most tuberculosis cases occur. To address this problem, we explored reference-free genome clustering tools that could make transmission tracking feasible where computational resources are limited.
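As an illustration of what a 5-SNP cut-off means in practice, the sketch below groups isolates into putative transmission clusters from a pairwise SNP distance matrix. It assumes single-linkage clustering and an inclusive threshold (distance of 5 SNPs or fewer), which is a common convention but is not stated in this abstract; the function name and the toy distances are hypothetical.

```python
from typing import Dict, List, Sequence


def cluster_by_snp_cutoff(
    names: Sequence[str],
    snp_dist: Sequence[Sequence[int]],
    cutoff: int = 5,  # assumed inclusive: pairs with distance <= cutoff are linked
) -> List[List[str]]:
    """Single-linkage clustering of isolates on a symmetric SNP distance matrix."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of isolates within the cut-off.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if snp_dist[i][j] <= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Largest clusters first; ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Toy example with made-up distances (not data from this study).
names = ["A", "B", "C", "D"]
snp_dist = [
    [0, 3, 8, 40],
    [3, 0, 5, 42],
    [8, 5, 0, 41],
    [40, 42, 41, 0],
]
clusters = cluster_by_snp_cutoff(names, snp_dist)
```

Under single linkage, isolates A and C end up in the same cluster even though they differ by 8 SNPs, because each is within 5 SNPs of B. The reference-based cost lies in producing the SNP matrix itself (read mapping and variant calling), not in this final clustering step.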

We analysed a dataset of global clinical isolates spanning the lineage diversity of M. tuberculosis and a local transmission dataset from Rwanda. We used three reference-free tools for population analysis and genome clustering: PopPUNK (Population Partitioning Using Nucleotide K-mers), Mash (fast genome and metagenome distance estimation using MinHash) and SKA2 (Split K-mer Analysis). We assessed each approach for accuracy in defining and recovering strain types (e.g. lineages and sub-lineages) and compared the resulting genome distance distributions with standard SNP distance matrices to assess how well they correlate.
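To make the idea of a reference-free distance concrete, the sketch below computes a Mash-style distance between two sequences from the Jaccard index of their k-mer sets, using the published Mash formula D = -(1/k) ln(2j / (1 + j)). It is a simplification under stated assumptions: it uses exact k-mer sets rather than MinHash sketches, it ignores reverse complements, and the helper names and toy sequences are hypothetical rather than part of the Mash tool or of this study.

```python
import math
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers in seq (forward strand only, a simplification)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two k-mer sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mash_distance(j: float, k: int) -> float:
    """Mash distance from a Jaccard index j and k-mer length k, bounded to [0, 1]."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: maximum distance
    return min(1.0, max(0.0, -math.log(2.0 * j / (1.0 + j)) / k))


# Toy sequences, far shorter than a genome and using an unrealistically small k.
k = 2
j = jaccard(kmers("ACGT", k), kmers("ACGA", k))
d = mash_distance(j, k)
```

No reference genome or alignment is involved at any point, which is where the computational saving comes from. The open question raised in this abstract is how well such distances track SNP distances at the scale of a few SNPs.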

Our analysis revealed that reference-free tools have the potential to detect new M. tuberculosis strains and transmission clusters. These tools can delineate M. tuberculosis lineages efficiently, though not all of them are consistently accurate across all sub-lineages. SKA2 allows lineage-defining cut-offs to be set on split k-mer distances. However, the accuracy of reference-free distances for detecting recent transmission clusters requires further investigation before their utility for M. tuberculosis molecular epidemiology can be fully established. Such advancements could significantly enhance WGS analysis of all lineages of M. tuberculosis, particularly in low-resource environments.

