GENOTUBE, a high-throughput tool for exploring the genetic diversity of pathogenic mycobacteria

A Le Meur(1) F Hak(1) R Z Eddine(2,3) C Sola(4,5) G Refregier(1)

1:Université Paris-Saclay, CNRS, AgroParisTech, Ecologie Systématique et Evolution, 91190, Gif-sur-Yvette, France; 2:Al Afak institute for health Sciences, Lebanon; 3:Laboratoire d’Optique et Biosciences (LOB), Ecole Polytechnique, Route de Saclay 91120, Palaiseau, France; 4:INSERM-Université Paris-Cité, IAME Laboratory, UMR1137; 5:Université Paris-Saclay, 91190 Saint-Aubin, France

More than 100,000 Mycobacterium tuberculosis sequence read sets are publicly available in the Sequence Read Archive (SRA). Making use of this resource requires dedicated analysis tools that limit both analysis time and storage space.

We developed an NGS data analysis pipeline to explore polymorphisms in Mycobacterium tuberculosis strains, with an option to restrict the analysis to genes of interest. The pipeline is called GENOTUBE, for GENe Oriented TUBErculosis pipeline. It was developed with the Nextflow workflow management software and is containerized to ease installation and provide reproducible results. GENOTUBE is available on GitHub. We benchmarked the tool with artificial genomes that include all the types of evolutionary events known to apply to this pathogen: Single Nucleotide Variations (SNVs), indels, IS6110 transpositions, large deletions, and duplications.
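As an illustration of benchmarking against artificial genomes, the sketch below applies a known set of SNVs and one large deletion to a reference sequence and returns the truth set that a variant caller should recover. It is a minimal example under stated assumptions, not the simulation procedure used for GENOTUBE: the function name `mutate_genome`, its parameters, and the toy reference are hypothetical, and indels, IS6110 transpositions and duplications are left out for brevity.

```python
import random


def mutate_genome(reference, n_snvs, deletion, seed=0):
    """Return (mutated_sequence, truth) for a toy artificial genome.

    Hypothetical helper for illustration only.
    reference: string over A/C/G/T.
    deletion:  (start, end), 0-based half-open interval on the reference.
    truth:     list of (position, ref_base, alt_base) on reference coordinates.
    """
    rng = random.Random(seed)  # seeded so the artificial genome is reproducible
    del_start, del_end = deletion
    seq = list(reference)

    # Draw SNV positions outside the region that will be deleted.
    candidates = [i for i in range(len(reference))
                  if not (del_start <= i < del_end)]
    positions = sorted(rng.sample(candidates, n_snvs))

    truth = []
    for pos in positions:
        ref_base = seq[pos]
        alt_base = rng.choice([b for b in "ACGT" if b != ref_base])
        seq[pos] = alt_base
        truth.append((pos, ref_base, alt_base))

    # Apply the large deletion after the substitutions.
    mutated = "".join(seq[:del_start] + seq[del_end:])
    return mutated, truth


reference = "ACGT" * 25  # toy 100 bp reference, not a real genome
mutated, truth = mutate_genome(reference, n_snvs=5, deletion=(40, 50), seed=1)
```

Comparing the variants called on reads simulated from `mutated` with `truth` then gives the sensitivity and specificity of a pipeline on events whose positions are known exactly.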

GENOTUBE integrates the classic steps of a genomics pipeline (downloading, mapping, variant calling), as well as the classic steps of all-in-one M. tuberculosis pipelines (lineage and sub-lineage identification, in silico prediction of antibiotic resistance). In addition, GENOTUBE includes several phylogeny modules (tree construction, ancestral character inference, reversion and homoplasy identification). GENOTUBE proved faster, more sensitive and more specific than other tools such as PhyResSE, MTBseq and TB-Profiler. We also found that the risk of erroneous SNV inference is inflated at the borders of deletions, IS6110 insertions or deletions, and duplications.
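One practical consequence of the inflated error risk at the borders of structural events is that SNVs called close to such borders deserve extra scrutiny. The sketch below shows one possible way to flag them. It is an assumption-laden illustration, not a GENOTUBE module: the function name `flag_snvs_near_borders`, the event coordinates, and the 50 bp window are all hypothetical.

```python
def flag_snvs_near_borders(snv_positions, events, window=50):
    """Flag SNVs lying within `window` bp of a structural event border.

    Hypothetical helper for illustration only.
    snv_positions: iterable of integer genome positions.
    events:        list of (start, end) coordinates of deletions,
                   IS6110 insertions or deletions, or duplications.
    Returns a list of (position, is_near_border) tuples.
    """
    flagged = []
    for pos in snv_positions:
        near = any(
            abs(pos - border) <= window
            for start, end in events
            for border in (start, end)  # both borders of each event
        )
        flagged.append((pos, near))
    return flagged


events = [(1000, 1500)]          # one made-up deletion
snvs = [960, 1250, 1520, 3000]   # made-up SNV positions
result = flag_snvs_near_borders(snvs, events, window=50)
# result == [(960, True), (1250, False), (1520, True), (3000, False)]
```

Flagged positions could then be excluded or inspected manually before being used in phylogenetic or association analyses. The appropriate window size would have to be determined empirically.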

In summary, GENOTUBE is a new tool for extracting the diversity at loci of interest from large collections of M. tuberculosis SRA samples. It eases the detection of genetic associations and can be adapted to other bacteria.

