
OR05

Long read sequencing reveals gene conversion as a diversity generating mechanism in Mycobacterium tuberculosis

M G Marin(1) M Harris(2) A Rosenthal(2) M R Farhat(1,3)

1:Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA; 2:Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA; 3:Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA

Whole genome sequencing has proven to be a useful tool for studying the evolution of Mycobacterium tuberculosis (Mtb). Because of the limitations of short-read sequencing, researchers commonly exclude ~10% of the Mtb genome, citing high sequence homology, repetitive sequence content, and the risk of false-positive variant calls. To confidently study the evolution of these regions, we used a collection of 158 Mtb clinical isolates with both Illumina and long-read sequencing data to generate a set of complete genome assemblies. Leveraging this dataset, we identified 32 regions of the Mtb genome with a nucleotide diversity at least 12-fold higher (> 0.0025 SNV/bp) than the genome median (0.0002 SNV/bp). Notably, all 32 (100%) of these elevated-diversity regions overlapped regions with homology elsewhere in the Mtb genome. This suggested that gene conversion, that is, intrachromosomal recombination between homologs, is a likely diversity-generating mechanism in these regions. To detect gene conversion events between homologs, we developed an approach that combines the Gubbins algorithm for recombination detection with the minimap2 aligner for homology mapping within the Mtb genome. Of the 309 putative gene conversion events detected, 189 (61%) overlapped genes coding for substrates of the ESX secretion systems (PE, PPE and Esx proteins), and 90 (29%) overlapped genes in annotated REP13E12 repeat regions. Understanding these patterns of gene conversion has the potential to uncover a major driving force in the evolution of many substrates of the ESX secretion systems, which are implicated in Mtb virulence.
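
As a rough illustration of the diversity screen described above, the following Python sketch computes nucleotide diversity for an aligned region and flags regions at least 12-fold above the median. It assumes nucleotide diversity is defined as average pairwise differences per site, which the abstract does not specify; the function names `nucleotide_diversity` and `flag_elevated` and the toy inputs are hypothetical and are not taken from the authors' pipeline.

```python
from itertools import combinations
from statistics import median


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for equal-length aligned sequences.

    Assumption: gaps and ambiguous bases are not handled specially; any
    character mismatch counts as a difference.
    """
    n = len(seqs)
    if n < 2 or len(seqs[0]) == 0:
        return 0.0
    length = len(seqs[0])
    total_diffs = 0
    for a, b in combinations(seqs, 2):
        total_diffs += sum(1 for x, y in zip(a, b) if x != y)
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


def flag_elevated(region_pi, fold=12.0):
    """Return sorted names of regions whose diversity is >= fold x the median."""
    baseline = median(region_pi.values())
    return sorted(
        name
        for name, pi in region_pi.items()
        if pi > 0 and pi >= fold * baseline
    )


if __name__ == "__main__":
    # Toy alignment: three sequences, one variable site out of four.
    print(nucleotide_diversity(["AAAA", "AAAT", "AAAA"]))
    # Toy per-region values in SNV/bp; only the last exceeds 12x the median.
    print(flag_elevated({"r1": 0.0002, "r2": 0.0002, "r3": 0.0030}))
```

A real analysis would compute these values over windows of a whole-genome alignment of the assemblies and would need an explicit policy for gaps and missing data.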
