P66
The growing chaos of tuberculosis population genomics in the era of 'Big Data': sorting out the wheat from the chaff
C Sola(1,2) G Senelle(3,4) M R Sahal(1,2) K La(2,6) T Billard-Pomares(7) J Marin(2,7) A Bridier-Nahmias(2) C Guyeux(3,4) G Refrégier(1,5) E Carbonnelle(2,7) E Cambau(2,6)
1:Université Sorbonne Paris Nord, INSERM, IAME, F-93017 Bobigny, France; 2:Université de Paris, INSERM, IAME, F-75006 Paris, France; 3:Université Bourgogne Franche-Comté (UBFC), Besançon, France; 4:FEMTO-ST Institute, UMR 6174 CNRS-Université Bourgogne Franche-Comté (UBFC); 5:Ecologie Systématique Evolution, Université Paris-Saclay, CNRS, AgroParisTech, UMR ESE, 91405, Orsay, France; 6:AP-HP, GHU Nord site Bichat, Service de mycobactériologie spécialisée et de référence, Paris; 7:Service de microbiologie clinique, Hôpital Avicenne, 93017 Bobigny, France; Université Paris 13, IAME, Inserm, 93017 Bobigny, France
The Mycobacterium tuberculosis complex (MTBC) has a population structure consisting of 9 human and animal lineages. The genomic diversity of clinical isolates within these lineages is a pathogenesis factor that affects virulence, transmissibility, host response and the emergence of antibiotic resistance. It is therefore important to develop improved systems for tracking and understanding genome evolution. We present TB-Annotator, a new bioinformatics platform for the computational biology of MTBC. Built on a convenience sample of genomes drawn from public and private sequence read archives (SRAs), its current version describes the structure of the MTBC population based on 16,000 representative genomes from 63 countries. The platform analyzes nucleotide variants, gene presence/absence and regions of difference, and detects the insertion sites of mobile genetic elements. The objective of TB-Annotator is to detect recent epidemiological links, to reconstruct more distant spatio-temporal phylogeographical histories of historically related clones, and to support genome-wide association studies (GWAS). We compare the taxonomic labels described in recent reference studies and build a phylogenetic tree with RAxML. We characterize about 200 sublineages, whose differing names are compared and merged where possible. We discuss hierarchical typing schemes within lineages and analyze in detail the informativeness of certain SNPs. We also present new phylogeographical sublineages, for example within L5, L6 and L4.5. We show that this platform will enable: (1) local epidemiological monitoring, (2) an improved understanding of the global and local history of tuberculosis, and (3) in-depth studies of the genetic selection mechanisms acting on MTBC genomes.
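Hierarchical typing schemes within lineages can be illustrated with a small sketch. It assumes a SNP barcode in which each label (e.g. L4.5) is defined by a set of SNPs and also inherits the SNPs of its parent (L4). Under that assumption, a genome is assigned to the deepest label whose defining SNPs, and those of all its ancestors, are present. The barcode, its positions and alleles, and the function names below are hypothetical placeholders. They are not TB-Annotator's implementation and not real lineage-defining SNPs.

```python
from typing import Dict, Optional, Set, Tuple

# A SNP call is (reference position, alternative allele).
Snp = Tuple[int, str]

# HYPOTHETICAL hierarchical barcode: each label maps to the SNPs that define it.
# Positions and alleles are placeholders, NOT real H37Rv coordinates.
BARCODE: Dict[str, Set[Snp]] = {
    "L4": {(1000, "T")},
    "L4.5": {(2000, "G")},
    "L4.5.1": {(3000, "A")},
    "L5": {(4000, "C")},
}


def parent(label: str) -> Optional[str]:
    """Return the parent label ('L4.5.1' -> 'L4.5'), or None for a top-level lineage."""
    if "." not in label:
        return None
    return label.rsplit(".", 1)[0]


def is_supported(label: str, calls: Set[Snp], barcode: Dict[str, Set[Snp]]) -> bool:
    """A label is supported only if its own SNPs and those of all its ancestors are present."""
    current: Optional[str] = label
    while current is not None:
        # Ancestors missing from the barcode impose no constraint.
        if not barcode.get(current, set()) <= calls:
            return False
        current = parent(current)
    return True


def assign_sublineage(calls: Set[Snp], barcode: Dict[str, Set[Snp]] = BARCODE) -> Optional[str]:
    """Return the deepest supported label, or None if no label (or more than one) fits."""
    supported = [lab for lab in barcode if is_supported(lab, calls, barcode)]
    if not supported:
        return None
    depth = max(lab.count(".") for lab in supported)
    deepest = [lab for lab in supported if lab.count(".") == depth]
    # Only ties at the deepest level are treated as conflicts here; a fuller
    # tool would also flag conflicting lineages at shallower levels.
    return deepest[0] if len(deepest) == 1 else None


if __name__ == "__main__":
    print(assign_sublineage({(1000, "T"), (2000, "G")}))  # L4.5
```

Requiring every ancestor's SNPs, rather than only the label's own SNPs, keeps an assignment consistent with the hierarchy: a genome carrying the placeholder L4.5.1 SNP but not the L4 SNP is left unassigned instead of being labelled L4.5.1.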
