P055
Benchmarking algorithms for species-level profiling of mycobacteria in shotgun metagenomic samples, and application to the distribution of known mycobacteria in soils
L B Harrison(1,2), J O Ahmed(1), F J Veyrier(1)
1:Centre Armand-Frappier Sante Biotechnologie - INRS; 2:McGill University Health Centre
Mycobacteria inhabit a variety of environments, including soils, water, dusts, and other niches. The distribution of known species has been described to varying degrees by both culture-dependent and culture-independent approaches. Shotgun metagenomics offers promise for delineating the diversity of both known and unknown mycobacterial species, but significant questions remain concerning both wet-lab and dry-lab methodologies. On the dry-lab side, determining which known species of mycobacteria are present in a given sample of metagenomic sequencing reads is not trivial, owing to the similarity and incompleteness of reference genomes and to horizontal gene transfer, among other issues. Three recently published algorithms address these issues by profiling microbial communities at the species level, including at low abundance, using reference databases: metapresence, YACHT and sylph. We benchmarked these tools on their ability to discriminate known species in the family Mycobacteriaceae, using a curated reference genome database and in silico simulations of artificial mycobacterial communities. Using precision/recall curves, we demonstrate that sylph and metapresence, when used with an alignment mapping quality filter, provide the best performance, with sylph maintaining high precision but relatively lower recall (sensitivity) at low abundance. When these methods are applied to published global datasets of soil shotgun metagenomes (n = 1284), relatively few known species are detected per sample (sylph: mean ± SD, 0.24 ± 0.73 known species per soil metagenome), and at low overall abundance. This likely reflects the incompleteness of the reference database, owing to a high proportion of uncharacterized mycobacteria in soils, and/or extraction protocols not optimized for mycobacteria.
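As an illustration of how a precision/recall curve can be derived for one simulated community, the following minimal sketch compares the species known to be in the community with the species a profiler reports, sweeping a detection threshold over the reported scores. It is not the benchmarking code used in this work: the function name `precision_recall_points`, the example species set, and the score values are all hypothetical, and the "score" stands in for whatever per-species quantity a given tool reports (for example, an estimated abundance).

```python
def precision_recall_points(truth, predictions):
    """Return (threshold, precision, recall) tuples for one simulated community.

    truth       -- set of species names actually present in the simulation
    predictions -- dict mapping reported species name -> score (hypothetical:
                   any per-species value where higher means more confident)
    """
    points = []
    # Sweep thresholds from strictest to most permissive.
    for threshold in sorted(set(predictions.values()), reverse=True):
        called = {sp for sp, score in predictions.items() if score >= threshold}
        true_pos = len(called & truth)    # reported and truly present
        false_pos = len(called - truth)   # reported but absent
        false_neg = len(truth - called)   # present but not reported
        precision = true_pos / (true_pos + false_pos) if called else 1.0
        recall = true_pos / (true_pos + false_neg) if truth else 0.0
        points.append((threshold, precision, recall))
    return points


# Hypothetical simulated community and hypothetical profiler output.
truth = {
    "Mycobacterium avium",
    "Mycobacterium kansasii",
    "Mycobacterium gordonae",
}
predictions = {
    "Mycobacterium avium": 0.9,
    "Mycobacterium kansasii": 0.6,
    "Mycobacterium intracellulare": 0.4,  # false positive in this example
}

for threshold, precision, recall in precision_recall_points(truth, predictions):
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the species missing from the profiler output caps recall at 2/3 regardless of threshold, which mirrors the pattern of high precision but reduced sensitivity described above for low-abundance species.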
