Improving the benchmark of Mycobacterium tuberculosis variant calling protocols with in silico evolved genomes

A Le Meur(1) G Refrégier(1)

1: Université Paris-Saclay

Artificial genomes used to assess the performance of variant identification (benchmarking) are usually built solely by introducing single nucleotide polymorphisms and indels. However, the true evolutionary distance between strains is greater, as genomes also undergo structural rearrangements. In Mycobacterium tuberculosis, the main structural variants are insertion sequence jumps, chromosomal duplications and large regions of deletion.

Using a simplistic representation of genomes leads to overestimated sensitivity and recall for all variant calling pipelines, and fails to discriminate between them. To (1) better understand how structural rearrangements affect the performance of alignment and variant calling and (2) guide the choice of variant calling pipelines, we set out to build in silico evolved genomes with traceable genomic features that mimic the genomic evolution of M. tuberculosis lineages.

Here we present Maketube, a framework for building artificial genomes from a reference sequence. The artificial genomes are evolved according to a broad set of characteristics of real strain evolution, including structural variants. We compared the properties of these genomes to those of a set of reference genomes, high-quality assemblies and genomes built by other benchmarking tools.
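To make the idea of a traceable in silico evolved genome concrete, the sketch below mutates a reference string with SNPs, one large deletion and insertion sequence jumps, and records every event in reference coordinates. It is an illustration only, not Maketube's implementation: the function name `evolve_genome`, its parameters, the event tuple layout and the placeholder insertion sequence are all assumptions made for this example.

```python
import random

BASES = "ACGT"


def evolve_genome(reference, n_snps=5, n_deletions=1, deletion_len=20,
                  is_element="ACGTTGCA" * 5, n_is_jumps=1, seed=0):
    """Return (evolved_sequence, events).

    Hypothetical helper: each event is (kind, pos, ref, alt) with a 0-based
    position on the reference, so the truth set stays traceable.
    `is_element` is a placeholder string, not a real insertion sequence.
    """
    rng = random.Random(seed)
    events = []
    occupied = set()  # reference positions already touched by an event

    # Large deletions: draw intervals until they do not overlap.
    while len(events) < n_deletions:
        start = rng.randrange(0, len(reference) - deletion_len)
        span = range(start, start + deletion_len)
        if occupied.isdisjoint(span):
            occupied.update(span)
            events.append(("DEL", start, reference[start:start + deletion_len], ""))

    # SNPs: substitute a different base at positions outside the deletions.
    free = [p for p in range(len(reference)) if p not in occupied]
    for pos in rng.sample(free, n_snps):
        alt = rng.choice([b for b in BASES if b != reference[pos]])
        occupied.add(pos)
        events.append(("SNP", pos, reference[pos], alt))

    # Insertion sequence jumps: insert the element before a free position.
    free = [p for p in range(len(reference)) if p not in occupied]
    for pos in rng.sample(free, n_is_jumps):
        occupied.add(pos)
        events.append(("IS", pos, "", is_element))

    # Apply events from right to left so reference coordinates stay valid.
    seq = list(reference)
    for kind, pos, ref, alt in sorted(events, key=lambda e: e[1], reverse=True):
        seq[pos:pos + len(ref)] = alt

    return "".join(seq), sorted(events, key=lambda e: e[1])
```

Applying the events from the highest position downwards keeps each recorded coordinate valid on the reference, which is what makes the introduced variants usable as a truth set afterwards.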

Genomes evolved in silico shared numerous properties with natural genomes, such as greater variability in fold coverage when aligned to the reference and variants whose properties are closer to those of real variants identified in natural genomes. This makes it possible to discriminate between variant calling pipelines where the artificial genomes usually used for benchmarking fail to do so.
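As a minimal illustration of how a traceable truth set lets pipelines be compared, the sketch below scores a set of called variants against the variants introduced in an artificial genome. The function name `benchmark_calls` and the (position, ref, alt) tuple representation are assumptions made for this example, and exact matching of tuples is a simplification of how variants are usually compared.

```python
def benchmark_calls(truth, called):
    """Compare called variants with the introduced (truth) variants.

    Hypothetical helper: variants are hashable (position, ref, alt) tuples
    and a call counts as correct only on an exact match.
    """
    truth, called = set(truth), set(called)
    tp = len(truth & called)   # introduced variants that were recovered
    fp = len(called - truth)   # calls with no introduced counterpart
    fn = len(truth - called)   # introduced variants that were missed
    recall = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if called else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "recall": recall, "precision": precision}


# Toy usage: one missed deletion and one spurious SNP.
truth = {(10, "A", "G"), (50, "C", "T"), (90, "GATTACA", "")}
called = {(10, "A", "G"), (50, "C", "T"), (70, "T", "A")}
scores = benchmark_calls(truth, called)
```

Running the same comparison for each pipeline on the same evolved genome gives directly comparable scores.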
