

GPAS: evaluation of mycobacterial species identification of 7798 MGIT samples

T E A Peto(1), E Robinson(2), D W Crook(1), M Culpas(1), R Turner(1)

1: University of Oxford; 2: UKHSA Birmingham, UK

The GPAS pipeline is a user-friendly, cloud-based bioinformatic service. It is designed to assemble, variant call and analyse mycobacterial whole-genome sequences, reporting species, resistance prediction and relatedness.

Here we evaluated its performance in determining the species, sub-species and lineage of 10,000 consecutive mycobacterial samples obtained from MGIT cultures in the Public Health England Lab Birmingham and sequenced using Illumina technology. Of these, 7798 FASTQ files were suitable for analysis. First, human reads were removed; the remaining reads were quality checked and trimmed using fastp and filtered by Kraken 2. Only mycobacterial and unclassified reads were processed further, by competitive mapping against a target set of 179 mycobacteria with published reference genomes, thus spanning the majority of known species. The sample was assigned the species of the reference genome with the best coverage, and that reference was used for subsequent genome assembly and variant calling. Species was also identified independently using Mykrobe, which recognises sequences unique to a species, subspecies or lineage.
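The species-assignment step can be illustrated with a minimal sketch. It is not the GPAS implementation. It assumes that "best coverage" means the highest breadth of coverage (the fraction of reference positions covered by at least one competitively mapped read), which the abstract does not specify. The class, function, parameter and reference names below are hypothetical, and the numbers are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ReferenceCoverage:
    """Coverage summary for one reference genome (hypothetical structure)."""
    name: str            # label of the reference genome
    genome_length: int   # reference length in bases
    covered_bases: int   # positions covered by at least one mapped read

    @property
    def breadth(self) -> float:
        # Assumption: "coverage" is measured as breadth, not depth.
        if self.genome_length <= 0:
            return 0.0
        return self.covered_bases / self.genome_length


def pick_best_reference(
    coverages: Iterable[ReferenceCoverage],
    min_breadth: float = 0.0,
) -> Optional[ReferenceCoverage]:
    """Return the reference with the highest breadth, or None if none qualify.

    min_breadth is a hypothetical threshold for rejecting weak matches.
    """
    candidates = [c for c in coverages if c.breadth >= min_breadth]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.breadth)


if __name__ == "__main__":
    # Invented values for two placeholder references.
    results = [
        ReferenceCoverage("reference_A", genome_length=1000, covered_bases=900),
        ReferenceCoverage("reference_B", genome_length=2000, covered_bases=1000),
    ]
    best = pick_best_reference(results)
    print(best.name if best else "no reference selected")
```

In this sketch the selected reference would then serve both as the species call and as the mapping target for assembly and variant calling, mirroring the order of steps described above.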

The outputs of competitive mapping and Mykrobe were essentially concordant. In total, 88 species were identified, the most frequent being TB complex (3370), intracellulare (1135), avium (925), abscessus (575) and chelonae (375). 22 species were identified only once and 13 only twice. The analysis also detected mixed mycobacterial and non-mycobacterial infections (58 cases) and mixed non-mycobacterial cases (6 cases). Detection of TB in a sample was limited not by the bioinformatics but by laboratory cross-contamination. The GPAS pipeline is able to process at least 10,000 samples a day and is suitable for clinical and public health users worldwide.
