P077
Relaunching crypticproject.org: making the MTBC genetic and pDST datasets collated by the CRyPTIC project more available to the wider community
P W Fowler(1) D Adlard(1) J Westhead(1) J Knaggs(1) H Thai(1) M Colpus(1) R Turner(1) T EA Peto(1) D W Crook(1) Z Iqbal(2) N A Ismail(3) T M Walker(4)
1:University of Oxford; 2:University of Bath; 3:University of the Witwatersrand; 4:Oxford University Clinical Research Unit
Some projects take on a life of their own; the Comprehensive Resistance Prediction for Tuberculosis: an International Consortium (CRyPTIC) project, which ran from 2017 to 2022, was one of these. It collected tuberculosis samples with a bias towards MDR-TB. Each sample underwent short-read whole genome sequencing (WGS) and had minimum inhibitory concentrations (MICs) for 13 antibiotics (including bedaquiline and delamanid) measured using a bespoke 96-well broth microdilution plate. The project also collated and curated existing datasets with WGS and/or phenotypic drug susceptibility testing (pDST) data. Following a data freeze in April 2020, several studies were published and this initial dataset was made available via an FTP site (https://ftp.ebi.ac.uk/pub/databases/cryptic/release_june2022/).
The project has, however, continued to amass samples: at the April 2020 data freeze, 41,130 samples had both WGS and pDST data, whereas there are now 53,897, an increase of 31%. The quality of the data has also improved: the bioinformatic pipeline has been rewritten (so that it reports, for example, all minor alleles in all samples) and the accuracy of the reported MICs has been improved (by applying a machine learning model to images of the 96-well plates).
At this meeting we shall launch a revamped crypticproject.org that lets researchers easily obtain this larger, versioned dataset. We will also describe our plans for future expansion through a widening network of contributors, including regular data releases, automated (API) access, machine learning competitions and the addition of samples sequenced using long-read technologies. Finally, we will describe our efforts to collate non-tuberculous mycobacterial (NTM) samples.
