Machine learning approaches for the analysis of electrical signals from Oxford Nanopore Technologies to detect pharmaco-resistance in Mycobacterium tuberculosis

A Zinola(1) F Di Marco(1) A Spitaleri(1) S Battaglia(1) A M Cabibbe(1) D M Cirillo(1)

1:San Raffaele Scientific Institute

Contemporary diagnostic kits predominantly rely on short-read analysis to identify drug resistance in Mycobacterium tuberculosis (Mt). Our study explores an alternative: analyzing the raw electrical signals (ES) produced by Oxford Nanopore sequencing directly. This approach bypasses the base-calling step, which could improve efficiency by avoiding a time-consuming process and could improve the accuracy of the results.

The study examined 104 Mt isolates sequenced on the Nanopore platform after amplification with the Deeplex kit. The isolates underwent phenotypic testing against Rifampicin and Isoniazid. ES were extracted from fast5 files, and several approaches were tested to clean the signals, reduce the input size, and improve the signal-to-noise ratio. Two distinct neural network (NN) models were developed: the first aimed to identify, among all Nanopore-generated reads, the ES that map to genes; the second was designed to link the ES from a sample to its observed phenotype.
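The abstract does not specify which cleaning or size-reduction steps were tried. As a minimal sketch of what such preprocessing can look like, the Python example below scales a raw current trace by its median and median absolute deviation (MAD), clips outliers, and cuts the result into fixed-length windows suitable as NN input. The function names, the clip value, the window length of 1024 samples, and the synthetic trace are all assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np


def normalize_signal(raw, clip=5.0):
    """Median/MAD-scale a raw current trace and clip extreme values."""
    raw = np.asarray(raw, dtype=np.float64)
    median = np.median(raw)
    mad = np.median(np.abs(raw - median))
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return np.clip((raw - median) / scale, -clip, clip)


def to_windows(signal, window=1024):
    """Cut a trace into non-overlapping windows, dropping the incomplete tail."""
    n_windows = len(signal) // window
    return signal[: n_windows * window].reshape(n_windows, window)


# Synthetic stand-in for one read's raw signal (hypothetical values).
rng = np.random.default_rng(0)
raw = rng.normal(90.0, 12.0, size=5000)

normalized = normalize_signal(raw)
windows = to_windows(normalized, window=1024)
```

Median/MAD scaling is used here instead of mean and standard deviation because it is less sensitive to the current spikes that raw traces often contain; other choices are equally plausible.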

The first NN was developed to identify reads mapping to genes among all reads. Various architectures were implemented, combining 1D convolutional, Long Short-Term Memory, fully connected, batch normalization, and dropout layers in different ways. Despite these efforts, none of the models exceeded 65% accuracy. The second NN, a PointNet-like network, aimed to identify resistance in the samples and used wavelet decomposition for signal cleaning. However, the highest accuracy attained for this task was 50% for Rifampicin and 70% for Isoniazid.
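To illustrate the signal-cleaning step, the Python sketch below denoises a trace by wavelet decomposition: it decomposes the signal, soft-thresholds the detail coefficients, and reconstructs. The abstract does not name the wavelet family, the number of levels, or the thresholding rule, so the Haar wavelet, four levels, and the universal threshold with a MAD-based noise estimate are assumptions chosen to keep the example dependency-free. All function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_decompose(signal, levels):
    """Multi-level Haar transform; details[0] holds the finest scale."""
    approx = np.asarray(signal, dtype=np.float64)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / SQRT2)
        approx = (even + odd) / SQRT2
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, rebuilding from the coarsest scale upward."""
    for detail in reversed(details):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / SQRT2
        out[1::2] = (approx - detail) / SQRT2
        approx = out
    return approx


def wavelet_denoise(signal, levels=4):
    """Soft-threshold the Haar detail coefficients and reconstruct."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx, details = haar_decompose(signal, levels)
    # Noise level estimated from the finest-scale coefficients via the MAD.
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(len(signal)))
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0) for d in details]
    return haar_reconstruct(approx, shrunk)


# Synthetic step-like trace plus Gaussian noise (hypothetical data).
rng = np.random.default_rng(0)
clean = np.repeat(rng.normal(0.0, 1.0, size=64), 16)
noisy = clean + rng.normal(0.0, 0.3, size=clean.size)
denoised = wavelet_denoise(noisy, levels=4)
```

The Haar wavelet suits piecewise-constant signals such as this synthetic trace; for real data a smoother wavelet from a dedicated library may work better.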

These findings underscore the challenges in accurately discerning genomic features and resistance patterns in Mt isolates using the implemented NN models. Further optimization and exploration of alternative methodologies may be crucial for enhancing classification performance in such complex datasets.
