
P048

A next-generation LLM-interrogable mycobacterial knowledge base

K Dewaele(1) O Tzfadia(1)

1:Institute of Tropical Medicine Antwerp

Many experts foresee smarter-than-human AI in 2027. Already today, Large Language Models (LLMs) can process book-length volumes of information, can be chained in autonomous, agentic workflows, and can perform reasoning tasks. These properties are expected to enable a revolution in biological hypothesis generation and testing. To leverage these capabilities for mycobacterial research, access to domain-specific knowledge is key. However, existing mycobacterial databases, such as TBDB and Mycobrowser, are defunct. We present a suite of tools to build and dynamically extend a next-generation, LLM-interrogable mycobacterial knowledge base (KB). This KB is gene-centric, enabling the formulation and testing of hypotheses related to gene function and biological networks. It is constructed as a knowledge graph, enabling plug-and-play extension with novel data types. Its functionality includes: 1) interrogation of functional protein association databases (comprising co-expression, gene neighbourhood, and co-occurrence data); 2) augmentation of association networks with regulatory data, including transcriptional control and protein-level modulation; 3) labelling of genes and their interactions using an automated literature querying tool that retrieves, stores, and interprets abstracts and full-text open access papers; and 4) addition of functionally relevant genomic markers, including gene conservation, evolutionary patterns (dN/dS, regions of difference), and immune-pressure signatures (epitope quantity and conservation). Integrating these diverse data types, we constructed an interactive HTML network visualisation of 1,163 M. tuberculosis genes, demonstrating the utility and feasibility of this approach in making the knowledge base accessible to both humans and LLMs. In summary, we present a methodological framework for constructing an LLM-enhanced knowledge base, preparing mycobacterial research for the AI era.
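To make the gene-centric knowledge graph idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes genes are nodes carrying arbitrary marker attributes and that each gene pair can hold several typed edge layers, so a new data type is added simply by using a new edge type or attribute name. The class name `KnowledgeGraph`, its methods, the gene identifiers (`geneA`, `geneB`, `geneC`), and all scores are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Hypothetical gene-centric graph: nodes are genes, edges are typed layers."""
    # gene -> {attribute name: value}, e.g. conservation or dN/dS markers
    nodes: dict = field(default_factory=dict)
    # (gene_a, gene_b) -> {edge type: payload dict}
    edges: dict = field(default_factory=lambda: defaultdict(dict))

    def add_gene(self, gene, **attrs):
        # Create the node if needed and merge in any new attributes.
        self.nodes.setdefault(gene, {}).update(attrs)

    def add_edge(self, a, b, edge_type, **payload):
        # Edges are stored undirected; direction, if any, goes in the payload.
        for gene in (a, b):
            self.add_gene(gene)
        self.edges[tuple(sorted((a, b)))][edge_type] = payload

    def neighbours(self, gene, edge_type=None):
        # Return sorted partners of `gene`, optionally restricted to one layer.
        found = set()
        for (a, b), layers in self.edges.items():
            if gene in (a, b) and (edge_type is None or edge_type in layers):
                found.add(b if a == gene else a)
        return sorted(found)

    def describe(self, gene):
        # Render a gene and its edges as plain text, e.g. for an LLM prompt.
        lines = [f"{gene}: {self.nodes.get(gene, {})}"]
        for (a, b), layers in sorted(self.edges.items()):
            if gene in (a, b):
                other = b if a == gene else a
                for edge_type, payload in sorted(layers.items()):
                    lines.append(f"  {edge_type} -> {other} {payload}")
        return "\n".join(lines)


def build_example():
    # Placeholder genes and values, purely illustrative.
    kg = KnowledgeGraph()
    kg.add_gene("geneA", conserved=True)
    kg.add_edge("geneA", "geneB", "co_expression", score=0.9)
    kg.add_edge("geneA", "geneC", "regulation", regulator="geneC")
    kg.add_edge("geneB", "geneC", "gene_neighbourhood")
    return kg
```

Under these assumptions, `build_example().describe("geneA")` yields a short text summary of one gene that could be placed in an LLM prompt, while `neighbours("geneA", "regulation")` restricts a query to a single data layer.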


Registered address:
c/o TREASURER
Matthias Merker
Parkallee 1
23845 Borstel
Germany


© 2021 The European Society of Mycobacteriology
