Machine learning algorithm brings long-read sequencing to the clinic

Overview of SAVANA. Credit: Nature Methods (2025). DOI: 10.1038/s41592-025-02708-0

Long-read sequencing technologies analyze long, continuous stretches of DNA. These methods have the potential to improve researchers’ ability to detect complex genetic alterations in cancer genomes. However, the complex structure of cancer genomes means that standard analysis tools, including existing methods specifically developed to analyze long-read sequencing data, often fall short, leading to false-positive results and unreliable interpretations of the data.

These misleading results can compromise our understanding of how tumors evolve and respond to treatment, and ultimately affect how patients are diagnosed and treated.

To address this challenge, researchers developed SAVANA, a new algorithm, which they describe in the journal Nature Methods.

SAVANA uses machine learning and long-read sequencing data to accurately identify structural variants (large genomic alterations such as insertions, deletions, duplications, or rearrangements) and the resulting copy number aberrations in cancer genomes.

It is important to have the right tool for the job. For example, you can eat soup with a fork, but a spoon works far better. SAVANA, like a spoon, is tailored for the task and designed to efficiently deliver reliable results.

This algorithm was developed and tested across 99 human tumor samples by researchers at EMBL’s European Bioinformatics Institute (EMBL-EBI) and the R&D laboratory of Genomics England, in collaboration with clinical partners at University College London (UCL), the Royal National Orthopedic Hospital (RNOH), Instituto de Medicina Molecular João Lobo Antunes, and Boston Children’s Hospital.

“Because other analysis tools are not developed to account for the particularities of cancer genomics data, they often pick up false positives that could lead to incorrect clinical and biological interpretations,” said Isidro Cortes-Ciriano, Group Leader at EMBL-EBI.

“SAVANA changes this. By training the algorithm directly on long-read sequencing data from cancer samples, we created a new method that can tell the difference between true cancer-related genomic alterations and sequencing artifacts, thereby enabling us to elucidate the mutational processes underlying cancer using long-read sequencing with unprecedented resolution.”

Optimized for clinical use

“When we developed SAVANA, our focus was clear: create a tool sophisticated enough to characterize complex cancer genomes but practical enough for clinical use,” explained Hillary Elrick, former Predoctoral Fellow at EMBL-EBI and Postdoctoral Fellow at the Francis Crick Institute.

“As a result, SAVANA can accurately distinguish somatic structural variants, copy number aberrations, tumor purity, and ploidy—all key to understanding tumor biology and guiding clinical treatment decisions,” added Carolin Sauer, Postdoctoral Fellow at EMBL-EBI.

Its rapid analysis and robust error correction make SAVANA well suited for clinical use. The method was recently applied to study osteosarcoma, a rare and aggressive bone cancer that mostly affects young people, where it helped researchers uncover new genomic rearrangements, providing novel insights into how osteosarcoma evolves and progresses.

The team also compared SAVANA’s results from long-read data with Illumina sequencing of the same samples, which were analyzed using a whole-genome sequencing pipeline that is used to deliver clinical reports.

The findings were highly consistent across technologies, demonstrating that SAVANA performs on par with current clinical standards while revealing additional cancer-relevant alterations.

“The capability to accurately detect structural variants is transformative for clinical diagnostics,” said Adrienne Flanagan, Professor at UCL and Consultant Histopathologist at RNOH.

“SAVANA could help us confidently identify genomic alterations relevant for diagnosis and prognosis. Ultimately, this means we would be better placed to deliver personalized treatments for cancer patients.”

UK investment in clinical genomics

The UK is investing significantly in genomic sequencing technologies as part of the NHS Genomic Medicine Service. This initiative is the first in the world to offer whole genome sequencing as part of routine care. By embedding genomics into everyday clinical practice, it aims to improve diagnostic accuracy and support personalized cancer treatments.

However, investments in clinical genomics will only achieve their intended impact if genomic data are interpreted accurately, and this relies on specialized analytical tools. Genomics England explored SAVANA’s use as part of its work looking at the clinical potential of long-read sequencing technology to support earlier, faster diagnosis of cancer.

“Using SAVANA will ensure clinicians receive accurate and reliable genomic data, enabling them to confidently integrate advanced genomic sequencing methods such as long-read sequencing into routine patient care,” said Greg Elgar, Director of Sequencing R&D at Genomics England.

SAVANA is also being deployed as part of nationwide initiatives, such as the UK Stratified Medicine Pediatrics project, co-led by Cortes-Ciriano. This project is focused on developing more efficacious and less toxic treatments for childhood cancers using advanced sequencing technologies to better understand tumor biology and monitor disease recurrence.

Additionally, SAVANA is being used in Societal, Ancestry, Molecular and Biological Analyses of Inequalities (SAMBAI), a project aimed at addressing cancer disparities in populations of recent African heritage.

More information:
Hillary Elrick et al, SAVANA: reliable analysis of somatic structural variants and copy number aberrations using long-read sequencing, Nature Methods (2025). DOI: 10.1038/s41592-025-02708-0

Provided by
European Molecular Biology Laboratory


Citation:
Machine learning algorithm brings long-read sequencing to the clinic (2025, May 29)
retrieved 29 May 2025
from https://medicalxpress.com/news/2025-05-machine-algorithm-sequencing-clinic.html



