
In the last several years, large language models (LLMs) like ChatGPT and Bard have shown the world the astounding power of generative AI as a tool for creating language. However, some of the most exciting applications of this technology are happening in biology.
The UCSC Genome Browser has added two new datasets that leverage the power of generative AI and machine learning to interpret information about genetic variants and more rapidly assess which ones might be harmful to human health.
These datasets, from AlphaMissense and VarChat, are available as “tracks” on the two most widely used human reference genomes, hg38 and hg19, and are the first of many planned tracks of this kind.
Google DeepMind’s AlphaMissense is a deep learning model trained to predict which single amino acid substitutions arising from variants in the human genome are likely to disrupt protein folding in ways that could cause disease. These predictions could help identify previously unknown disease-causing genes and make it easier to diagnose rare genetic diseases.
By making these tracks available on the UCSC Genome Browser, the UCSC Genomics Institute hopes to facilitate research on genetic variants that disrupt protein function and allow clinicians to more quickly identify genetic variations that could be causing harm to their patients.
“This track uses AI to make sense of the vast amount of information in biological databases and would not be possible without the 25+ years of collected information by humans and enforced by journals, data collected by PDB, RefSeq, GenBank, and others,” said Maximilian Haeussler, director of the UC Santa Cruz Genome Browser.
“When most people think of generative AI they think it is all like ChatGPT, but this model is not about language at all.”
VarChat is a model produced by Italian genomics company enGenome that uses LLMs specifically to condense available scientific literature on genomic variants.
If a scientist or clinician is interested in finding out what is known about a variant of a specific gene and its potential impacts on human health, they can type that variant into VarChat and get a summary text that condenses all of the available scholarly articles on that variant.
The Browser track for VarChat shows how many scientific publications a variant was observed in, along with its corresponding gene and other identifying information.
The Browser also color-codes each variant in the track based on how many scholarly papers have been written about it, allowing researchers to instantly see how well-documented a variant is. The Browser team is currently working on adding more tracks that use large language models to extract information from scientific papers, including one of their own.
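As a rough illustration of the color-coding idea, the sketch below maps a variant's publication count to a color string in the comma-separated "R,G,B" form used by the itemRgb field of BED files. The thresholds, the colors, and the function name are assumptions made up for this example; they are not the cut-offs or code the Genome Browser actually uses.

```python
from typing import List, Tuple

# Hypothetical bins: (minimum publication count, "R,G,B" color).
# Ordered from the highest threshold to the lowest.
# These values are illustrative only, not the Browser's real cut-offs.
COLOR_BINS: List[Tuple[int, str]] = [
    (50, "128,0,0"),      # very well documented: dark red
    (10, "255,0,0"),      # well documented: red
    (2, "255,165,0"),     # a few papers: orange
    (0, "128,128,128"),   # at most one paper: grey
]


def color_for_publication_count(count: int) -> str:
    """Return the color of the first bin whose threshold the count reaches."""
    if count < 0:
        raise ValueError("publication count cannot be negative")
    for minimum, rgb in COLOR_BINS:
        if count >= minimum:
            return rgb
    # Not reachable: the last bin has a threshold of 0.
    raise AssertionError("no color bin matched")


if __name__ == "__main__":
    for n in (0, 3, 12, 75):
        print(n, color_for_publication_count(n))
```

Binning counts into a handful of colors, rather than using a continuous scale, keeps sparsely and heavily documented variants easy to tell apart at a glance.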
Citation:
Newest Genome Browser features highlight the power of generative AI and machine learning for biology (2025, February 27)
retrieved 27 February 2025
from https://medicalxpress.com/news/2025-02-genome-browser-features-highlight-power.html