Researchers develop a new computational tool to investigate the complexity of the genome

The computer program developed by I²SysBio allows the discovery of new transcripts that were not in the genome databases. Credit: Pixabay

A team from the Institute of Integrative Systems Biology (UV-CSIC) has published in Nature Methods its own software for analyzing data obtained by long-read sequencing. The tool makes it possible to discover new RNA molecules and to assign them a function in the formation of tissues, deepening our knowledge of how the organism develops and of its diseases.

The complexity of an organism emerges from its genome, the book that contains in its DNA the instructions for life. The method for reading this book, sequencing, has evolved towards reading increasingly longer fragments of the genome. In this field, a research group led by the Institute of Integrative Systems Biology (I²SysBio), a joint center of the University of Valencia (UV) and the Spanish National Research Council (CSIC), has improved its own computer program, which discovers new transcripts (RNA molecules used to synthesize proteins and build tissues) from long-read sequencing data and assigns them a function in the formation of the organism. The work has been published in Nature Methods. Compared with short-read genome sequencing, which analyzes fragments of about 200 nucleotides, long-read methods can obtain reads 100 times longer (roughly 20,000 nucleotides), leaving fewer gaps in the genome information to be filled in with bioinformatics tools. This was one of the reasons why Nature Methods itself named long-read sequencing its 'Method of the Year 2022'.

A few years earlier, in 2018, researcher Ana Conesa, then at the University of Florida, developed a computer program called SQANTI to analyze the information extracted with these long-read methods. Now, her research team at I²SysBio has published a substantial improvement to this software, which can be used freely with data from the main commercial long-read sequencing platforms, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT).

“Long-read techniques are better at analyzing the complexity of human transcripts and of the transcriptome,” says Conesa. The transcriptome identifies the portion of the genome that is read in each cell to give rise to tissues and organs. A single gene can give rise to a great diversity of transcripts through small changes in the structure of the RNA it encodes, and with them to proteins with different cellular functions. “Short-read sequencing cannot solve this puzzle. Long-read sequencing better reconstructs the functional complexity of the human transcriptome, and this is key to studying certain diseases, especially neurological diseases and cancer,” says the CSIC researcher.

Better understanding the complexity of the organism and its diseases

The newly published version, SQANTI3, solves some earlier problems caused by RNA degradation and introduces notable improvements. The program can discover new transcripts that were not in the genome databases used by this kind of software. In addition, using artificial intelligence techniques, the software can assign functional information to each new transcript, “something essential to understand the functional complexity of the organism and its diseases,” highlights Conesa.
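As a rough illustration of what it means for a transcript to be "not in the databases", the toy sketch below compares the splice junctions of sequenced transcripts with those of a small reference annotation. It is not SQANTI3's implementation or interface: the function name `classify_transcript`, the label names and the coordinates are all hypothetical, chosen only to show the idea of separating known transcripts from novel ones.

```python
from typing import Dict, List, Tuple

# One splice junction, given as (intron start, intron end). Toy coordinates.
Junction = Tuple[int, int]


def classify_transcript(
    junctions: List[Junction],
    reference: Dict[str, List[Junction]],
) -> str:
    """Label a transcript by comparing its junctions with a reference.

    Hypothetical labels (assumptions made for this sketch):
      "mono_exon"         - no junctions, so nothing to compare
      "known"             - the full junction chain matches a reference transcript
      "novel_combination" - every junction is known, but not in this combination
      "novel_junction"    - at least one junction is absent from the reference
    """
    chain = tuple(junctions)
    if not chain:
        return "mono_exon"

    # All complete junction chains present in the reference annotation.
    known_chains = {tuple(ref_junctions) for ref_junctions in reference.values()}
    if chain in known_chains:
        return "known"

    # All individual junctions present anywhere in the reference annotation.
    known_junctions = {j for ref_junctions in reference.values() for j in ref_junctions}
    if all(j in known_junctions for j in chain):
        return "novel_combination"
    return "novel_junction"


# Toy reference annotation with two transcripts of the same gene.
reference = {
    "tx1": [(100, 200), (300, 400)],
    "tx2": [(100, 200), (500, 600)],
}

# Toy long-read transcripts to classify.
reads = {
    "read_a": [(100, 200), (300, 400)],  # same chain as tx1
    "read_b": [(300, 400), (500, 600)],  # known junctions, new combination
    "read_c": [(100, 250), (300, 400)],  # first junction is not in the reference
}

labels = {name: classify_transcript(j, reference) for name, j in reads.items()}
print(labels)
```

A real curation tool has to deal with much more than this, for example strand, partial matches and the RNA degradation artifacts mentioned above. The sketch only shows why reads that span a whole transcript make this kind of comparison possible.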

The program was developed on Garnatxa, the I²SysBio computing cluster, which has 15 computing nodes offering 950 parallel computing threads. In addition, the Gene Expression Genomics group led by Ana Conesa at I²SysBio participates in ELIXIR, one of the strategic infrastructures of the European Strategy Forum on Research Infrastructures (ESFRI), which allows life sciences laboratories across Europe to share and store their data.

The University of Florida and Pacific Biosciences collaborated in the development of SQANTI3.

Reference:

Pardo-Palacios, F. J., Arzalluz-Luque, A., Kondratova, L. et al. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nature Methods (2024). https://doi.org/10.1038/s41592-024-02229-2
