Long-read sequencing (LRS) is a third-generation sequencing method that reads much longer nucleotide fragments than traditional short-read sequencing methods.
It generates genomic data by reading DNA or RNA fragments that are thousands of nucleotides long, usually between 10,000 and 100,000 base pairs per read. These long reads are produced from native molecules extracted directly from a biological sample. Because long-read sequencing analyzes native molecules directly, it provides a more accurate representation of the original genomic material. This enables more accurate genome assembly and the resolution of complex genomic regions that are difficult to resolve with short-read sequencing technologies.
In sequencing, reads are the sequences of nucleotides determined by a sequencing machine. Read length depends on the sequencing platform, and reads are broadly categorized as short reads or long reads. Traditional short-read sequencing (SRS) involves fragmenting DNA into short pieces, amplifying these fragments, and sequencing them. In contrast, long-read sequencing reads longer pieces of DNA directly, without the need for amplification. Pacific Biosciences’ (PacBio) Single Molecule Real-Time (SMRT) sequencing and Oxford Nanopore Technologies (ONT) nanopore sequencing pioneered long-read sequencing technology.
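The advantage of long reads in repetitive regions can be illustrated with a toy model. A read can place a repeat unambiguously only if it covers the entire repeat plus some unique flanking sequence on both sides. The sketch below assumes that reads are already represented as start/end coordinates on a reference. The `spanning_reads` helper, the `min_anchor` flank length, and all coordinates are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def spanning_reads(reads: List[Tuple[int, int]],
                   repeat_start: int,
                   repeat_end: int,
                   min_anchor: int = 500) -> List[Tuple[int, int]]:
    """Return the reads that cover the whole repeat plus at least `min_anchor` bp
    of unique flanking sequence on each side (half-open [start, end) coordinates)."""
    return [
        (start, end)
        for start, end in reads
        if start <= repeat_start - min_anchor and end >= repeat_end + min_anchor
    ]


# A hypothetical 6 kb repeat at positions 10,000-16,000.
REPEAT = (10_000, 16_000)

short_reads = [(9_800, 9_950), (12_000, 12_150), (15_900, 16_050)]  # 150 bp reads
long_reads = [(2_000, 22_000), (9_000, 30_000), (11_000, 40_000)]   # 20-29 kb reads

print(len(spanning_reads(short_reads, *REPEAT)))  # 0
print(len(spanning_reads(long_reads, *REPEAT)))   # 2
```

None of the 150 bp reads can bridge the repeat. Two of the long reads do. The third long read starts inside the repeat, so it lacks a unique left anchor. Real aligners and assemblers infer placement from sequence similarity rather than from known coordinates, so this only illustrates the geometric argument.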
Principle of Long-Read Sequencing
The main principle of LRS is to read long stretches of DNA or RNA molecules in a single sequencing pass, without extensive fragmentation or amplification. Different long-read sequencing technologies rely on different principles. The principles behind the two main LRS technologies are:
- SMRT sequencing detects the fluorescence emitted as a DNA polymerase adds nucleotides to a growing DNA strand. The polymerase is attached to the bottom of a tiny well. Each of the four nucleotides carries a different fluorescent label, so each incorporation emits a distinct fluorescent signal. These signals are used to determine which base was added.
- In contrast, Oxford Nanopore sequencing detects changes in the flow of ions as single-stranded DNA passes through tiny protein pores (nanopores) embedded in a membrane. The nucleotides inside the pore at any moment obstruct the ion flow to a characteristic degree, so the current changes as the strand moves through. These fluctuations are measured and converted into base calls, revealing the sequence of nucleotides in the DNA strand.
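The sketch below is a deliberately simplified illustration of turning a current trace into base calls. It assumes, unrealistically, that each base produces one distinct, constant current level. The values in `LEVELS_PA` are hypothetical, not real pore measurements. Real nanopore signals depend on several nucleotides in the pore at once, and production basecallers use neural networks rather than this nearest-level rule.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical mean current (pA) for each base. Illustrative values only,
# not real pore measurements.
LEVELS_PA: Dict[str, float] = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def segment(signal: List[float], jump: float = 6.0) -> List[List[float]]:
    """Split a current trace into events wherever consecutive samples differ by more than `jump` pA."""
    if not signal:
        return []
    events: List[List[float]] = [[signal[0]]]
    for prev, cur in zip(signal, signal[1:]):
        if abs(cur - prev) > jump:
            events.append([cur])      # a large step starts a new event
        else:
            events[-1].append(cur)    # small fluctuations are treated as noise
    return events


def call_bases(signal: List[float]) -> str:
    """Call one base per event: the base whose reference level is closest to the event mean."""
    calls = []
    for event in segment(signal):
        level = mean(event)
        calls.append(min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - level)))
    return "".join(calls)


# Toy noisy trace with three samples per event.
trace = [79.5, 80.8, 80.1, 109.2, 111.0, 110.4, 124.6, 125.9, 125.1, 94.3, 95.6, 95.0]
print(call_bases(trace))  # AGTC
```

This rule cannot tell how many identical bases in a row produced a single flat event. That difficulty with homopolymers is real in nanopore data, and the toy ignores it.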
Types of Long-Read Sequencing
True Long-Read Sequencing
- True LRS directly reads native DNA or RNA molecules without computational reconstruction. These methods read an entire DNA fragment as a single continuous read.
- At present, the two main long-read sequencing technologies are Oxford Nanopore Sequencing and PacBio Sequencing (SMRT). These two platforms are regarded as true long-read sequencing technologies.
- In comparison, alternative methods either sequence short fragments and then computationally assemble them into a longer sequence, or start from long DNA fragments but read them out as short reads.
Synthetic Long-Read Sequencing
- Synthetic long-read sequencing methods are a hybrid of short-read and long-read sequencing. They combine short-read sequencing with computational techniques to reconstruct long reads from shorter fragments.
- Synthetic LRS reconstructs long reads by linking short-read fragments using barcodes or other methods.
- Some of the synthetic long-read methods are linked-read technologies (10X Genomics), proximity ligation, and optical mapping.
- While synthetic long-read technologies are cost-effective and outperform basic short-read sequencing in some areas, they are still limited by their reliance on short reads for assembly. As a result, they are less effective than true long-read sequencing for structural variant detection and complete genome assembly.
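The barcode-linking idea behind synthetic long reads can be sketched in a few lines. The sketch assumes that each short read has already been tagged with the barcode of the long DNA molecule (or partition) it came from and aligned to a reference. The `ShortRead` record, its fields, and the `max_gap` threshold are hypothetical; they do not represent any vendor's data format or algorithm. Reads that share a barcode are grouped, and each group is split wherever neighbouring reads are far apart. Each resulting span approximates one long "synthetic" molecule.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ShortRead:
    barcode: str  # hypothetical molecule/partition barcode
    chrom: str
    start: int    # aligned start position on the reference
    end: int      # aligned end position (exclusive)


def link_reads(reads: List[ShortRead],
               max_gap: int = 50_000) -> List[Tuple[str, str, int, int, int]]:
    """Group reads by (barcode, chromosome), then split each group wherever
    neighbouring reads are more than `max_gap` bp apart. Returns one
    (barcode, chrom, start, end, n_reads) tuple per inferred long molecule."""
    groups: Dict[Tuple[str, str], List[ShortRead]] = defaultdict(list)
    for read in reads:
        groups[(read.barcode, read.chrom)].append(read)

    molecules: List[Tuple[str, str, int, int, int]] = []
    for (barcode, chrom), members in groups.items():
        members.sort(key=lambda r: r.start)
        mol_start, mol_end, count = members[0].start, members[0].end, 1
        for read in members[1:]:
            if read.start - mol_end > max_gap:
                # Too far from the previous read: close this molecule, start a new one.
                molecules.append((barcode, chrom, mol_start, mol_end, count))
                mol_start, mol_end, count = read.start, read.end, 0
            mol_end = max(mol_end, read.end)
            count += 1
        molecules.append((barcode, chrom, mol_start, mol_end, count))
    return molecules


reads = [
    ShortRead("AAAC", "chr1", 1_000, 1_150),
    ShortRead("AAAC", "chr1", 20_000, 20_150),
    ShortRead("AAAC", "chr1", 41_000, 41_150),
    ShortRead("AAAC", "chr1", 500_000, 500_150),  # far away: treated as another molecule
    ShortRead("GGTT", "chr1", 5_000, 5_150),
]
for molecule in link_reads(reads):
    print(molecule)
```

Only the short reads themselves are sequenced. The interior of each linked molecule is therefore known only where reads happened to land, which is consistent with the limitation described above.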


Process of Long-Read Sequencing
1. Library Preparation
The first step is extracting DNA from the target organism. Careful extraction matters because the length and integrity of the extracted DNA limit how long the resulting reads can be. The extracted DNA is fragmented into large pieces, and sequencing adapters are added to the ends of these fragments. These adapters allow the DNA to bind to the sequencing platform and facilitate the sequencing process.
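The sketch below gives a toy model of the two library-preparation steps: random shearing into large fragments, followed by adapter attachment. It assumes random breakpoints, so fragment lengths are roughly exponentially distributed. The adapter strings are placeholder labels, not real adapter sequences. All names here are hypothetical, and the chemistry of the process is ignored entirely.

```python
import random
from typing import List

# Placeholder adapter labels, not real platform adapter sequences.
ADAPTER_5P = "[5'-ADAPTER]"
ADAPTER_3P = "[3'-ADAPTER]"


def shear(dna: str, mean_size: int, rng: random.Random) -> List[str]:
    """Cut `dna` at random breakpoints; fragment lengths are drawn from an
    exponential distribution with a mean of roughly `mean_size` bp."""
    fragments = []
    pos = 0
    while pos < len(dna):
        size = max(1, int(rng.expovariate(1 / mean_size)))
        fragments.append(dna[pos:pos + size])
        pos += size
    return fragments


def prepare_library(dna: str, mean_size: int = 20_000, seed: int = 0) -> List[str]:
    """Shear the input into large fragments and attach an adapter to each end."""
    rng = random.Random(seed)
    return [ADAPTER_5P + frag + ADAPTER_3P for frag in shear(dna, mean_size, rng)]


genome = "".join(random.Random(42).choice("ACGT") for _ in range(200_000))
library = prepare_library(genome)
print(len(library), "adapter-ligated fragments")
```

A larger `mean_size` yields fewer, longer fragments. This mirrors the point above that long reads depend on keeping the input DNA in large pieces.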
2. Sequencing
The prepared libraries are loaded onto a sequencing platform designed to process the large DNA fragments used in long-read sequencing.
- In SMRT sequencing, the prepared DNA library is loaded onto SMRT Cells, which contain hundreds of thousands to millions of tiny wells called zero-mode waveguides (ZMWs). A DNA polymerase enzyme is fixed at the bottom of each ZMW. A single DNA molecule binds to the polymerase in each ZMW, and fluorescently labeled nucleotides are incorporated into the new DNA strand. The emitted fluorescence signals are detected in real time and converted into nucleotide sequences based on the color of each signal, which identifies the base that was added.
- In nanopore sequencing, the DNA molecules in the library are passed through tiny protein nanopores. As each molecule moves through a nanopore, it causes fluctuations in the ionic current that depend on the nucleotides inside the pore. These fluctuations are recorded as electrical signals, which are then processed to determine the DNA sequence. This method allows real-time sequencing of long DNA fragments without amplification.
3. Data Analysis
The raw signals generated during sequencing are translated into nucleotide sequences through a process called base calling, which produces the initial sequencing data. The resulting long reads are then either mapped (aligned) to a reference genome or used for de novo assembly. Aligning the reads to known genetic information, or assembling them into a new genome, helps identify genetic variants. Detected variants are then annotated with information about their location, function, and biological importance.


Long-Read Sequencing Data Types
Long-read sequencing technologies generate reads that vary in length and accuracy.
PacBio Sequencing Data Types
- Continuous Long Reads (CLR): These are generated using SMRTbell templates with DNA inserts longer than 30 kb. Because the inserts are so large, each template is typically read in only a single pass.
- High-Fidelity (HiFi) Reads: This is the newer of the two PacBio data types, and it is both long and highly accurate. HiFi reads are produced by circular consensus sequencing (CCS) of smaller SMRTbell templates, 10-30 kb in length. The smaller insert allows the polymerase to make multiple passes around the circular template, and these passes are combined into a consensus read with very high accuracy.
ONT Sequencing Data Types
- Long Reads: Standard ONT long reads are the most common type of read generated by ONT sequencing and are typically around 10-100 kb in length.
- Ultra-Long Reads: These are specialized ONT reads generated from high-molecular-weight DNA. They exceed 100 kb in length but are produced at lower throughput than standard long reads.
Advantages of Long-Read Sequencing
- Long reads provide more context for assembling a genome, which reduces the ambiguity and errors often associated with assembling short fragments.
- LRS platforms such as Oxford Nanopore allow data to be analyzed in real time as it is generated, enabling rapid results.
- LRS can accurately sequence repetitive DNA and identify large-scale genomic variants, often linked to genetic disorders, that are challenging for short-read NGS technologies.
- Some LRS platforms can sequence RNA molecules directly without first converting them to complementary DNA (cDNA), giving a more accurate picture of the transcriptome.
- LRS can sequence native DNA and RNA without amplification, avoiding amplification bias and preserving base modifications such as DNA methylation.
- Long-read sequencing improves the detection of structural variants such as large deletions, insertions, inversions, and duplications, which short reads often fail to identify.
- Some nanopore sequencers, such as Oxford Nanopore's MinION, are compact and portable and have been used even in remote locations.
Limitations of Long-Read Sequencing
- Long-read data is challenging to analyze. Processing and analysis require specialized expertise and computational resources, which adds to the complexity of interpreting long-read sequencing results.
- Raw long reads, especially single-pass reads, can have higher per-base error rates than short reads, because each base is read from a single molecule rather than from many amplified copies. These base-calling errors can cause inaccuracies in downstream interpretation, although consensus approaches such as PacBio HiFi reduce them.
- LRS has lower throughput than SRS. It generates fewer sequencing reads per run, which means fewer samples can be sequenced per run.
- Sequencing runs for long DNA fragments can take longer. This can delay results, especially in clinical settings where rapid diagnosis is essential.
- LRS is more expensive than SRS, particularly for large-scale projects.
Applications of Long-Read Sequencing
- Long-read sequencing is useful for genome-wide variant identification and can detect clinically significant variants. Because each read covers a larger genomic region, LRS can resolve difficult regions such as large insertions/deletions and repetitive sequences.
- It is used for targeted sequencing of complex, clinically relevant genomic regions that are difficult to sequence with short-read technologies.
- LRS is useful for studying epigenetic modifications such as DNA methylation across large genomic regions, because the DNA is not amplified and its native modifications are preserved.
- It is used for haplotype phasing, which means determining which variants lie together on the same copy of a chromosome (for example, the maternally versus the paternally inherited copy). This is particularly useful for understanding the genetic basis of diseases and identifying specific genetic variations. Because a single long read can span several variants, long reads provide the information needed for phasing without relying on statistical inference, parental sequencing, or additional sample preparation.
- It is also useful for full-length transcript sequencing, allowing detailed analysis of isoforms, alternative splicing, and gene expression patterns.
Long-Read Sequencing vs. Short-Read Sequencing
| Characteristic | Long-Read Sequencing | Short-Read Sequencing |
| --- | --- | --- |
| Read length | Sequences long DNA fragments, typically 10,000 to 100,000 bp. | Sequences short fragments of 50-300 bp. |
| Sequencing time | Runs are typically longer because longer fragments are read. | Runs are typically shorter because shorter sequences are read. |
| Genome assembly | Simplifies genome assembly by providing broader context. | Short reads lack sufficient context, which makes assembly difficult. |
| Structural variant detection | Can detect large-scale variations, repetitive regions, and complex regions. | Limited ability to identify large insertions, deletions, and repetitive sequences. |
| Error rates | Individual reads can have comparatively higher error rates. | SRS platforms such as Illumina typically have lower error rates. |
| Data analysis | Requires specialized software and algorithms to process and analyze long reads. | Widely used and supported by many platforms and software tools; shorter reads are also easier and faster to process. |
| Applications | Well suited to whole-genome sequencing, structural variant detection, sequencing complex genomes, and clinical research. | Often preferred for high-throughput projects requiring high accuracy, such as transcriptomics, exome sequencing, and large-scale population studies. |
| Examples | PacBio sequencing, Nanopore sequencing | Illumina, Ion Torrent sequencing |