Massively Parallel Sequencing (MPS): Principle, Steps, Uses

Massively Parallel Sequencing (MPS), also called next-generation sequencing (NGS) or high-throughput sequencing (HTS), is a modern sequencing technology that can simultaneously sequence millions of short DNA or RNA fragments.

Unlike traditional sequencing methods, which read one fragment at a time, MPS sequences millions of fragments in parallel and produces vast amounts of data in a short time. This has reduced the cost and time of sequencing while increasing its efficiency. It was introduced in the early 2000s and has since advanced genomic research by delivering results faster and at far greater scale than traditional methods like Sanger sequencing. It is widely used in genomics, transcriptomics, and clinical diagnostics.

Principle of Massively Parallel Sequencing

The main principle of MPS is to sequence millions of short DNA or RNA fragments simultaneously, generating high-throughput data in a single run. Platforms differ in how they detect nucleotides, using methods such as sequencing-by-synthesis, ligation-based sequencing, or nanopore-based sequencing, but all of them read many fragments in parallel.

[Figure: Massively Parallel Sequencing]

The process begins with fragmenting the genetic material and attaching adapters to facilitate amplification and sequencing. Amplification methods such as bridge amplification (Illumina) or emulsion PCR are then used to create multiple copies of each fragment, although newer single-molecule platforms eliminate this amplification step. The sequencing chemistry depends on the platform. Sequencing-by-synthesis incorporates fluorescently labeled nucleotides, which are detected by imaging systems. Ligation-based sequencing uses fluorescent di-base probes. Ion-based sequencing detects changes in pH as nucleotides are incorporated. Nanopore sequencing measures changes in ionic current as DNA strands pass through biological nanopores.

Development of Massively Parallel Sequencing 

DNA sequencing was first introduced in 1977 with the development of Sanger sequencing. It is known as the first-generation sequencing technology. This method dominated the field of sequencing for decades but it was slow, costly, and relied on manual cloning which made it unsuitable for large-scale projects like whole-genome sequencing. In the 1980s and 1990s, automated sequencing using capillary electrophoresis (CE) improved throughput and reduced costs but sequencing remained costly for large-scale projects.

Massively parallel sequencing emerged in the early 2000s with second-generation sequencing methods. This marked a shift in sequencing technology and paved the way for modern sequencing methods. Early second-generation platforms like Roche 454 and Illumina’s Solexa system laid the groundwork by introducing methods that allowed faster, higher-throughput sequencing.

Third-generation sequencing technologies, such as PacBio and Oxford Nanopore, have further advanced the field by providing longer read lengths, faster run times, and real-time sequencing capabilities. These systems allow direct sequencing of long DNA fragments, eliminating the amplification steps.

Process/Steps of Massively Parallel Sequencing

  1. DNA/RNA Extraction and Library Preparation: DNA or RNA of interest is extracted and fragmented into smaller pieces. Then, adapters are attached to the ends of the fragments. These fragments are enriched and amplified to create a sequencing library. Depending on the platform, amplification methods such as PCR or bridge amplification are used to produce multiple copies of each fragment. Some newer sequencing platforms skip this amplification step.
  2. Sequencing: The prepared library is loaded onto the sequencing platform. Different platforms use different sequencing technologies such as sequencing-by-synthesis, sequencing-by-ligation, or ion semiconductor sequencing. Signals corresponding to each nucleotide, such as light emission or pH changes, are detected and recorded as raw data in the form of signal intensities or voltage changes.
  3. Data Analysis: These signals are processed and converted into nucleotide sequences using base calling. The next step is alignment and variant calling. This includes aligning the sequences to a reference genome and identifying genetic variations. Then, the identified variants are annotated and analyzed for interpretation.
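
The data analysis step can be illustrated with a toy sketch in Python. It assumes base calling has already produced short reads, aligns each read to a tiny reference by exhaustive search for the offset with the fewest mismatches (production aligners use indexed methods instead), and reports a variant wherever most of the covering reads disagree with the reference. The reference, the reads, the function names, and the thresholds are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def align(read, reference, max_mismatches=1):
    """Return the 0-based offset with the fewest mismatches, or None if every
    offset exceeds max_mismatches. Exhaustive search, for illustration only."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mm:
            best_pos, best_mm = pos, mismatches
    return best_pos


def call_variants(reads, reference, min_depth=2, min_fraction=0.8):
    """Pile up aligned reads and report positions where the majority base
    differs from the reference as (position, ref_base, alt_base, depth)."""
    pileup = defaultdict(Counter)
    for read in reads:
        pos = align(read, reference)
        if pos is None:
            continue  # unaligned read, ignored in this sketch
        for i, base in enumerate(read):
            pileup[pos + i][base] += 1

    variants = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        base, count = pileup[pos].most_common(1)[0]
        if (depth >= min_depth and base != reference[pos]
                and count / depth >= min_fraction):
            variants.append((pos, reference[pos], base, depth))
    return variants


# Hypothetical data: four reads that all carry a T where the reference has A.
reference = "ACGTTGCATGCCAGTA"
reads = ["GTTGCTTG", "TGCTTGCC", "CTTGCCAG", "ACGTTGCT"]
variants = call_variants(reads, reference)
print(variants)  # [(7, 'A', 'T', 4)]
```

Annotation, the final part of the step, would then attach meaning to each reported position, for example the affected gene; that part is not modeled here.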

[Figure: Steps of Massively Parallel Sequencing]

Massively Parallel Sequencing Platforms 

First-generation MPS platforms, also known as second-generation sequencing methods, amplify individual DNA templates into multiple copies before sequencing, using one of three methods: emulsion PCR, bridge amplification, or DNA nanoballs. These technologies formed the basis of early high-throughput sequencing and were adopted by platforms such as Ion Torrent, Illumina, and BGI sequencing systems.

Second-generation MPS platforms, also known as third-generation sequencing methods, sequence single DNA molecules directly without amplification, allowing long-read sequencing. These methods produce significantly longer reads but with lower throughput.

Some of the commercially established massively parallel sequencing platforms are briefly described below:

Illumina Sequencing

Illumina sequencing is one of the most widely used sequencing technologies. It uses sequencing-by-synthesis (SBS) to detect nucleotide bases. DNA is fragmented, adapters are added, and clusters are generated on a flow cell using bridge amplification. In each sequencing cycle, fluorescently labeled nucleotides are added and the surface is imaged to identify the base incorporated in every cluster. Its high accuracy and scalability make it the dominant platform for research and clinical applications. However, it is limited by short read lengths, high reagent costs, and its sample input requirements.

[Figure: Illumina Sequencing]

SOLiD (Sequencing by Oligonucleotide Ligation and Detection)

This platform uses a ligation-based method to sequence DNA, using fluorescently labeled di-base probes and a color-space coding system. Because consecutive probes overlap, each base is interrogated twice, which allows error correction. The DNA is amplified on beads using emulsion PCR and then sequenced by multiple cycles of ligation and detection. It can accurately detect single-nucleotide polymorphisms and other genetic variations but is limited by short read lengths and a complex workflow.
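
The color-space idea can be sketched in a few lines of Python. The sketch assumes the commonly described two-bit scheme in which each base gets a code (A=0, C=1, G=2, T=3) and the color of a di-base is the XOR of the two codes; the function names and the example sequences are hypothetical. It shows that a true single-base change alters two adjacent colors, which is what lets an isolated single-color change be flagged as a likely measurement error.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_color_space(seq):
    """Return (first_base, colors) with one color (0-3) per adjacent base pair."""
    colors = [BASE_TO_CODE[a] ^ BASE_TO_CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode_color_space(first_base, colors):
    """Rebuild the bases from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_TO_BASE[BASE_TO_CODE[bases[-1]] ^ color])
    return "".join(bases)


first, colors = encode_color_space("ATGGCA")
print(colors)                             # [3, 1, 0, 3, 1]
print(decode_color_space(first, colors))  # ATGGCA

# A single-base substitution (G -> A at index 3) changes two adjacent colors.
_, snp_colors = encode_color_space("ATGACA")
changed = [i for i, (a, b) in enumerate(zip(colors, snp_colors)) if a != b]
print(changed)                            # [2, 3]
```

Because decoding chains each base to the previous one, a single wrong color that is left uncorrected changes every base after it, which is one reason color-space reads are usually compared with the reference in color space before being translated.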

[Figure: Steps of SOLiD Sequencing]

Ion Torrent Sequencing

This platform originated from the concepts developed for 454 sequencing, which is now discontinued. It also uses emulsion PCR to amplify DNA fragments on beads. These beads are placed into wells on an electronic chip. Nucleotides are added sequentially to the DNA template, and their incorporation is detected as a pH change caused by the release of hydrogen ions. This method is fast, sensitive, and suitable for small-scale sequencing. However, it struggles with accuracy in homopolymer regions, and its lower output limits its use for whole genome sequencing (WGS).
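
The homopolymer problem follows from how flow-based chemistry reports signal, and a small Python sketch can make that concrete. It assumes an idealized model in which one nucleotide type is flowed at a time and the signal is proportional to the number of bases incorporated in that flow. The repeating flow order "TACG", the function names, and the noisy signal values are assumptions for illustration, not the instrument's actual settings or output.

```python
def flow_signals(synthesized, flow_order="TACG", n_flows=8):
    """Ideal per-flow signal: how many bases of the flowed nucleotide are
    incorporated. `synthesized` is the strand being built, 5' to 3'."""
    signals = []
    pos = 0
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        run = 0
        # A whole homopolymer run is incorporated within a single flow.
        while pos < len(synthesized) and synthesized[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
    return signals


def call_bases(signals):
    """Round each signal to the nearest integer run length."""
    return "".join(nucleotide * round(signal) for nucleotide, signal in signals)


ideal = flow_signals("TTACCCG")
print(ideal[:4])          # [('T', 2), ('A', 1), ('C', 3), ('G', 1)]
print(call_bases(ideal))  # TTACCCG

# Hypothetical noisy measurement: the 3-base C run reads as 3.6 and is
# rounded to 4, producing an insertion error inside the homopolymer.
noisy = [("T", 2.1), ("A", 0.9), ("C", 3.6), ("G", 1.0)]
print(call_bases(noisy))  # TTACCCCG
```

The same absolute noise that is harmless for a single base becomes decisive for longer runs, because the call depends on telling neighboring run lengths apart from one combined signal.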

[Figure: Ion Torrent Sequencing]

BGI Sequencing

BGI sequencing is based on DNA nanoballs. Adapter-ligated DNA fragments are circularized, amplified using rolling circle amplification, and compacted into nanoballs. These nanoballs are loaded onto patterned flow cells and sequenced by combinatorial probe-anchor synthesis (cPAS), in which fluorescently labeled probes report each base. This platform offers low-cost sequencing services. BGI traditionally operated only in China because its instruments were not available elsewhere, but it is now expanding internationally.

Pacific Biosciences (PacBio) Sequencing

PacBio uses single molecule real-time (SMRT) sequencing, which detects nucleotide incorporation in real time and can generate long reads. This method uses SMRT cells that contain small wells called zero-mode waveguides (ZMWs), each holding a DNA polymerase at the bottom. Isolated DNA ligated with SMRTbell adapters is loaded into the ZMWs, where the polymerase synthesizes the complementary strand. Fluorescently labeled nucleotides emit a signal as each base is incorporated. The platform can also detect modified bases and sequence full-length transcripts. However, it has lower throughput than Illumina and other short-read platforms.

[Figure: PacBio Sequencing]

Oxford Nanopore Sequencing

Like PacBio, Oxford Nanopore is a long-read platform that generates reads in real time. It uses biological nanopores to detect changes in ionic current as nucleic acids pass through the pores. The raw current signal is converted into base calls to determine the nucleotide sequence. It is a portable and versatile technology suitable for applications ranging from small field studies to large-scale sequencing projects. However, its raw read accuracy has generally been lower than that of short-read platforms.

[Figure: Oxford Nanopore Sequencing Steps]

    Advantages of Massively Parallel Sequencing

    • Massively parallel sequencing can process millions of sequencing reactions simultaneously, increasing throughput and reducing costs compared to traditional Sanger sequencing, which reads one fragment per reaction. This allows large-scale projects such as sequencing entire genomes to be completed more quickly and affordably.
    • MPS amplifies DNA fragments in vitro rather than in bacterial clones. This removes the need for manual cloning and streamlines the process.
    • With sufficient sequencing depth, it can detect rare genetic variants and minor alleles.
    • It has a high sensitivity for detecting single nucleotide changes, copy number variations, and structural variants. This ensures accurate variant identification. 
    • It can process samples from different patients in a single run which lowers costs and improves efficiency.
    • MPS can extract genetic information from minimal or highly degraded DNA samples which are often challenging to handle using traditional methods.
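
Pooling samples from different patients in one run relies on tagging each sample's fragments with a short known index sequence (a barcode) during library preparation and sorting the reads afterwards. The Python sketch below shows that sorting step under simplifying assumptions: the barcode sits at the start of each read, all barcodes have the same length, and only exact matches are accepted. The barcodes, sample names, and reads are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode.

    Matching reads are stored with the barcode trimmed off; reads whose
    prefix matches no barcode are kept whole under 'undetermined'."""
    length = len(next(iter(barcodes)))  # assumes equal-length barcodes
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:length], read[length:]
        if tag in barcodes:
            bins[barcodes[tag]].append(insert)
        else:
            bins["undetermined"].append(read)
    return dict(bins)


barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
reads = ["ACGTGGATTC", "TGCACCTAGA", "ACGTTTAGGC", "GGGGAAAACC"]
by_sample = demultiplex(reads, barcodes)
print(by_sample)
```

In practice barcodes are designed to differ at several positions so that a read with a sequencing error in its barcode is not assigned to the wrong patient; this sketch does not attempt that kind of error tolerance.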

    Limitations of Massively Parallel Sequencing

    • MPS generates a high volume of data which needs advanced bioinformatics tools and computational resources for storage, processing, and analysis. 
    • MPS involves processing a large number of genes. This increases the likelihood of incidental findings which may raise ethical issues regarding privacy of genetic data and its potential misuse.
    • Some genomic regions may lack sufficient sequencing reads leading to missed variants in these areas. 
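    • Instruments and supporting infrastructure carry high initial costs. Although per-base costs have decreased over time, whole-genome sequencing remains expensive.
    • The raw sequence data from MPS generally have higher error rates than Sanger sequencing, so accurate results depend on reading each position multiple times.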
    • Many MPS platforms produce shorter reads than Sanger sequencing which complicates de novo sequence assembly and interpretation in complex and repetitive DNA regions. 
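
The coverage limitation can be put into rough numbers with a short Python sketch. It assumes the standard back-of-the-envelope model in which mean coverage is (number of reads × read length) / genome size and per-base depth follows a Poisson distribution. The run size and genome size below are hypothetical, and real coverage is less uniform than this model predicts, which is why some regions end up under-covered even when the average looks adequate.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average number of reads covering each base of the genome."""
    return n_reads * read_length / genome_size


def fraction_below(min_depth, coverage):
    """Poisson estimate of the fraction of bases covered by fewer than
    min_depth reads, given the mean coverage."""
    return sum(
        math.exp(-coverage) * coverage ** k / math.factorial(k)
        for k in range(min_depth)
    )


# Hypothetical run: one million 150-base reads over a 5 Mb genome.
coverage = mean_coverage(1_000_000, 150, 5_000_000)
print(coverage)                      # 30.0
print(fraction_below(10, coverage))  # expected fraction of bases under 10x
print(fraction_below(1, 2.0))        # uncovered fraction at 2x, i.e. exp(-2)
```

Under this idealized model, low average coverage leaves a substantial share of bases unread, so variants there are missed; systematic effects not captured by the Poisson assumption make the real shortfall larger in difficult regions.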

    Applications of Massively Parallel Sequencing (MPS)

    • MPS can be used in whole-genome sequencing (WGS), the comprehensive sequencing of an entire genome including all exons, introns, and regulatory sequences. This provides information about both coding and non-coding regions and has applications in identifying genetic disorders and mutations linked to diseases.
    • It can be used in gene panel testing which involves sequencing specific genes associated with certain diseases or conditions. 
    • It can be used to identify pharmacogenetic variants in a single assay, which is useful in personalized drug treatment. Knowing these variants helps predict adverse drug reactions and so can help prevent illness and death.
    • It can be used in cancer diagnostics and treatment. It can detect tumor heterogeneity and rare mutations in small cell fractions.
    • It also has applications in studying epigenetics. It can be used in genome-wide testing for methylation patterns and histone modifications, which helps in diagnosing cancers and other disorders caused by epigenetic abnormalities and in understanding the regulatory role of methylation changes in disease.
    • MPS can also be used in microbiome sequencing and metagenomics to study different microbial and viral species from samples. 
    • It can be used in forensic science to solve cold cases or analyze decades-old evidence where DNA quality is degraded.
    • MPS also has applications in transcriptome sequencing (RNA-seq). This is useful in identifying transcribed genes and detecting expressed gene fusions.

