Metagenomics is the study of all genomes of organisms isolated from a bulk sample. It is particularly useful for studying communities of microorganisms in various environments, such as human skin or water samples.
Understanding microbial interactions in various environments is a significant challenge because most microbes cannot be cultured using standard laboratory techniques. Metagenomics addresses the limitations of traditional microbiological techniques. It is culture-independent and is used to study the vast majority of microbes that are unculturable or difficult to culture.
Metagenomics uses a range of genomic technologies and bioinformatics tools to study the genomes of microbial communities in their natural environments. It has contributed to our understanding of microbial ecology and diversity. Metagenomics focuses on the genes of a community and their interactions rather than on individual organisms.
Historical Development of Metagenomics
- The development of molecular techniques such as 16S ribosomal RNA (rRNA) gene sequencing allowed microbial communities to be studied without cultivation. This laid the foundation for metagenomics. The use of rRNA as a molecular marker was first proposed by Carl Woese in the late 1970s.
- In 1977, the development of the Sanger sequencing method by Frederick Sanger and colleagues further helped in the study of microorganisms.
- Jo Handelsman and colleagues coined the term “metagenomics” in 1998 and defined it as the study of the collective genomes of microorganisms in environmental samples.
- The mid-2000s saw the arrival of high-throughput sequencing technologies, which allowed the rapid sequencing of entire microbial communities.
- In the mid-2000s, specialized databases and bioinformatics tools were developed to manage the growing volume of metagenomic data. Projects like CAMERA (Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis) and IMG/M (Integrated Microbial Genomes and Microbiomes) provided platforms for storing and sharing metagenomic data.
- At present, metagenomics continues to advance rapidly alongside other omics technologies, driven by improvements in sequencing technologies and bioinformatics tools.
Principle of Metagenomics
The principle of metagenomics is to study the genomes of microbial communities directly in bulk environmental samples, without culturing individual species. This allows detailed analysis of microbial communities and overcomes the limitations of traditional laboratory culture methods. A typical workflow includes collecting environmental samples, extracting DNA from the microorganisms present in the sample, sequencing the DNA using high-throughput technologies, and analyzing the sequencing data with bioinformatics tools to identify and characterize the microbial communities in their natural habitats.
Metagenomic studies can be performed using two main methods. The first is targeted sequencing which is based on the principle that targeting specific regions of an organism’s genome can be used to identify and characterize the organisms. The second method is shotgun sequencing which is based on the principle of randomly sequencing all the DNA in a sample. This provides a detailed analysis of the genetic material from all organisms in the sample.
Types of Metagenomics
The two main types of metagenomics methods are targeted metagenomics and shotgun metagenomics.
- Targeted metagenomics, also known as amplicon-based sequencing, involves sequencing specific marker genes within microbial communities, such as the 16S rRNA gene for bacteria and the 18S rRNA gene or internal transcribed spacer (ITS) regions for fungi. These markers contain conserved stretches, which serve as primer-binding sites, flanking variable stretches whose sequences differ between organisms and are therefore useful for identifying the organisms present in an environmental sample. However, identifying organisms at the species level is challenging with this method, and it generally does not provide strain-level resolution.
- Shotgun metagenomics or whole-genome shotgun (WGS) metagenomics involves random sequencing of all genetic material in a sample. Unlike targeted sequencing methods that focus on specific genetic markers, shotgun metagenomics sequences all the DNA present in a sample. This provides detailed information on both the taxonomic and functional composition of the microbial community. It also allows species-level identification.
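The difference between the two approaches can be illustrated with a toy sketch. It is not a real sequencing simulator: the two "genomes", the conserved flanking sites `CCGGTT` and `GCTAAC`, and the function names are all hypothetical, and both flanking sites are given on the same strand for simplicity (real PCR primers bind opposite strands). The targeted function returns only the marker region between the conserved sites, while the shotgun function samples reads at random from all DNA in the pool.

```python
import random

def amplicon(genome, fwd_site, rev_site):
    """Targeted approach (toy): return the region spanning two conserved sites, or None."""
    i = genome.find(fwd_site)
    if i == -1:
        return None
    j = genome.find(rev_site, i + len(fwd_site))
    if j == -1:
        return None
    return genome[i:j + len(rev_site)]

def shotgun_reads(genomes, n_reads, read_len, seed=0):
    """Shotgun approach (toy): sample fixed-length reads at random from every genome."""
    rng = random.Random(seed)
    names = list(genomes)
    # Longer genomes contribute proportionally more reads.
    weights = [len(genomes[name]) for name in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights)[0]
        seq = genomes[name]
        start = rng.randrange(len(seq) - read_len + 1)
        reads.append((name, seq[start:start + read_len]))
    return reads

# Hypothetical toy genomes sharing the same conserved flanking sites.
genomes = {
    "org_A": "ATGCGTACCGGTTAGCTAGGCTAACCGGATCGATCGTTAGC",
    "org_B": "TTGACCGGTTCCATGAAGCTAACGGCATTACGGATCCA",
}

for name, seq in genomes.items():
    print(name, "amplicon:", amplicon(seq, "CCGGTT", "GCTAAC"))

for name, read in shotgun_reads(genomes, n_reads=5, read_len=8):
    print(name, "shotgun read:", read)
```

The amplicons differ only in the variable stretch between the conserved sites, which is what allows identification, whereas the shotgun reads can come from anywhere in either genome.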
Steps Involved in Metagenomic Analysis
Metagenomic analysis involves the following nine steps.
1. Sample Collection and Processing
- Proper sample collection, preservation, and processing are important to maintain the integrity of the genetic material and ensure the accuracy of metagenomic analyses.
- The process starts with the collection of environmental or biological samples. Protocols differ by sample type, but they share the goal of maximizing DNA yield while preserving large DNA fragments.
- Once collected, the samples undergo processing to extract DNA. Different physical and chemical methods are used for DNA extraction, depending on the sample type and the desired quality and quantity of DNA.
- The cells are lysed to release the DNA and the cellular debris is removed using centrifugation or filtration. The DNA is purified using methods like phenol-chloroform extraction and silica column-based kits. The extracted DNA is then stored at appropriate temperatures.
2. Construction of Metagenomic Libraries
- After the microbial DNA is extracted, metagenomic libraries are created by cloning the DNA fragments into vectors and transforming the vectors into suitable host cells.
- First, the microbial DNA is cut or digested into fragments of specific sizes. These fragments are then joined with a suitable cloning vector to form recombinant DNA. Common cloning vectors include plasmids, cosmids, bacterial artificial chromosomes (BACs), and phage vectors.
- The recombinant vector is then transformed into suitable host cells. Escherichia coli is the most commonly used host because it is easy to culture.
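As a rough illustration of the fragmentation and size-selection step, the sketch below cuts a DNA string at every occurrence of a restriction site and keeps the fragments within a size range. It assumes the EcoRI recognition site GAATTC (cut after the first G) purely as an example, since no enzyme is named here; the function names, the toy sequence, and the size limits are hypothetical.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut dna at every occurrence of site, cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # remainder after the last cut
    return fragments

def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length suits the chosen vector (limits are illustrative)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

dna = "AAGAATTCCCGGGAATTCTT"  # toy sequence with two EcoRI sites
fragments = digest(dna)
print(fragments)
print(size_select(fragments, min_len=5, max_len=8))
```

In practice the acceptable insert size depends on the vector, which is why the size limits are left as parameters.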
3. Screening of Metagenome Library
- Metagenomic libraries are screened to identify and isolate specific genes. Some of the screening methods include functional screening, sequence analysis, substrate-induced gene expression screening (SIGEX), DNA microarray, and fluorescence in situ hybridization (FISH).
- Functional screening identifies clones that express active gene products, based on visible changes or on growth on selective media. This method is fast and simple. However, it has a low success rate and requires an appropriate expression host.
- Sequence-based screening uses PCR and gene hybridization to find target genes. This method does not depend on host cell expression and is effective for conserved sequence genes. However, it is limited to known sequences and cannot detect unknown genes.
- SIGEX identifies genes in a metagenomic library that become active in the presence of specific substrates. It can be used to derive the functions of unknown genes using the substrate.
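Sequence-based screening can be pictured as searching each clone's insert for a known, conserved sequence. The sketch below is a simplified stand-in for probe hybridization: it slides a probe along each insert and reports clones that match within a small number of mismatches. The clone names, insert sequences, probe, and mismatch threshold are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def screen_library(clones, probe, max_mismatches=1):
    """Return {clone_id: position} for clones containing a near-match to the probe."""
    hits = {}
    for clone_id, insert in clones.items():
        for i in range(len(insert) - len(probe) + 1):
            window = insert[i:i + len(probe)]
            if hamming(window, probe) <= max_mismatches:
                hits[clone_id] = i  # record the first matching position only
                break
    return hits

# Hypothetical library: clone_1 has an exact match, clone_3 a single mismatch.
clones = {
    "clone_1": "TTGACGGATCCGTACGA",
    "clone_2": "AAATTTCCCGGGAAATT",
    "clone_3": "CCGGATCAGTACTT",
}
print(screen_library(clones, probe="GGATCCGTAC"))
```

The sketch also shows the limitation noted above: a clone carrying an unrelated, unknown gene (clone_2 here) can never be found this way, because the search starts from a known sequence.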
4. DNA Sequencing
- The next step in the workflow is DNA sequencing. Sequencing technology has evolved from Sanger sequencing to next-generation sequencing (NGS) methods.
- NGS technologies like 454/Roche and Illumina/Solexa became widely used in metagenomics due to their cost-effectiveness and high throughput.
- Other sequencing technologies include Applied Biosystems SOLiD, Ion Torrent, Pacific Biosciences (PacBio), and Complete Genomics.
5. Sequence Assembly
- Assembly reconstructs genomes from sequencing reads by piecing together the short DNA reads generated during sequencing into longer contiguous sequences (contigs). Longer sequences carry more information and allow more accurate downstream analysis.
- There are two methods for metagenomic assembly: reference-based and de novo assembly.
- Reference-based assembly uses reference genomes and is suitable when the metagenomic dataset contains sequences with close matches to reference genomes.
- De novo assembly constructs contigs without relying on reference genomes. It requires more computational resources compared to reference-based assembly and may take longer to run.
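The idea behind de novo assembly can be sketched with a greedy overlap merge: repeatedly find the two sequences with the longest suffix-to-prefix overlap and join them. This is a teaching toy under strong assumptions (error-free reads, a single strand, no repeats), not how production assemblers work; the function names and reads are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Merge the best-overlapping pair until no overlap of min_overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]

reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

The nested pairwise comparison in every round hints at why de novo assembly is computationally demanding: the work grows quickly with the number of reads, whereas reference-based assembly only needs to place each read on a known genome.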
6. Binning
- Binning is the process of sorting assembled DNA sequences into groups, or bins, each of which represents an individual genome or a group of closely related organisms in the microbial community.
- Various algorithms have been developed for binning that use different types of information contained within DNA sequences.
- Composition-based binning algorithms rely on the fact that nucleotide composition is largely conserved within a genome. They analyze sequence fragments for patterns in nucleotide composition, such as GC content, to group them into bins.
- Similarity-based binning algorithms rely on the similarity of DNA fragments to known genes in reference databases. These algorithms compare query sequences to sequences in reference databases and assign them to taxonomic groups based on the similarity scores.
- Some binning algorithms combine both compositional and similarity-based methods. They combine information on nucleotide composition with sequence similarity to improve binning accuracy.
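A minimal sketch of composition-based binning is shown below. It uses GC content as the only signal and a two-cluster, one-dimensional k-means to split contigs into two bins. This is an assumption-laden simplification: real binning tools typically combine richer signals, and the contig names, sequences, and the choice of exactly two bins are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def bin_by_gc(contigs, n_iter=10):
    """Split contigs into two bins with a 1-D k-means on GC content."""
    gc = {name: gc_content(seq) for name, seq in contigs.items()}
    # Deterministic start: one centroid at the lowest GC, one at the highest.
    centroids = [min(gc.values()), max(gc.values())]
    bins = [[], []]
    for _ in range(n_iter):
        bins = [[], []]
        for name, value in gc.items():
            nearest = min((0, 1), key=lambda c: abs(value - centroids[c]))
            bins[nearest].append(name)
        # Move each centroid to the mean GC of its bin (keep it if the bin is empty).
        centroids = [
            sum(gc[n] for n in members) / len(members) if members else centroids[c]
            for c, members in enumerate(bins)
        ]
    return bins

# Hypothetical contigs: two AT-rich and two GC-rich.
contigs = {
    "contig_1": "ATATTTAAGATTAATA",
    "contig_2": "TTAAATATCAATTTAA",
    "contig_3": "GCCGGCGCATGCCGGC",
    "contig_4": "CGGCCGCGTCGGGCCC",
}
low_gc_bin, high_gc_bin = bin_by_gc(contigs)
print("low-GC bin:", low_gc_bin)
print("high-GC bin:", high_gc_bin)
```

A similarity-based method would instead compare each contig against a reference database, and a hybrid method would use both kinds of evidence before assigning a bin.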
7. Annotation
- The genes within the assembled sequences are identified and annotated to predict their functions using bioinformatics tools and databases.
- Annotation involves two main steps: feature prediction and functional annotation.
- Feature prediction identifies genes or other genomic elements within the DNA sequences. It also identifies non-coding RNA, regulatory sequences, binding sites, structural motifs, and other biologically relevant elements.
- Functional annotation identifies functions of the predicted genes. Metagenomic annotation relies on homology searches against reference databases.
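Feature prediction can be illustrated with a very simple open reading frame (ORF) finder. The sketch below scans the three forward reading frames for a start codon (ATG) followed in-frame by a stop codon. It is a deliberately naive assumption of how gene calling works: it ignores the reverse strand, alternative start codons, and partial genes at contig ends, and the function name, threshold, and example sequence are hypothetical. Functional annotation, which would search each predicted gene against a reference database, is not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, orf_sequence) for forward-strand ORFs in all three frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first start codon
            elif start is not None and codon in STOP_CODONS:
                # Count the coding codons (start codon included, stop excluded).
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return orfs

print(find_orfs("CCATGAAACCCGGGTAGTT"))  # [(2, 17, 'ATGAAACCCGGGTAG')]
```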
8. Experimental Design and Statistical Analysis
- Experimental design and statistical analysis are important in metagenomic studies to ensure valid and interpretable results.
- Effective statistical methods are needed to interpret the vast amounts of metagenomic data generated and to handle their complexity.
- Various statistical tools are used to analyze metagenomic data. The Primer-E package and web-based tools like Metastats allow multivariate statistical analyses.
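Two quantities that often appear in such analyses are the Shannon diversity index of a single sample and the Bray-Curtis dissimilarity between two samples. The sketch below computes both from taxon count tables. It is not taken from Primer-E or Metastats; the function names are hypothetical, the counts are made up for illustration, and each sample is assumed to contain at least one non-zero count.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 1 - 2 * (shared abundance) / (total abundance)."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))

# Made-up read counts per genus for two samples.
sample_1 = {"Bacteroides": 50, "Prevotella": 30, "Escherichia": 20}
sample_2 = {"Bacteroides": 10, "Prevotella": 60, "Lactobacillus": 30}

print("Shannon, sample 1:", round(shannon_index(sample_1), 3))
print("Shannon, sample 2:", round(shannon_index(sample_2), 3))
print("Bray-Curtis:", round(bray_curtis(sample_1, sample_2), 3))
```

A matrix of pairwise dissimilarities like this is a common input to multivariate methods such as ordination and clustering.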
9. Data Storage and Sharing
- The vast amount of metagenomic data requires effective data storage and sharing. Metagenomic data can be stored in standardized formats in repositories, such as the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).
- Metagenomic datasets are more complex and voluminous than those from single-genome sequencing. Storing this vast amount of sequence data in existing sequence databases is a significant challenge. Effective use of this data requires specialized secondary databases.
- To facilitate this, different projects like IMG/M, CAMERA, and MG-RAST have been developed. These platforms allow the sharing of both raw data and computational results.
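As an example of what a standardized format looks like in practice, the sketch below reads records in FASTQ, a text format widely used for raw sequencing reads. The choice of FASTQ is an assumption, since no specific format is named here. It also assumes the simple four-line record layout and Phred+33 quality encoding; the function name and the example records are hypothetical.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        # Phred+33 encoding: quality score = ASCII code - 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores

# Two made-up records held in memory instead of a file.
example = io.StringIO("@read_1\nACGT\n+\nIIII\n@read_2\nGGCC\n+\n!!5I\n")
for read_id, sequence, scores in read_fastq(example):
    mean_quality = sum(scores) / len(scores)
    print(read_id, sequence, "mean quality:", mean_quality)
```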
Importance of Metagenomics
- The traditional pure culture method in microbiology isolates individual species and studies their responses to specific chemicals in controlled environments. This method limits our understanding of microbial behavior within complex communities.
- Metagenomics addresses these limitations of traditional microbiology by allowing the study of microbial communities directly in their natural habitats. This provides information about the ecological roles and interactions of microbes in these communities.
- Because it does not require culturing individual species, metagenomics can provide information about the composition, function, and interactions of entire microbial communities.
Applications of Metagenomics
- Metagenomics is used in the study of the human microbiome. This includes research on gut microbiota, skin microbiota, and their roles in health and disease.
- Metagenomics can be used to identify pathogens in clinical samples which helps to diagnose and treat infectious diseases.
- Metagenomics helps in understanding the microbial diversity and functions in soil.
- Metagenomics can be used to monitor water quality and detect pollution by studying microbial communities in water samples.
- Metagenomics can also be used in bioremediation efforts by identifying microorganisms that are capable of degrading pollutants.
- Understanding the microbial communities associated with plants can improve crop health and productivity.
- Metagenomics helps to identify microorganisms that are capable of energy production which is useful for the production of biofuels and other renewable energy sources.
- Metagenomics is also useful in discovering new enzymes, bioactive compounds, antibiotics, pharmaceuticals, and other beneficial microbial products.
Limitations of Metagenomics
- Cloning biases can lead to the misrepresentation of microbial DNA in metagenomic libraries.
- Sampling biases can occur when collecting environmental samples which can also lead to misrepresentation of the microbial community.
- Metagenomic sequencing can detect the presence of microorganisms but cannot easily determine if they are pathogenic.
- The vast amount of metagenomic data requires faster and more scalable computational tools to handle the data efficiently.
- Metagenomic datasets contain a mix of DNA from various organisms which often makes it difficult to assemble genomes accurately.
- Contamination from host DNA or environmental sources can affect the accuracy of metagenomic analyses.
- It is also challenging to extract genes from microorganisms present in low abundance within a sample.
Examples of Metagenomics Projects
Human Microbiome Project (HMP)
- The HMP is a prominent example of a metagenomics project. It was launched by the National Institutes of Health (NIH) in 2007 to study the microbial communities present in different parts of the human body, such as the gut, skin, and mouth.
- The project aimed to understand the role of the microbiome in human health and disease.
- HMP used metagenomic techniques to sequence and analyze the DNA of microbial communities directly from human samples.
- This project has provided data on the diversity and function of human-associated microbes.
Global Ocean Sampling Expedition (GOS)
- The GOS expedition is another example of a metagenomics project that was led by J. Craig Venter and his team in 2004.
- This project aimed to study the microbial diversity present in the oceans using shotgun metagenomics sequencing techniques.
- During the expedition, the Sorcerer II vessel traveled across the globe for over two years. The team collected water samples from various marine environments.
- The GOS Expedition expanded our understanding of marine microbial diversity and revealed many previously unknown microorganisms and genes.
Integrated Microbial Genomes and Microbiomes (IMG/M) System
- IMG/M is a database platform that focuses on microbial genomes and microbiomes. It provides access to metagenomic data and also provides tools for comparative analysis of microbial genomes.
- It is also useful for annotating the function of microbial communities from the metagenomic data.