Hi-C sequencing is a next-generation sequencing (NGS) method used to study the three-dimensional (3D) structure of genomes by mapping the arrangement and interactions of chromatin within the nucleus.
This helps explain how DNA is organized inside the nucleus and how that organization relates to its function. The method was first described by Erez Lieberman-Aiden and colleagues in 2009. It combines Chromosome Conformation Capture (3C) with high-throughput sequencing (HTS). Hi-C provides detailed information about the interactions between different regions of the genome and is widely used to study gene expression, gene regulation, and chromatin folding.
The genome of most organisms is very long when fully stretched out but must be compacted to fit into a tiny nucleus or cellular structure. This is possible as DNA inside the cell is associated with different proteins to form a complex called chromatin which is organized in a regulated 3D structure. It is important to understand how DNA is arranged and interacts inside the cell as it plays an important role in many biological processes like DNA replication, cell division, and transcriptional regulation.
Two main approaches are used to study DNA organization: microscopy and molecular methods. Microscopy can show the shape and position of chromosomes in a single cell, but it generally cannot resolve which specific DNA sequences are interacting across the whole genome. Molecular methods like 3C and Hi-C sequencing, on the other hand, provide sequence-level information about how different parts of the genome interact based on their 3D organization.


Principle of Hi-C Sequencing
Hi-C sequencing works by studying how different regions of DNA interact physically. Since the genome is folded and compacted in the nucleus, distant DNA regions can come close together. Hi-C sequencing identifies these interacting regions by chemically linking them, cutting, and reconnecting the pieces. This creates hybrid DNA fragments which are then sequenced and analyzed to study the interactions.
The Hi-C sequencing process begins with crosslinking of chromatin and digestion of the DNA with restriction enzymes. The resulting fragment ends are filled in with biotin-labeled nucleotides and then blunt-end ligated, which joins fragments that were close in 3D space into hybrid DNA molecules. After the crosslinks are reversed, the DNA is cut into smaller fragments and the biotinylated fragments are enriched with streptavidin-coated magnetic beads. These fragments are amplified using PCR and sequenced to identify which regions of the genome were interacting.
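The cutting and rejoining steps can be sketched in silico. The following is a minimal illustration, not a protocol. It assumes the enzyme HindIII (recognition site AAGCTT, cut after the first base), which the text above does not name, and the helper names `digest` and `ligation_junction` are hypothetical. It splits a sequence at every recognition site and derives the sequence formed when two filled-in ends are blunt-end ligated.

```python
def digest(sequence, site, cut_offset):
    """Split `sequence` at every occurrence of the recognition `site`.

    `cut_offset` is the number of bases of the site that stay on the
    left-hand fragment (1 for HindIII, which cuts A^AGCTT).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


def ligation_junction(site, cut_offset):
    """Sequence created when two filled-in ends are blunt-end ligated.

    Assumes a palindromic site that leaves a 5' overhang: after fill-in,
    the left end carries site[:len(site) - cut_offset] and the right end
    carries site[cut_offset:].
    """
    return site[:len(site) - cut_offset] + site[cut_offset:]


# Assumed enzyme for illustration only: HindIII, A^AGCTT.
HINDIII_SITE, HINDIII_OFFSET = "AAGCTT", 1

toy_sequence = "GGAAGCTTCCAAGCTTGG"  # made-up sequence with two sites
fragments = digest(toy_sequence, HINDIII_SITE, HINDIII_OFFSET)
junction = ligation_junction(HINDIII_SITE, HINDIII_OFFSET)
print(fragments)
print(junction)
```

A read that spans such a junction carries sequence from two different genomic locations, which is what makes the hybrid fragments informative about 3D proximity.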


Process of Hi-C Sequencing
- Crosslinking: The first step in Hi-C sequencing is treating the cells with formaldehyde to crosslink DNA regions that are physically close in 3D space. Formaldehyde preserves the interactions between different regions of the genome and maintains the natural 3D structure of chromatin.
- Cell Lysis and Digestion: After crosslinking, the cells are lysed to release the chromatin. The sample is then washed to remove non-crosslinked proteins, and the DNA is cut into smaller pieces with sequence-specific restriction enzymes. These endonucleases cut at defined recognition sites and leave sticky ends (overhangs) on the DNA fragments.
- Biotin Labeling: The overhanging fragment ends are filled in, and biotin-labeled nucleotides are incorporated during this step. The biotin marker helps capture and isolate the ligated DNA fragments in later steps.
- Proximity Ligation: Then, the labeled DNA fragments are joined to form hybrid DNA molecules that contain DNA sequences from different genomic regions. This helps us capture long-range interactions between distant parts of the genome.
- Reverse Crosslinking and Protein Degradation: After ligation, the formaldehyde cross-linking is reversed and the associated proteins are degraded, leaving only the DNA fragments.
- Library Preparation: The resulting ligated DNA is cut into smaller fragments for library preparation. Then, the biotin-labeled DNA fragments are isolated using magnetic beads coated with streptavidin. This enriches the fragments that are relevant for studying chromatin interactions. These fragments are then amplified and sequenced.
- Sequencing: The selected DNA fragments are loaded onto an NGS platform such as Illumina that can generate paired-end reads. Paired-end sequencing is used because it reads both ends of each fragment, so the two interacting genomic regions joined in a fragment can both be identified.
- Data Analysis: Hi-C data analysis involves two main parts: preprocessing and downstream analysis.
- Preprocessing converts raw sequencing data into a contact map that shows the interactions between different genomic regions. It begins with the FASTQ files produced by sequencing. First, low-quality reads are removed and the remaining reads are mapped to a reference genome. The two reads of each pair are aligned independently because they often originate from distant parts of the genome. The aligned pairs are then filtered to remove noise and artifacts caused by experimental errors. Next, the pairs are grouped into genomic bins. Finally, the binned data is normalized to correct biases like uneven coverage.
- After preprocessing, downstream analysis is done to extract meaningful biological insights. It includes compartment identification, TAD calling, and loop calling. These steps identify active (A compartment) and inactive (B compartment) genomic regions based on interaction patterns, topologically associating domains (TADs), and chromatin loops. This analysis also helps detect structural variations like genome rearrangements and gives a detailed view of genome organization.
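The binning and normalization steps of preprocessing can be sketched as follows. This is a simplified illustration that assumes the read pairs have already been mapped and filtered and all lie on a single chromosome. The function names `bin_contacts` and `balance` are hypothetical, and the normalization shown is a bare-bones iterative correction of row coverage. It is one common style of Hi-C normalization, not necessarily the one a given pipeline uses.

```python
def bin_contacts(pairs, bin_size, n_bins):
    """Count mapped read pairs per (bin_i, bin_j) in a symmetric matrix.

    `pairs` holds (position_1, position_2) coordinates in base pairs.
    """
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


def balance(matrix, n_iter=50):
    """Repeatedly divide each entry by the relative coverage of its row
    and column, so that bins with uneven coverage become comparable."""
    n = len(matrix)
    balanced = [row[:] for row in matrix]
    for _ in range(n_iter):
        sums = [sum(row) for row in balanced]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # empty matrix, nothing to correct
        mean = sum(nonzero) / len(nonzero)
        # Bins without any contacts are left untouched (bias of 1).
        bias = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= bias[i] * bias[j]
    return balanced


# Toy input: three made-up read pairs, 10 bp bins, 3 bins.
toy_pairs = [(5, 25), (12, 27), (3, 8)]
raw = bin_contacts(toy_pairs, bin_size=10, n_bins=3)

# A matrix whose only signal is per-bin coverage bias (1x, 2x, 4x).
coverage = [1.0, 2.0, 4.0]
biased = [[a * b for b in coverage] for a in coverage]
corrected = balance(biased)
print([sum(row) for row in corrected])
```

Production tools typically also mask low-coverage bins and the near-diagonal entries before balancing. Those refinements are omitted here to keep the sketch short.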
Chromosome Conformation Capture (3C)
- 3C is a genomic proximity ligation method that is used to study the interactions between any two loci using PCR-based methods.
- 3C involves cross-linking DNA and proteins to lock the structure in place, cutting the DNA with restriction enzymes, and re-ligating the fragments. The ligated products are detected by PCR or sequencing.
- However, 3C is not a large-scale method and has low throughput as it is limited to studying interactions between two predefined loci at a time.
- Improved versions of 3C were developed to address this limitation. Variants such as 4C, 5C, and Hi-C allow the study of DNA interactions at different scales, from single pairs of loci to the entire genome. All 3C-based methods share the same initial steps to capture chromatin interactions, but each uses a different detection method.
- Circular Chromosome Conformation Capture (4C) is a variation of 3C that identifies genome-wide interactions of a specific region of interest. In this method, the ligation products undergo a secondary restriction digest to fragment DNA further and form circularized fragments. Then, inverse PCR is performed and sequencing is done to identify the DNA interactions.
- Chromosome Conformation Capture Carbon Copy (5C) is another variation of 3C that is used to study the interactions between multiple genomic regions. It has higher throughput than 3C and 4C as it uses multiplexed ligation-mediated amplification.
- High-throughput Chromosome Conformation Capture (Hi-C) is a genome-wide variation of the 3C method. It provides information about all genomic interactions across the genome. In this method, the fragment ends are biotin-labeled before ligation so that the ligated fragments can be enriched before sequencing.
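The difference in scale between these methods can be illustrated with simple counting. The sketch below is a rough comparison under stated assumptions: the genome is divided into `n_loci` candidate loci, 3C tests one predefined pair, 4C tests one region of interest against all other loci, 5C tests every combination of two chosen sets of loci (the parameters `n_set_a` and `n_set_b` are assumptions made for illustration), and Hi-C tests all pairs. The function name is hypothetical.

```python
def locus_pairs_probed(method, n_loci, n_set_a=0, n_set_b=0):
    """Rough number of locus pairs one experiment can interrogate."""
    if method == "3C":
        return 1                              # one predefined pair
    if method == "4C":
        return n_loci - 1                     # one region vs. all others
    if method == "5C":
        return n_set_a * n_set_b              # many vs. many
    if method == "Hi-C":
        return n_loci * (n_loci - 1) // 2     # all vs. all
    raise ValueError(f"unknown method: {method}")


# Illustrative numbers only: 1,000 loci, 5C with two sets of 50 loci.
for method in ("3C", "4C", "5C", "Hi-C"):
    count = locus_pairs_probed(method, 1000, n_set_a=50, n_set_b=50)
    print(method, count)
```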
Advantages of Hi-C Sequencing
- Hi-C sequencing surveys chromatin interactions across the whole genome in a single experiment, without being limited to predefined loci. This provides detailed information about chromatin interactions and the structural arrangement of the genome.
- It has high resolution and can detect both long-range and short-range chromatin interactions.
- Hi-C can detect interactions between distant DNA regions that may not be directly next to each other in the linear sequence.
- It can detect complex structural features like TADs and chromatin loops which are important for understanding gene expression and overall genome function.
Limitations of Hi-C Sequencing
- Hi-C requires a large amount of starting material due to the multiple steps involved including crosslinking, digestion, and ligation.
- Hi-C data analysis is also complex and challenging as it involves multiple steps like read alignment, pairing, and normalization.
- Hi-C works on fixed cells, so it cannot show how chromatin changes over time or varies between individual cells. Single-cell Hi-C methods have been developed, but the data from each cell is incomplete, making it hard to detect rare interactions.
- Hi-C captures only pairwise interactions because it depends on DNA ligation, which joins two DNA regions at a time. This makes it difficult to study complex chromatin structures in which more than two regions come together.
- Hi-C data often contains biases from sequencing and experimental procedures.
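- In theory, Hi-C offers high resolution, but the number of possible pairwise interactions grows roughly with the square of the number of genomic bins, so fine resolution requires very high sequencing depth, which is often impractical. This can reduce the resolution achieved in practice.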
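The link between resolution and sequencing depth can be made concrete with a little arithmetic. The sketch below counts the bin pairs in a genome-wide contact map and the average number of read pairs available per bin pair. The genome size (about 3.1 billion base pairs) and the read-pair count (1 billion) are assumptions chosen for illustration, and the function name is hypothetical.

```python
def bin_pair_count(genome_size_bp, bin_size_bp):
    """Number of distinct bin pairs, including each bin paired with itself."""
    n_bins = -(-genome_size_bp // bin_size_bp)  # integer ceiling division
    return n_bins * (n_bins + 1) // 2


ASSUMED_GENOME_SIZE = 3_100_000_000  # illustrative, roughly human-sized
ASSUMED_READ_PAIRS = 1_000_000_000   # illustrative sequencing depth

for bin_size in (1_000_000, 100_000, 10_000, 1_000):
    n_pairs = bin_pair_count(ASSUMED_GENOME_SIZE, bin_size)
    mean_per_pair = ASSUMED_READ_PAIRS / n_pairs
    print(f"{bin_size:>9} bp bins: {n_pairs:.3e} bin pairs, "
          f"{mean_per_pair:.3g} read pairs per bin pair on average")
```

Each tenfold reduction in bin size multiplies the number of bin pairs by about one hundred, so the same number of reads is spread far more thinly across the map.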
Applications of Hi-C Sequencing
- Hi-C sequencing is useful in understanding the 3D structure of genomes in different organisms. This helps to understand how DNA is organized in the nucleus.
- Hi-C can be used to compare the 3D genome structures of different samples. This comparison can show how genome structure changes in response to different biological states and can offer evolutionary insights.
- It is used to identify and study A/B genomic compartments. Compartment A is usually active and associated with gene expression, while compartment B is often inactive and associated with silencing.
- It can be used to study how genes interact with repetitive sequences in the genome. This is useful in understanding how repetitive DNA elements affect gene expression.
- Hi-C sequencing can be used to study chromatin changes associated with diseases. This helps in studying disease mechanisms and identifying potential therapeutic targets.
- It is used to identify the hierarchical arrangement of chromatin, from large compartments (A/B compartments) to smaller units like topologically associating domains (TADs) and loops. This is important for studying how different parts of the genome interact with each other, which plays a role in gene regulation and expression.
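One way to see how domain-like units such as TADs can be located in a contact map is a simple insulation-style score: for each position, average the contacts that cross it within a small window, and look for positions where this average dips. The sketch below is a minimal illustration on a made-up matrix. The function names and the window size are assumptions, and real TAD callers differ in windowing, normalization, and thresholds.

```python
def insulation_scores(matrix, window):
    """Mean contact frequency across each candidate boundary.

    The boundary at index i lies between bin i-1 and bin i. The score
    averages contacts between the `window` bins upstream and the
    `window` bins downstream of that boundary.
    """
    n = len(matrix)
    scores = {}
    for i in range(window, n - window + 1):
        total = 0.0
        for a in range(i - window, i):
            for b in range(i, i + window):
                total += matrix[a][b]
        scores[i] = total / (window * window)
    return scores


def toy_matrix(domain_sizes, inside=10.0, outside=1.0):
    """Made-up contact map with strong contacts inside each domain."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    return [[inside if x == y else outside for y in labels] for x in labels]


# Two toy domains of four bins each, so the boundary sits at index 4.
scores = insulation_scores(toy_matrix([4, 4]), window=2)
boundary = min(scores, key=scores.get)
print(scores)
print("lowest score at boundary index", boundary)
```

Positions inside a domain score high because bins on both sides contact each other often, while the position between two domains scores low.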
References
- 3C sequencing (Hi-C sequencing) – CD Genomics. (n.d.). Retrieved from https://rna.cd-genomics.com/chromosome-conformation-capture-sequencing-3c-seq.html
- Barutcu, A. R., Fritz, A. J., Zaidi, S. K., van Wijnen, A. J., Lian, J. B., Stein, J. L., Nickerson, J. A., Imbalzano, A. N., & Stein, G. S. (2016). C-ing the Genome: A Compendium of Chromosome Conformation Capture Methods to Study Higher-Order Chromatin Organization. Journal of Cellular Physiology, 231(1), 31–35. https://doi.org/10.1002/jcp.25062
- Belton, J., McCord, R. P., Gibcus, J. H., Naumova, N., Zhan, Y., & Dekker, J. (2012). Hi–C: A comprehensive technique to capture the conformation of genomes. Methods, 58(3), 268–276. https://doi.org/10.1016/j.ymeth.2012.05.001
- Hi-C Sequencing Data Analysis: Introduction, methods, and Protocol – CD Genomics. (n.d.). Retrieved from https://bioinfo.cd-genomics.com/hi-c-sequencing-data-analysis-introduction-methods-and-protocol.html
- Hi-C/3C-Seq/Capture-C. (n.d.). Retrieved from https://www.illumina.com/science/sequencing-method-explorer/kits-and-arrays/hi-c-3c-seq-capture-c.html
- Lieberman-Aiden, E., Van Berkum, N. L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369
- Overview of Hi-C sequencing – CD genomics. (n.d.). Retrieved from https://www.cd-genomics.com/resource-hic-sequencing.html
- Pal, K., Forcato, M., & Ferrari, F. (2018). Hi-C analysis: from data generation to integration. Biophysical Reviews, 11(1), 67–78. https://doi.org/10.1007/s12551-018-0489-1
- Smith, C. (September 8, 2022). Chromatin analysis methods. Retrieved from https://www.biocompare.com/Editorial-Articles/589725-Chromatin-Analysis-Methods/
- Van Berkum, N. L., Lieberman-Aiden, E., Williams, L., Imakaev, M., Gnirke, A., Mirny, L. A., Dekker, J., & Lander, E. S. (2010). Hi-C: a method to study the three-dimensional architecture of genomes. Journal of Visualized Experiments, (39), 1869. https://doi.org/10.3791/1869