1 Basics

1.1 Required knowledge

HiCParser builds on other packages, in particular those that implement the infrastructure needed to handle Hi-C data with several replicates and conditions. It provides parsers for several standard Hi-C data formats and imports the data into R as an InteractionSet object.

1.2 Citing HiCParser

We hope that HiCParser will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!

## Citation info
citation("HiCParser")
#> To cite package 'HiCParser' in publications use:
#> 
#>   Maigne E, Zytnicki M (2025). _HiCParser package to parse HiC data and
#>   import them in R_. doi:10.18129/B9.bioc.HiCParser
#>   <https://doi.org/10.18129/B9.bioc.HiCParser>,
#>   https://github.com/emaigne/HiCParser/HiCParser - R package version
#>   0.99.0, <http://www.bioconductor.org/packages/HiCParser>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {{HiCParser} package to parse HiC data and import them in R},
#>     author = {Elise Maigne and Matthias Zytnicki},
#>     year = {2025},
#>     url = {http://www.bioconductor.org/packages/HiCParser},
#>     note = {https://github.com/emaigne/HiCParser/HiCParser - R package version 0.99.0},
#>     doi = {10.18129/B9.bioc.HiCParser},
#>   }

2 Start using HiCParser

library("HiCParser")

HiCParser can import Hi-C data sets in several formats:

- Cooler .cool or .mcool files.
- Juicer .hic files.
- HiC-Pro .matrix and .bed files.
- Tabular (.tsv, .csv, …) files.
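When a project mixes several of these formats, it can help to choose the parser from the file extension. The sketch below uses base R only; `guessParser` is a hypothetical helper written for this illustration and is not part of HiCParser. It simply returns the name of the parsing function that this vignette documents for each extension.

```r
# Hypothetical helper (not part of HiCParser): map a file extension to the
# name of the parsing function described in this vignette.
guessParser <- function(path) {
    ext <- tolower(tools::file_ext(path))
    switch(ext,
        cool = ,
        mcool = "parseCool",
        hic = "parseHiC",
        matrix = ,
        bed = "parseHiCPro",
        tsv = ,
        csv = "parseTabular",
        stop("Unrecognised extension: '", ext, "'")
    )
}

# Placeholder file names, as used elsewhere in this vignette
parserForMcool <- guessParser("path/to/condition-1.replicate-1.mcool")
parserForHic <- guessParser("path/to/condition-1.replicate-1.hic")
print(c(parserForMcool, parserForHic))
```

The helper only inspects the file name, so it cannot detect a file whose content does not match its extension.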

2.1 Cooler files

2.1.1 .cool files

To load .cool files generated by Cooler:

# Path to each file
paths <- c(
    "path/to/condition-1.replicate-1.cool",
    "path/to/condition-1.replicate-2.cool",
    "path/to/condition-1.replicate-3.cool",
    "path/to/condition-2.replicate-1.cool",
    "path/to/condition-2.replicate-2.cool",
    "path/to/condition-2.replicate-3.cool"
)

# For the sake of the example, we will use the same file, several times
paths <- rep(
    system.file("extdata",
        "hicsample_21.cool",
        package = "HiCParser"
    ),
    6
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# Instantiation of data set
hic.experiment <- parseCool(
    paths,
    conditions = conditions,
    replicates = replicates
)
#> Loading required namespace: rhdf5
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
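If the file names encode the condition and the replicate, as the placeholder paths above do, the two vectors can be derived from the names instead of being typed by hand. This is a minimal base R sketch; it assumes the `condition-<n>.replicate-<m>.<extension>` naming pattern of the placeholder paths, which is only a convention of this vignette.

```r
# Placeholder paths following the naming pattern used in this vignette
paths <- c(
    "path/to/condition-1.replicate-1.cool",
    "path/to/condition-1.replicate-2.cool",
    "path/to/condition-1.replicate-3.cool",
    "path/to/condition-2.replicate-1.cool",
    "path/to/condition-2.replicate-2.cool",
    "path/to/condition-2.replicate-3.cool"
)

# Keep only the file name, then capture the number after each keyword
fileNames <- basename(paths)
conditions <- as.integer(
    sub("^condition-([0-9]+)\\..*$", "\\1", fileNames)
)
replicates <- as.integer(
    sub("^.*\\.replicate-([0-9]+)\\..*$", "\\1", fileNames)
)

print(data.frame(file = fileNames, conditions, replicates))
```

The resulting `conditions` and `replicates` vectors have one entry per path, in the same order as `paths`, which is the layout the parsing functions in this vignette are given.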

2.1.2 .mcool files

To load .mcool files generated by Cooler:

# Path to each file
paths <- c(
    "path/to/condition-1.replicate-1.mcool",
    "path/to/condition-1.replicate-2.mcool",
    "path/to/condition-1.replicate-3.mcool",
    "path/to/condition-2.replicate-1.mcool",
    "path/to/condition-2.replicate-2.mcool",
    "path/to/condition-2.replicate-3.mcool"
)

# For the sake of the example, we will use the same file, several times
paths <- rep(
    system.file("extdata",
        "hicsample_21.mcool",
        package = "HiCParser"
    ),
    6
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# mcool files can store several resolutions.
# We specify the one we need.
binSize <- 5000000

# Instantiation of data set
# The same function "parseCool" is used for cool and mcool files
hic.experiment <- parseCool(
    paths,
    conditions = conditions,
    replicates = replicates,
    binSize = binSize # Specified for .mcool files.
)
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.

2.2 .hic files

To load .hic files generated by Juicer:

# Path to each file
paths <- c(
    "path/to/condition-1.replicate-1.hic",
    "path/to/condition-1.replicate-2.hic",
    "path/to/condition-1.replicate-3.hic",
    "path/to/condition-2.replicate-1.hic",
    "path/to/condition-2.replicate-2.hic",
    "path/to/condition-2.replicate-3.hic"
)

# For the sake of the example, we will use the same file, several times
paths <- rep(
    system.file("extdata",
        "hicsample_21.hic",
        package = "HiCParser"
    ),
    6
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# hic files can store several resolutions.
# We specify the one we need.
binSize <- 5000000

# Instantiation of data set
hic.experiment <- parseHiC(
    paths,
    conditions = conditions,
    replicates = replicates,
    binSize = binSize
)
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.

Currently, HiCParser supports the .hic format up to version 9.
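To check the format version of a file before parsing it, the header can be inspected directly. The sketch below is base R only and rests on an assumption about the header layout: the file starts with the bytes `HIC` and a null byte, followed by the version as a 32-bit little-endian integer. `readHicVersion` is a hypothetical helper, not a HiCParser function, and it is demonstrated on a synthetic header rather than on a real file.

```r
# Hypothetical helper (not part of HiCParser).
# Assumed header layout: "HIC" + null byte, then a 32-bit little-endian
# integer holding the format version.
readHicVersion <- function(path) {
    con <- file(path, "rb")
    on.exit(close(con))
    magic <- readBin(con, what = "raw", n = 4L)
    expected <- c(charToRaw("HIC"), as.raw(0))
    if (!identical(magic, expected)) {
        stop("Unexpected magic string: this does not look like a .hic file")
    }
    readBin(con, what = "integer", n = 1L, size = 4L, endian = "little")
}

# Demonstration on a synthetic header written to a temporary file
fakeHic <- tempfile(fileext = ".hic")
con <- file(fakeHic, "wb")
writeBin(c(charToRaw("HIC"), as.raw(0)), con)
writeBin(9L, con, size = 4L, endian = "little")
close(con)

hicVersion <- readHicVersion(fakeHic)
if (hicVersion > 9L) {
    warning("Version ", hicVersion, " is newer than the supported version 9")
}
print(hicVersion)
```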

2.3 HiC-Pro files

To load .matrix and .bed files generated by HiC-Pro:

# Path to each matrix file
matrixPaths <- c(
    "path/to/condition-1.replicate-1.matrix",
    "path/to/condition-1.replicate-2.matrix",
    "path/to/condition-1.replicate-3.matrix",
    "path/to/condition-2.replicate-1.matrix",
    "path/to/condition-2.replicate-2.matrix",
    "path/to/condition-2.replicate-3.matrix"
)

# For the sake of the example, we will use the same file, several times
matrixPaths <- rep(
    system.file("extdata",
        "hicsample_21.matrix",
        package = "HiCParser"
    ),
    6
)

# Path to each bed file
bedPaths <- c(
    "path/to/condition-1.replicate-1.bed",
    "path/to/condition-1.replicate-2.bed",
    "path/to/condition-1.replicate-3.bed",
    "path/to/condition-2.replicate-1.bed",
    "path/to/condition-2.replicate-2.bed",
    "path/to/condition-2.replicate-3.bed"
)

# Alternatively, if the same bed file is used, we can provide it only once
bedPaths <- system.file("extdata",
    "hicsample_21.bed",
    package = "HiCParser"
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# Instantiation of data set
hic.experiment <- parseHiCPro(
    matrixPaths = matrixPaths,
    bedPaths = bedPaths,
    conditions = conditions,
    replicates = replicates
)
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.

2.4 Tabular files

A tabular file is a tab-separated multi-replicate sparse matrix with a header:

chromosome    position 1    position 2    C1.R1    C1.R2    C1.R3    ...
Y             1500000       7500000       145      184      72       ...

The number of interactions between position 1 and position 2 of the chromosome is reported in each condition.replicate column. There is no limit to the number of conditions and replicates.
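A file with this layout can be written from a data frame with base R. The sketch below reuses the header and the first row shown above; the second row contains made-up values added only for illustration. Integer columns are used so that positions are never written in scientific notation.

```r
# Toy table in the layout described above.
# Row 1 is the example row from this vignette; row 2 is made up.
toy <- data.frame(
    chromosome = c("Y", "Y"),
    `position 1` = c(1500000L, 1500000L),
    `position 2` = c(7500000L, 8500000L),
    C1.R1 = c(145L, 12L),
    C1.R2 = c(184L, 9L),
    C1.R3 = c(72L, 15L),
    check.names = FALSE
)

# Write a tab-separated file with a header and no row names
tsvPath <- tempfile(fileext = ".tsv")
write.table(toy, tsvPath, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to check the layout (check.names = FALSE keeps the spaces
# in "position 1" and "position 2")
readBack <- read.delim(tsvPath, check.names = FALSE)
print(readBack)
```

Whether the header must use exactly these column names is not stated here, so the names are copied from the example header as-is.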

To load Hi-C data in this format:

hic.experiment <- parseTabular(
    system.file("extdata",
        "hicsample_21.tsv",
        package = "HiCParser"
    ),
    sep = "\t"
)
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.tsv'.

3 InteractionSet format

4 Output: InteractionSet format

The output is an InteractionSet. This object can store one or several samples.

Please read the documentation of the InteractionSet package to learn more about this format.

library("HiCParser")
hicFilePath <- system.file("extdata", "hicsample_21.hic", package = "HiCParser")
hic.experiment <- parseHiC(
    paths = rep(hicFilePath, 6),
    binSize = 5000000,
    conditions = rep(seq(2), each = 3),
    replicates = rep(seq(3), 2)
)
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
hic.experiment
#> class: InteractionSet 
#> dim: 44 6 
#> metadata(0):
#> assays(1): ''
#> rownames: NULL
#> rowData names(1): chromosome
#> colnames: NULL
#> colData names(2): condition replicate
#> type: StrictGInteractions
#> regions: 9

The conditions and replicates are reported in the colData slot:

SummarizedExperiment::colData(hic.experiment)
#> DataFrame with 6 rows and 2 columns
#>   condition replicate
#>   <integer> <integer>
#> 1         1         1
#> 2         1         2
#> 3         1         3
#> 4         2         1
#> 5         2         2
#> 6         2         3

They correspond to the columns of the assay matrix, which contains the interaction values:

head(SummarizedExperiment::assay(hic.experiment))
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]   79   79   79   79   79   79
#> [2,]   22   22   22   22   22   22
#> [3,]    3    3    3    3    3    3
#> [4,]    1    1    1    1    1    1
#> [5,]    1    1    1    1    1    1
#> [6,]    2    2    2    2    2    2
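Because row i of the colData table describes column i of the assay matrix, the sample annotation can be used to select columns. The following base R sketch mimics the two outputs above with a plain data frame and a plain matrix (the values are copied from the printed output); with the real object, the same indexing would be applied to the results of `SummarizedExperiment::colData()` and `SummarizedExperiment::assay()`.

```r
# Stand-ins for the colData table and the first rows of the assay matrix
sampleInfo <- data.frame(
    condition = rep(1:2, each = 3),
    replicate = rep(1:3, times = 2)
)
counts <- matrix(rep(c(79, 22, 3, 1, 1, 2), times = 6), ncol = 6)

# Columns that belong to condition 1
isCondition1 <- sampleInfo$condition == 1
countsCondition1 <- counts[, isCondition1, drop = FALSE]

# Mean interaction value per bin pair within each condition
meanPerCondition <- sapply(
    split(seq_len(ncol(counts)), sampleInfo$condition),
    function(columns) rowMeans(counts[, columns, drop = FALSE])
)
print(meanPerCondition)
```

All six columns are identical here because the example loads the same file six times, so the two per-condition means are equal.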

The positions of interactions are in the interactions slot of the object:

InteractionSet::interactions(hic.experiment)
#> StrictGInteractions object with 44 interactions and 1 metadata column:
#>        seqnames1           ranges1     seqnames2           ranges2 | chromosome
#>            <Rle>         <IRanges>         <Rle>         <IRanges> |      <Rle>
#>    [1]        21  5000001-10000000 ---        21  5000001-10000000 |         21
#>    [2]        21  5000001-10000000 ---        21 10000001-15000000 |         21
#>    [3]        21  5000001-10000000 ---        21 15000001-20000000 |         21
#>    [4]        21  5000001-10000000 ---        21 20000001-25000000 |         21
#>    [5]        21  5000001-10000000 ---        21 25000001-30000000 |         21
#>    ...       ...               ... ...       ...               ... .        ...
#>   [40]        21 35000001-40000000 ---        21 40000001-45000000 |         21
#>   [41]        21 35000001-40000000 ---        21 45000001-50000000 |         21
#>   [42]        21 40000001-45000000 ---        21 40000001-45000000 |         21
#>   [43]        21 40000001-45000000 ---        21 45000001-50000000 |         21
#>   [44]        21 45000001-50000000 ---        21 45000001-50000000 |         21
#>   -------
#>   regions: 9 ranges and 1 metadata column
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
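The ranges printed above are consecutive bins of width binSize (5000000 here), with 1-based, closed coordinates. As an illustration of that convention, the base R sketch below computes the bin that contains a given position. `binRange` is a hypothetical helper written for this example and is not claimed to be how HiCParser computes its bins.

```r
# Hypothetical helper: 1-based, closed bin [start, end] containing a position
binRange <- function(position, binSize) {
    start <- ((position - 1) %/% binSize) * binSize + 1
    c(start = start, end = start + binSize - 1)
}

binSize <- 5000000

# Position 7500000 falls in the first bin listed above
firstBin <- binRange(7500000, binSize)

# The last base of a bin and the first base of the next one
edgeLow <- binRange(10000000, binSize)
edgeHigh <- binRange(10000001, binSize)

print(rbind(firstBin, edgeLow, edgeHigh))
```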

4.1 Additional utility functions

The function mergeInteractionSet merges InteractionSet objects from the same experiment (for different replicates or conditions).

It merges the data containing the bins of interactions and fills the assay matrix accordingly, returning an assay matrix with several columns.

The object returned by the function is an InteractionSet.

Here is a fictitious example:

path <- system.file("extdata", "hicsample_21.cool", package = "HiCParser")
object1 <- parseCool(path, conditions = 1, replicates = 1)
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
# Creating an object with a different condition
object2 <- parseCool(path, conditions = 2, replicates = 1)
#> 
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.

The merged object:

objectMerged <- mergeInteractionSet(object1, object2)
SummarizedExperiment::colData(objectMerged)
#> DataFrame with 2 rows and 2 columns
#>   condition replicate
#>   <numeric> <numeric>
#> 1         1         1
#> 2         2         1
head(SummarizedExperiment::assay(objectMerged))
#>      [,1] [,2]
#> [1,]   79   79
#> [2,]   22   22
#> [3,]    3    3
#> [4,]    1    1
#> [5,]    1    1
#> [6,]    2    2
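The idea of merging on bin pairs can be illustrated with plain data frames. The sketch below is a base R analogy, not the implementation of mergeInteractionSet: the two toy tables and their values are made up, and the treatment of bin pairs present in only one sample (left as NA here) is an assumption, since this vignette does not state how the package fills them.

```r
# Two made-up sparse samples: one row per pair of bins with a count
sample1 <- data.frame(
    bin1 = c(1, 1, 2),
    bin2 = c(1, 2, 2),
    count = c(79, 22, 3)
)
sample2 <- data.frame(
    bin1 = c(1, 2, 2),
    bin2 = c(1, 2, 3),
    count = c(80, 5, 1)
)

# Full outer join on the bin pair: one row per pair seen in either sample
merged <- merge(
    sample1, sample2,
    by = c("bin1", "bin2"),
    all = TRUE,
    suffixes = c(".s1", ".s2")
)
merged <- merged[order(merged$bin1, merged$bin2), ]

# One column per sample, as in the merged assay matrix
mergedCounts <- as.matrix(merged[, c("count.s1", "count.s2")])
print(merged)
```

Each input contributes one column, and the number of rows is the number of distinct bin pairs across both inputs.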

5 Reproducibility

This package was developed using biocthis.

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2025-01-20 r87609)
#>  os       Ubuntu 24.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2025-02-19
#>  pandoc   2.7.3 @ /usr/bin/ (via rmarkdown)
#>  quarto   1.5.57 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package              * version date (UTC) lib source
#>  abind                  1.4-8   2024-09-12 [2] CRAN (R 4.5.0)
#>  Biobase                2.67.0  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  BiocGenerics           0.53.6  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  BiocManager            1.30.25 2024-08-28 [2] CRAN (R 4.5.0)
#>  BiocStyle            * 2.35.0  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  bookdown               0.42    2025-01-07 [2] CRAN (R 4.5.0)
#>  bslib                  0.9.0   2025-01-30 [2] CRAN (R 4.5.0)
#>  cachem                 1.1.0   2024-05-16 [2] CRAN (R 4.5.0)
#>  cli                    3.6.4   2025-02-13 [2] CRAN (R 4.5.0)
#>  crayon                 1.5.3   2024-06-20 [2] CRAN (R 4.5.0)
#>  data.table             1.16.4  2024-12-06 [2] CRAN (R 4.5.0)
#>  DelayedArray           0.33.6  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  digest                 0.6.37  2024-08-19 [2] CRAN (R 4.5.0)
#>  evaluate               1.0.3   2025-01-10 [2] CRAN (R 4.5.0)
#>  fastmap                1.2.0   2024-05-15 [2] CRAN (R 4.5.0)
#>  generics               0.1.3   2022-07-05 [2] CRAN (R 4.5.0)
#>  GenomeInfoDb           1.43.4  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  GenomeInfoDbData       1.2.13  2025-01-22 [2] Bioconductor
#>  GenomicRanges          1.59.1  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  gtools                 3.9.5   2023-11-20 [2] CRAN (R 4.5.0)
#>  HiCParser            * 0.99.7  2025-02-19 [1] Bioconductor 3.21 (R 4.5.0)
#>  htmltools              0.5.8.1 2024-04-04 [2] CRAN (R 4.5.0)
#>  httr                   1.4.7   2023-08-15 [2] CRAN (R 4.5.0)
#>  InteractionSet         1.35.0  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  IRanges                2.41.3  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  jquerylib              0.1.4   2021-04-26 [2] CRAN (R 4.5.0)
#>  jsonlite               1.9.0   2025-02-19 [2] CRAN (R 4.5.0)
#>  knitr                  1.49    2024-11-08 [2] CRAN (R 4.5.0)
#>  lattice                0.22-6  2024-03-20 [3] CRAN (R 4.5.0)
#>  lifecycle              1.0.4   2023-11-07 [2] CRAN (R 4.5.0)
#>  Matrix                 1.7-2   2025-01-23 [3] CRAN (R 4.5.0)
#>  MatrixGenerics         1.19.1  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  matrixStats            1.5.0   2025-01-07 [2] CRAN (R 4.5.0)
#>  pbapply                1.7-2   2023-06-27 [2] CRAN (R 4.5.0)
#>  R6                     2.6.1   2025-02-15 [2] CRAN (R 4.5.0)
#>  Rcpp                   1.0.14  2025-01-12 [2] CRAN (R 4.5.0)
#>  rhdf5                  2.51.2  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  rhdf5filters           1.19.1  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  Rhdf5lib               1.29.0  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  rlang                  1.1.5   2025-01-17 [2] CRAN (R 4.5.0)
#>  rmarkdown              2.29    2024-11-04 [2] CRAN (R 4.5.0)
#>  S4Arrays               1.7.3   2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  S4Vectors              0.45.4  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  sass                   0.4.9   2024-03-15 [2] CRAN (R 4.5.0)
#>  sessioninfo          * 1.2.3   2025-02-05 [2] CRAN (R 4.5.0)
#>  SparseArray            1.7.6   2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  SummarizedExperiment   1.37.0  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  UCSC.utils             1.3.1   2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  xfun                   0.51    2025-02-19 [2] CRAN (R 4.5.0)
#>  XVector                0.47.2  2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#>  yaml                   2.3.10  2024-07-26 [2] CRAN (R 4.5.0)
#> 
#>  [1] /tmp/RtmpmjToPA/Rinst1078a01b62b45c
#>  [2] /home/biocbuild/bbs-3.21-bioc/R/site-library
#>  [3] /home/biocbuild/bbs-3.21-bioc/R/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

6 Bibliography

Lun ATL, Perry M and Ing-Simmons E (2016). Infrastructure for genomic interactions: Bioconductor classes for Hi-C, ChIA-PET and related experiments. F1000Res. 5, 950