HiCParser 0.99.7
HiCParser builds on other packages, in particular those that implement the infrastructure needed to handle Hi-C data with several replicates and conditions. It provides parsers for several standard Hi-C data formats and imports them into R as an InteractionSet object.
We hope that HiCParser will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
## Citation info
citation("HiCParser")
#> To cite package 'HiCParser' in publications use:
#>
#> Maigne E, Zytnicki M (2025). _HiCParser package to parse HiC data and
#> import them in R_. doi:10.18129/B9.bioc.HiCParser
#> <https://doi.org/10.18129/B9.bioc.HiCParser>,
#> https://github.com/emaigne/HiCParser/HiCParser - R package version
#> 0.99.0, <http://www.bioconductor.org/packages/HiCParser>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {{HiCParser} package to parse HiC data and import them in R},
#> author = {Elise Maigne and Matthias Zytnicki},
#> year = {2025},
#> url = {http://www.bioconductor.org/packages/HiCParser},
#> note = {https://github.com/emaigne/HiCParser/HiCParser - R package version 0.99.0},
#> doi = {10.18129/B9.bioc.HiCParser},
#> }
library("HiCParser")
HiCParser can import Hi-C data sets in several formats:
- Cooler .cool or .mcool files.
- Juicer .hic files.
- HiC-Pro .matrix and .bed files.
- Tabular (.tsv, .csv, …) files.
.cool files
To load .cool files generated by [Cooler][cooler-documentation] [@cooler]:
# Path to each file
paths <- c(
"path/to/condition-1.replicate-1.cool",
"path/to/condition-1.replicate-2.cool",
"path/to/condition-1.replicate-3.cool",
"path/to/condition-2.replicate-1.cool",
"path/to/condition-2.replicate-2.cool",
"path/to/condition-2.replicate-3.cool"
)
# For the sake of the example, we will use the same file, several times
paths <- rep(
system.file("extdata",
"hicsample_21.cool",
package = "HiCParser"
),
6
)
# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)
# Instantiation of data set
hic.experiment <- parseCool(
paths,
conditions = conditions,
replicates = replicates
)
#> Loading required namespace: rhdf5
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
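When file names encode the experimental design, the conditions and replicates vectors do not have to be typed by hand. The following base R sketch assumes a hypothetical naming scheme, "condition-<i>.replicate-<j>.<extension>", copied from the placeholder paths above. Real file names will probably differ, so the pattern would need to be adapted.

```r
# Placeholder paths following the assumed "condition-<i>.replicate-<j>" scheme
paths <- c(
    "path/to/condition-1.replicate-1.cool",
    "path/to/condition-1.replicate-2.cool",
    "path/to/condition-1.replicate-3.cool",
    "path/to/condition-2.replicate-1.cool",
    "path/to/condition-2.replicate-2.cool",
    "path/to/condition-2.replicate-3.cool"
)
fileNames <- basename(paths)

# Assumed pattern: two numeric fields, then a lower-case file extension
pattern <- "^condition-([0-9]+)\\.replicate-([0-9]+)\\.[a-z]+$"

# Fail early if a file name does not follow the assumed pattern
stopifnot(all(grepl(pattern, fileNames)))

# Extract the first and second captured numbers
conditions <- as.integer(sub(pattern, "\\1", fileNames))
replicates <- as.integer(sub(pattern, "\\2", fileNames))

conditions
replicates
```

The two resulting vectors have one entry per file, in the same order as paths, which is the layout used in the parseCool call above.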
.mcool files
To load .mcool files generated by [Cooler][cooler-documentation] [@cooler]:
# Path to each file
paths <- c(
"path/to/condition-1.replicate-1.mcool",
"path/to/condition-1.replicate-2.mcool",
"path/to/condition-1.replicate-3.mcool",
"path/to/condition-2.replicate-1.mcool",
"path/to/condition-2.replicate-2.mcool",
"path/to/condition-2.replicate-3.mcool"
)
# For the sake of the example, we will use the same file, several times
paths <- rep(
system.file("extdata",
"hicsample_21.mcool",
package = "HiCParser"
),
6
)
# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)
# mcool files can store several resolutions.
# We specify the one we need (bin size in base pairs).
binSize <- 5000000
# Instantiation of data set
# The same function "parseCool" is used for cool and mcool files
hic.experiment <- parseCool(
paths,
conditions = conditions,
replicates = replicates,
binSize = binSize # Specified for .mcool files.
)
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.mcool'.
To load .hic files generated by [Juicer][juicer-documentation] [@juicer]:
# Path to each file
paths <- c(
"path/to/condition-1.replicate-1.hic",
"path/to/condition-1.replicate-2.hic",
"path/to/condition-1.replicate-3.hic",
"path/to/condition-2.replicate-1.hic",
"path/to/condition-2.replicate-2.hic",
"path/to/condition-2.replicate-3.hic"
)
# For the sake of the example, we will use the same file, several times
paths <- rep(
system.file("extdata",
"hicsample_21.hic",
package = "HiCParser"
),
6
)
# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)
# hic files can store several resolutions.
# We specify the one we need (bin size in base pairs).
binSize <- 5000000
# Instantiation of data set
hic.experiment <- parseHiC(
paths,
conditions = conditions,
replicates = replicates,
binSize = binSize
)
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
Currently, HiCParser supports the .hic format up to version 9.
To load .matrix and .bed files generated by [HiC-Pro][hicpro-documentation] [@hicpro]:
# Path to each matrix file
matrixPaths <- c(
"path/to/condition-1.replicate-1.matrix",
"path/to/condition-1.replicate-2.matrix",
"path/to/condition-1.replicate-3.matrix",
"path/to/condition-2.replicate-1.matrix",
"path/to/condition-2.replicate-2.matrix",
"path/to/condition-2.replicate-3.matrix"
)
# For the sake of the example, we will use the same file, several times
matrixPaths <- rep(
system.file("extdata",
"hicsample_21.matrix",
package = "HiCParser"
),
6
)
# Path to each bed file
bedPaths <- c(
"path/to/condition-1.replicate-1.bed",
"path/to/condition-1.replicate-2.bed",
"path/to/condition-1.replicate-3.bed",
"path/to/condition-2.replicate-1.bed",
"path/to/condition-2.replicate-2.bed",
"path/to/condition-2.replicate-3.bed"
)
# Alternatively, if the same bed file is used, we can provide it only once
bedPaths <- system.file("extdata",
"hicsample_21.bed",
package = "HiCParser"
)
# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)
# Instantiation of data set
hic.experiment <- parseHiCPro(
matrixPaths = matrixPaths,
bedPaths = bedPaths,
conditions = conditions,
replicates = replicates
)
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.matrix' and '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.bed'.
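Since every parser call above takes parallel vectors (paths, conditions, replicates), a quick consistency check before parsing can catch mismatched inputs. The sketch below uses only base R. The helper checkDesign and the path "path/to/bins.bed" are hypothetical and not part of HiCParser, and the rule that each condition/replicate pair describes a single file is our assumption, not a documented requirement.

```r
# Hypothetical helper for illustration; it is not part of HiCParser
checkDesign <- function(paths, conditions, replicates) {
    # One condition label and one replicate label per file
    stopifnot(
        length(paths) == length(conditions),
        length(paths) == length(replicates)
    )
    pairs <- paste(conditions, replicates, sep = ".")
    # Assumption: each condition/replicate pair should describe a single file
    duplicatedPairs <- unique(pairs[duplicated(pairs)])
    if (length(duplicatedPairs) > 0) {
        stop(
            "Duplicated condition.replicate pair(s): ",
            paste(duplicatedPairs, collapse = ", ")
        )
    }
    pairs
}

# Placeholder matrix paths: 2 conditions x 3 replicates
matrixPaths <- sprintf(
    "path/to/condition-%d.replicate-%d.matrix",
    rep(1:2, each = 3),
    rep(1:3, times = 2)
)

# A single bed file, repeated explicitly so that each matrix has a partner
bedPaths <- rep_len("path/to/bins.bed", length(matrixPaths))
stopifnot(length(bedPaths) == length(matrixPaths))

pairs <- checkDesign(
    matrixPaths,
    conditions = rep(1:2, each = 3),
    replicates = rep(1:3, times = 2)
)
pairs
```

The same check applies unchanged to the paths used with the other parsers.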
A tabular file is a tab-separated, multi-replicate sparse matrix with a header:
chromosome    position 1    position 2    C1.R1    C1.R2    C1.R3    ...
Y             1500000       7500000       145      184      72       ...
The number of interactions between position 1 and position 2 of chromosome is reported in each condition.replicate column. There is no limit to the number of conditions and replicates.
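To make the layout concrete, the following base R sketch writes a small file in this shape and reads it back. All counts are invented for illustration, and the header names are copied from the description above. Whether parseTabular requires exactly these names is not verified here.

```r
# Toy data: three bin pairs on chromosome Y, invented counts
toy <- data.frame(
    chromosome = c("Y", "Y", "Y"),
    pos1 = c(1500000, 1500000, 7500000),
    pos2 = c(1500000, 7500000, 7500000),
    C1.R1 = c(210, 145, 98),
    C1.R2 = c(230, 184, 101),
    C2.R1 = c(190, 72, 120)
)

# Write positions as plain digits to avoid scientific notation in the file
toy$pos1 <- format(toy$pos1, scientific = FALSE, trim = TRUE)
toy$pos2 <- format(toy$pos2, scientific = FALSE, trim = TRUE)
names(toy)[2:3] <- c("position 1", "position 2")

tsvPath <- tempfile(fileext = ".tsv")
write.table(toy, tsvPath, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back, keeping the header names unchanged
back <- read.delim(tsvPath, check.names = FALSE)

# Split each "condition.replicate" label into its two parts
labels <- names(back)[-(1:3)]
parts <- strsplit(labels, ".", fixed = TRUE)
design <- data.frame(
    condition = vapply(parts, `[`, character(1), 1),
    replicate = vapply(parts, `[`, character(1), 2)
)
design
```

Each row of design describes one count column of the file, which is how the condition.replicate labels encode the experimental design.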
To load Hi-C data in this format:
hic.experiment <- parseTabular(
system.file("extdata",
"hicsample_21.tsv",
package = "HiCParser"
),
sep = "\t"
)
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.tsv'.
The output is an InteractionSet object, which can store one or several samples.
Please read the documentation of the InteractionSet package to learn more about this format.
library("HiCParser")
hicFilePath <- system.file("extdata", "hicsample_21.hic", package = "HiCParser")
hic.experiment <- parseHiC(
paths = rep(hicFilePath, 6),
binSize = 5000000,
conditions = rep(seq(2), each = 3),
replicates = rep(seq(3), 2)
)
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.hic'.
hic.experiment
#> class: InteractionSet
#> dim: 44 6
#> metadata(0):
#> assays(1): ''
#> rownames: NULL
#> rowData names(1): chromosome
#> colnames: NULL
#> colData names(2): condition replicate
#> type: StrictGInteractions
#> regions: 9
The conditions and replicates are reported in the colData slot:
SummarizedExperiment::colData(hic.experiment)
#> DataFrame with 6 rows and 2 columns
#> condition replicate
#> <integer> <integer>
#> 1 1 1
#> 2 1 2
#> 3 1 3
#> 4 2 1
#> 5 2 2
#> 6 2 3
They correspond to the columns of the assay matrix (containing interaction values):
head(SummarizedExperiment::assay(hic.experiment))
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 79 79 79 79 79 79
#> [2,] 22 22 22 22 22 22
#> [3,] 3 3 3 3 3 3
#> [4,] 1 1 1 1 1 1
#> [5,] 1 1 1 1 1 1
#> [6,] 2 2 2 2 2 2
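The assay matrix above has no column names, so the link between a column and its sample relies on the row order of colData. The sketch below shows one way to make that link explicit. It uses a plain data.frame and matrix as stand-ins for the two slots so that it runs without the package. The "C<condition>.R<replicate>" label is our choice, borrowed from the tabular format.

```r
# Stand-in for colData(hic.experiment): one row per sample
design <- data.frame(
    condition = rep(1:2, each = 3),
    replicate = rep(1:3, times = 2)
)

# Stand-in for assay(hic.experiment): one column per sample
counts <- matrix(rep(c(79, 22, 3), times = 6), nrow = 3)

# Row i of the design describes column i of the count matrix
stopifnot(nrow(design) == ncol(counts))
colnames(counts) <- paste0("C", design$condition, ".R", design$replicate)

# Select the columns that belong to condition 2
condition2 <- counts[, design$condition == 2, drop = FALSE]
condition2
```

With a real object, the same indexing logic would apply to the values returned by SummarizedExperiment::colData and SummarizedExperiment::assay.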
The positions of interactions are in the interactions slot of the object:
InteractionSet::interactions(hic.experiment)
#> StrictGInteractions object with 44 interactions and 1 metadata column:
#> seqnames1 ranges1 seqnames2 ranges2 | chromosome
#> <Rle> <IRanges> <Rle> <IRanges> | <Rle>
#> [1] 21 5000001-10000000 --- 21 5000001-10000000 | 21
#> [2] 21 5000001-10000000 --- 21 10000001-15000000 | 21
#> [3] 21 5000001-10000000 --- 21 15000001-20000000 | 21
#> [4] 21 5000001-10000000 --- 21 20000001-25000000 | 21
#> [5] 21 5000001-10000000 --- 21 25000001-30000000 | 21
#> ... ... ... ... ... ... . ...
#> [40] 21 35000001-40000000 --- 21 40000001-45000000 | 21
#> [41] 21 35000001-40000000 --- 21 45000001-50000000 | 21
#> [42] 21 40000001-45000000 --- 21 40000001-45000000 | 21
#> [43] 21 40000001-45000000 --- 21 45000001-50000000 | 21
#> [44] 21 45000001-50000000 --- 21 45000001-50000000 | 21
#> -------
#> regions: 9 ranges and 1 metadata column
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
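The ranges printed above are consecutive 5 Mb bins written as 1-based closed intervals (for instance 5000001-10000000). As a rough illustration of that arithmetic, the base R sketch below maps a few made-up positions to bins of this shape. It is only consistent with the printed ranges; how the parsers assign bins internally is not described here.

```r
binSize <- 5000000

# Made-up genomic positions (1-based)
positions <- c(5000001, 7250000, 10000000, 10000001)

# 1-based closed bins: [k * binSize + 1, (k + 1) * binSize]
binStart <- ((positions - 1) %/% binSize) * binSize + 1
binEnd <- binStart + binSize - 1

bins <- data.frame(position = positions, start = binStart, end = binEnd)
bins
```

The first three positions fall in the bin 5000001-10000000, and the last one opens the next bin, 10000001-15000000.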
The function mergeInteractionSet merges InteractionSet objects from the same experiment (for different replicates or conditions).
It merges the data containing the interaction bins and fills the assay matrix accordingly, returning an assay matrix with several columns.
The object returned by the function is an InteractionSet.
Here is a fictitious example:
path <- system.file("extdata", "hicsample_21.cool", package = "HiCParser")
object1 <- parseCool(path, conditions = 1, replicates = 1)
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
# Creating an object with a different condition
object2 <- parseCool(path, conditions = 2, replicates = 1)
#>
#> Parsing '/tmp/RtmpmjToPA/Rinst1078a01b62b45c/HiCParser/extdata/hicsample_21.cool'.
The merged object:
objectMerged <- mergeInteractionSet(object1, object2)
SummarizedExperiment::colData(objectMerged)
#> DataFrame with 2 rows and 2 columns
#> condition replicate
#> <numeric> <numeric>
#> 1 1 1
#> 2 2 1
head(SummarizedExperiment::assay(objectMerged))
#> [,1] [,2]
#> [1,] 79 79
#> [2,] 22 22
#> [3,] 3 3
#> [4,] 1 1
#> [5,] 1 1
#> [6,] 2 2
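In the example above both objects come from the same file, so they contain identical bin pairs. As a conceptual analogy only, the base R sketch below merges two small sparse tables whose bin pairs differ. It is not the implementation of mergeInteractionSet, the counts are invented, and merge() fills missing pairs with NA, which may or may not match what the package does.

```r
# Two toy sparse tables: one row per bin pair, invented counts
sample1 <- data.frame(
    bin1 = c(1, 1, 2),
    bin2 = c(1, 2, 2),
    count = c(79, 22, 3)
)
sample2 <- data.frame(
    bin1 = c(1, 2, 2),
    bin2 = c(1, 2, 3),
    count = c(80, 5, 1)
)

# Full outer join on the bin pair: keep pairs seen in either sample
merged <- merge(
    sample1, sample2,
    by = c("bin1", "bin2"),
    all = TRUE,
    suffixes = c(".C1.R1", ".C2.R1")
)
merged
```

The result has one row per bin pair seen in at least one sample and one count column per sample, which mirrors the multi-column assay matrix described above.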
This package was developed using biocthis.
R session information.
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R Under development (unstable) (2025-01-20 r87609)
#> os Ubuntu 24.04.2 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2025-02-19
#> pandoc 2.7.3 @ /usr/bin/ (via rmarkdown)
#> quarto 1.5.57 @ /usr/local/bin/quarto
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind 1.4-8 2024-09-12 [2] CRAN (R 4.5.0)
#> Biobase 2.67.0 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> BiocGenerics 0.53.6 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> BiocManager 1.30.25 2024-08-28 [2] CRAN (R 4.5.0)
#> BiocStyle * 2.35.0 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> bookdown 0.42 2025-01-07 [2] CRAN (R 4.5.0)
#> bslib 0.9.0 2025-01-30 [2] CRAN (R 4.5.0)
#> cachem 1.1.0 2024-05-16 [2] CRAN (R 4.5.0)
#> cli 3.6.4 2025-02-13 [2] CRAN (R 4.5.0)
#> crayon 1.5.3 2024-06-20 [2] CRAN (R 4.5.0)
#> data.table 1.16.4 2024-12-06 [2] CRAN (R 4.5.0)
#> DelayedArray 0.33.6 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> digest 0.6.37 2024-08-19 [2] CRAN (R 4.5.0)
#> evaluate 1.0.3 2025-01-10 [2] CRAN (R 4.5.0)
#> fastmap 1.2.0 2024-05-15 [2] CRAN (R 4.5.0)
#> generics 0.1.3 2022-07-05 [2] CRAN (R 4.5.0)
#> GenomeInfoDb 1.43.4 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> GenomeInfoDbData 1.2.13 2025-01-22 [2] Bioconductor
#> GenomicRanges 1.59.1 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> gtools 3.9.5 2023-11-20 [2] CRAN (R 4.5.0)
#> HiCParser * 0.99.7 2025-02-19 [1] Bioconductor 3.21 (R 4.5.0)
#> htmltools 0.5.8.1 2024-04-04 [2] CRAN (R 4.5.0)
#> httr 1.4.7 2023-08-15 [2] CRAN (R 4.5.0)
#> InteractionSet 1.35.0 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> IRanges 2.41.3 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.5.0)
#> jsonlite 1.9.0 2025-02-19 [2] CRAN (R 4.5.0)
#> knitr 1.49 2024-11-08 [2] CRAN (R 4.5.0)
#> lattice 0.22-6 2024-03-20 [3] CRAN (R 4.5.0)
#> lifecycle 1.0.4 2023-11-07 [2] CRAN (R 4.5.0)
#> Matrix 1.7-2 2025-01-23 [3] CRAN (R 4.5.0)
#> MatrixGenerics 1.19.1 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> matrixStats 1.5.0 2025-01-07 [2] CRAN (R 4.5.0)
#> pbapply 1.7-2 2023-06-27 [2] CRAN (R 4.5.0)
#> R6 2.6.1 2025-02-15 [2] CRAN (R 4.5.0)
#> Rcpp 1.0.14 2025-01-12 [2] CRAN (R 4.5.0)
#> rhdf5 2.51.2 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> rhdf5filters 1.19.1 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> Rhdf5lib 1.29.0 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> rlang 1.1.5 2025-01-17 [2] CRAN (R 4.5.0)
#> rmarkdown 2.29 2024-11-04 [2] CRAN (R 4.5.0)
#> S4Arrays 1.7.3 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> S4Vectors 0.45.4 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> sass 0.4.9 2024-03-15 [2] CRAN (R 4.5.0)
#> sessioninfo * 1.2.3 2025-02-05 [2] CRAN (R 4.5.0)
#> SparseArray 1.7.6 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> SummarizedExperiment 1.37.0 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> UCSC.utils 1.3.1 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> xfun 0.51 2025-02-19 [2] CRAN (R 4.5.0)
#> XVector 0.47.2 2025-02-19 [2] Bioconductor 3.21 (R 4.5.0)
#> yaml 2.3.10 2024-07-26 [2] CRAN (R 4.5.0)
#>
#> [1] /tmp/RtmpmjToPA/Rinst1078a01b62b45c
#> [2] /home/biocbuild/bbs-3.21-bioc/R/site-library
#> [3] /home/biocbuild/bbs-3.21-bioc/R/library
#> * ── Packages attached to the search path.
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Lun ATL, Perry M and Ing-Simmons E (2016). Infrastructure for genomic interactions: Bioconductor classes for Hi-C, ChIA-PET and related experiments. F1000Res. 5, 950