The 4D Nucleome Data Coordination and Integration Center (DCIC) has developed and actively maintains a data portal providing public access to a wealth of resources to investigate 3D chromatin architecture. Notably, 3D chromatin conformation libraries relying on different technologies (“in situ” or “dilution” Hi-C, Capture Hi-C, Micro-C, DNase Hi-C, …), generated by 50+ collaborating labs, were homogeneously processed, yielding more than 350 sets of processed files.
fourDNData (read "4DN-Data") is a package giving programmatic access to these uniformly processed Hi-C contact files. The fourDNData() function provides a gateway to 4DN-hosted Hi-C files, including contact matrices (in .hic or .mcool format) and other Hi-C-derived files such as annotated compartments, domains, insulation scores, or .pairs files.
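As a minimal sketch, assuming the catalogue returned by fourDNData() is a data.frame with the columns printed below (experimentSetAccession, fileType, size, organism, ...), files of interest can be selected with plain base R subsetting. The helper select_4dn_files() is hypothetical and not part of fourDNData. The toy catalogue reuses a few rows printed in this vignette so that the example runs offline.

```r
# Hypothetical helper (not part of fourDNData): keep the catalogue rows
# matching the requested organism(s) and file type(s).
select_4dn_files <- function(catalog, organism = NULL, fileType = NULL) {
    keep <- rep(TRUE, nrow(catalog))
    if (!is.null(organism)) keep <- keep & catalog$organism %in% organism
    if (!is.null(fileType)) keep <- keep & catalog$fileType %in% fileType
    catalog[keep, , drop = FALSE]
}

# Toy catalogue built from rows printed in this vignette,
# so the example runs without network access.
toy_catalog <- data.frame(
    experimentSetAccession = c(rep("4DNES18BMU79", 3), rep("4DNESDP9ECMN", 3)),
    fileType = c("hic", "mcool", "compartments", "hic", "mcool", "compartments"),
    size = c(5285.82, 6110.75, 0.18, 197.60, 48.27, 0.20),
    organism = c(rep("mouse", 3), rep("human", 3))
)

# Human .mcool files only
human_mcool <- select_4dn_files(toy_catalog, organism = "human", fileType = "mcool")
human_mcool$experimentSetAccession
```

With network access, the same call could be applied to the full catalogue, e.g. select_4dn_files(fourDNData(), organism = "human", fileType = "mcool").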
library(fourDNData)
head(fourDNData())
#> experimentSetAccession fileType size organism experimentType details
#> 1 4DNES18BMU79 pairs 10151.53 mouse in situ Hi-C DpnII
#> 3 4DNES18BMU79 hic 5285.82 mouse in situ Hi-C DpnII
#> 4 4DNES18BMU79 mcool 6110.75 mouse in situ Hi-C DpnII
#> 5 4DNES18BMU79 boundaries 0.12 mouse in situ Hi-C DpnII
#> 6 4DNES18BMU79 insulation 7.18 mouse in situ Hi-C DpnII
#> 7 4DNES18BMU79 compartments 0.18 mouse in situ Hi-C DpnII
#> dataset
#> 1 Hi-C on Mouse Olfactory System cells
#> 3 Hi-C on Mouse Olfactory System cells
#> 4 Hi-C on Mouse Olfactory System cells
#> 5 Hi-C on Mouse Olfactory System cells
#> 6 Hi-C on Mouse Olfactory System cells
#> 7 Hi-C on Mouse Olfactory System cells
#> condition
#> 1 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 3 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 4 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 5 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 6 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 7 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> biosource biosourceType publication
#> 1 olfactory receptor cell primary cell Monahan K et al. (2019)
#> 3 olfactory receptor cell primary cell Monahan K et al. (2019)
#> 4 olfactory receptor cell primary cell Monahan K et al. (2019)
#> 5 olfactory receptor cell primary cell Monahan K et al. (2019)
#> 6 olfactory receptor cell primary cell Monahan K et al. (2019)
#> 7 olfactory receptor cell primary cell Monahan K et al. (2019)
#> URL
#> 1 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/49504f97-904e-48c1-8c20-1033680b66da/4DNFIC5AHBPV.pairs.gz
#> 3 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/6cd4378a-8f51-4e65-99eb-15f5c80abf8d/4DNFIT4I5C6Z.hic
#> 4 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/01fb704f-2fd7-48c6-91af-c5f4584529ed/4DNFIVPAXJO8.mcool
#> 5 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/5c07cdee-53e2-43e0-8853-cfe5f057b3f1/4DNFIR3XCIMA.bed.gz
#> 6 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/d1f4beb9-701f-4188-abe2-6271fe658770/4DNFIXKKNMS7.bw
#> 7 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/3d429647-51c8-4e3a-a18b-eec0b1480905/4DNFIN13N8C1.bw
cool_file <- fourDNData('4DNESDP9ECMN')
cool_file
#> experimentSetAccession fileType size organism experimentType details
#> 1067 4DNESDP9ECMN pairs 14.77 human in situ Hi-C MboI
#> 1069 4DNESDP9ECMN hic 197.60 human in situ Hi-C MboI
#> 1070 4DNESDP9ECMN mcool 48.27 human in situ Hi-C MboI
#> 1071 4DNESDP9ECMN compartments 0.20 human in situ Hi-C MboI
#> dataset
#> 1067 Hi-C on GM12878 cells - protocol variations
#> 1069 Hi-C on GM12878 cells - protocol variations
#> 1070 Hi-C on GM12878 cells - protocol variations
#> 1071 Hi-C on GM12878 cells - protocol variations
#> condition
#> 1067 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> 1069 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> 1070 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> 1071 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> biosource biosourceType publication
#> 1067 GM12878 immortalized cell line Sanborn AL et al. (2015)
#> 1069 GM12878 immortalized cell line Sanborn AL et al. (2015)
#> 1070 GM12878 immortalized cell line Sanborn AL et al. (2015)
#> 1071 GM12878 immortalized cell line Sanborn AL et al. (2015)
#> URL
#> 1067 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/c2ae7404-501a-4d80-957b-cd677e2bd38a/4DNFIU5XG6TN.pairs.gz
#> 1069 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/70c1472d-cf3a-41d7-8682-cd03b7cc978d/4DNFI2AGEBE5.hic
#> 1070 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/c81d77c0-b57e-4a29-80ac-ec6ab0714f57/4DNFI4988896.mcool
#> 1071 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/dc07042c-62d5-46ae-905d-8ec99b10cf9a/4DNFIDO8B3C6.bw
The fourDNData package can be installed from Bioconductor using the following command:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("fourDNData")
The HiCExperiment package can be used to import the .mcool files provided by fourDNData. Refer to the HiCExperiment package documentation for further information.
library(HiCExperiment)
#> Consider using the `HiContacts` package to perform advanced genomic operations
#> on `HiCExperiment` objects.
#>
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
ID <- '4DNESDP9ECMN'
catalog <- fourDNData()  # query the full catalogue once
cf <- CoolFile(
    path = fourDNData(ID, type = 'mcool'),  # local (cached) path to the .mcool file
    metadata = as.list(catalog[catalog$experimentSetAccession == ID, ])
)
x <- import(cf, resolution = 250000, focus = 'chr5:10000000-50000000')
x
#> `HiCExperiment` object with 7,466 contacts over 161 regions
#> -------
#> fileName: "/home/biocbuild/.cache/R/fourDNData/13cb6412776b37_4DNFI4988896.mcool"
#> focus: "chr5:10,000,000-50,000,000"
#> resolutions(13): 1000 2000 ... 5000000 10000000
#> active resolution: 250000
#> interactions: 2158
#> scores(2): count balanced
#> topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0)
#> pairsFile: N/A
#> metadata(12): experimentSetAccession fileType ... publication URL
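The import above reports 161 regions even though the 40 Mb focus spans exactly 160 bins of 250 kb. A likely explanation, which this sketch assumes, is that the 1-based focus start (10,000,000) is the last base of the bin 9,750,001-10,000,000, so that bin is included as well. The helper n_bins_overlapping() is hypothetical and only reproduces the arithmetic; it is not how HiCExperiment selects regions internally.

```r
# Hypothetical helper: number of fixed-width bins overlapping a 1-based,
# closed interval [start, end].
n_bins_overlapping <- function(start, end, resolution) {
    first_bin <- (start - 1) %/% resolution  # 0-based index of the bin holding `start`
    last_bin <- (end - 1) %/% resolution     # 0-based index of the bin holding `end`
    last_bin - first_bin + 1
}

n_bins_overlapping(10000000, 50000000, 250000)
# [1] 161
n_bins_overlapping(10000001, 50000000, 250000)
# [1] 160
```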
interactions(x)
#> GInteractions object with 2158 interactions and 4 metadata columns:
#> seqnames1 ranges1 seqnames2 ranges2 |
#> <Rle> <IRanges> <Rle> <IRanges> |
#> [1] chr5 10000001-10250000 --- chr5 10000001-10250000 |
#> [2] chr5 10000001-10250000 --- chr5 10250001-10500000 |
#> [3] chr5 10000001-10250000 --- chr5 10500001-10750000 |
#> [4] chr5 10000001-10250000 --- chr5 10750001-11000000 |
#> [5] chr5 10000001-10250000 --- chr5 11250001-11500000 |
#> ... ... ... ... ... ... .
#> [2154] chr5 46000001-46250000 --- chr5 46250001-46500000 |
#> [2155] chr5 46250001-46500000 --- chr5 46250001-46500000 |
#> [2156] chr5 46250001-46500000 --- chr5 47000001-47250000 |
#> [2157] chr5 49500001-49750000 --- chr5 49500001-49750000 |
#> [2158] chr5 49750001-50000000 --- chr5 49750001-50000000 |
#> bin_id1 bin_id2 count balanced
#> <numeric> <numeric> <numeric> <numeric>
#> [1] 3560 3560 30 0.3097516
#> [2] 3560 3561 7 0.0574021
#> [3] 3560 3562 2 0.0187244
#> [4] 3560 3563 6 0.0567218
#> [5] 3560 3565 1 0.0108409
#> ... ... ... ... ...
#> [2154] 3704 3705 2 NaN
#> [2155] 3705 3705 5 NaN
#> [2156] 3705 3708 1 NaN
#> [2157] 3718 3718 11 0.320998
#> [2158] 3719 3719 1 NaN
#> -------
#> regions: 161 ranges and 4 metadata columns
#> seqinfo: 24 sequences from an unspecified genome
as(x, 'ContactMatrix')
#> class: ContactMatrix
#> dim: 161 161
#> type: dgCMatrix
#> rownames: NULL
#> colnames: NULL
#> metadata(0):
#> regions: 161
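Conceptually, the ContactMatrix view arranges the sparse (bin_id1, bin_id2, count) triplets listed by interactions(x) into a square matrix over the 161 regions. The base R sketch below illustrates that idea on the first five triplets printed above. It is not how HiCExperiment implements the coercion, and the helper triplets_to_matrix() is hypothetical. Indices are taken relative to a caller-supplied first bin, and the sketch chooses to mirror off-diagonal counts so that the resulting matrix is symmetric.

```r
# Hypothetical helper: place sparse (bin1, bin2, count) triplets into a
# dense, symmetric n x n matrix whose first row/column is `first_bin`.
triplets_to_matrix <- function(bin1, bin2, count, first_bin, n) {
    mat <- matrix(0, nrow = n, ncol = n)
    i <- bin1 - first_bin + 1
    j <- bin2 - first_bin + 1
    mat[cbind(i, j)] <- count
    mat[cbind(j, i)] <- count  # mirror upper-triangle entries into the lower triangle
    mat
}

# First five triplets printed by interactions(x) above
bin1 <- c(3560, 3560, 3560, 3560, 3560)
bin2 <- c(3560, 3561, 3562, 3563, 3565)
count <- c(30, 7, 2, 6, 1)

m <- triplets_to_matrix(bin1, bin2, count, first_bin = 3560, n = 6)
m[1, ]
# [1] 30  7  2  6  0  1
```

Bin 3564 has no printed contact with bin 3560, so the corresponding cell stays at 0.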
Rather than importing the files of a single experimentSet accession ID one by one, the fourDNHiCExperiment() function imports the contact map together with the available derived files (compartments, insulation, borders) associated with that accession into a single HiCExperiment object.
library(HiCExperiment)
x <- fourDNHiCExperiment('4DNESDP9ECMN')
#> Fetching local Hi-C contact map from Bioc cache
#> Fetching local compartments bigwig file from Bioc cache
#> Insulation not found for the provided experimentSet accession.
#> Borders not found for the provided experimentSet accession.
#> Importing contacts in memory
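The "not found" messages are consistent with the catalogue rows listed earlier for 4DNESDP9ECMN, which include pairs, hic, mcool and compartments files but no insulation or boundaries entries. One way to anticipate which derived files will be available is to compare the fileType column against the types of interest. The helper missing_types() and the chosen set of wanted types are assumptions for illustration, and the toy rows mirror the fourDNData('4DNESDP9ECMN') output shown above.

```r
# Hypothetical helper: which of the wanted file types are absent from the
# catalogue rows of a given experimentSet accession?
missing_types <- function(catalog, accession, wanted) {
    present <- catalog$fileType[catalog$experimentSetAccession == accession]
    setdiff(wanted, present)
}

# Toy rows mirroring the fourDNData('4DNESDP9ECMN') output above
toy_rows <- data.frame(
    experimentSetAccession = rep("4DNESDP9ECMN", 4),
    fileType = c("pairs", "hic", "mcool", "compartments")
)

missing_types(toy_rows, "4DNESDP9ECMN",
              wanted = c("mcool", "compartments", "insulation", "boundaries"))
# [1] "insulation" "boundaries"
```

With network access, missing_types(fourDNData(), '4DNESDP9ECMN', ...) would run the same check on the full catalogue.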
sessionInfo()
#> R version 4.4.0 beta (2024-04-15 r86425)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] HiCExperiment_1.4.0 fourDNData_1.4.0 BiocStyle_2.32.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 dplyr_1.1.4
#> [3] blob_1.2.4 Biostrings_2.72.0
#> [5] filelock_1.0.3 bitops_1.0-7
#> [7] fastmap_1.1.1 RCurl_1.98-1.14
#> [9] BiocFileCache_2.12.0 GenomicAlignments_1.40.0
#> [11] XML_3.99-0.16.1 digest_0.6.35
#> [13] lifecycle_1.0.4 RSQLite_2.3.6
#> [15] magrittr_2.0.3 compiler_4.4.0
#> [17] rlang_1.1.3 sass_0.4.9
#> [19] tools_4.4.0 utf8_1.2.4
#> [21] yaml_2.3.8 rtracklayer_1.64.0
#> [23] knitr_1.46 S4Arrays_1.4.0
#> [25] bit_4.0.5 curl_5.2.1
#> [27] DelayedArray_0.30.0 abind_1.4-5
#> [29] BiocParallel_1.38.0 withr_3.0.0
#> [31] purrr_1.0.2 BiocGenerics_0.50.0
#> [33] grid_4.4.0 stats4_4.4.0
#> [35] fansi_1.0.6 Rhdf5lib_1.26.0
#> [37] SummarizedExperiment_1.34.0 cli_3.6.2
#> [39] rmarkdown_2.26 crayon_1.5.2
#> [41] generics_0.1.3 rjson_0.2.21
#> [43] httr_1.4.7 tzdb_0.4.0
#> [45] DBI_1.2.2 cachem_1.0.8
#> [47] rhdf5_2.48.0 zlibbioc_1.50.0
#> [49] parallel_4.4.0 BiocManager_1.30.22
#> [51] XVector_0.44.0 restfulr_0.0.15
#> [53] matrixStats_1.3.0 vctrs_0.6.5
#> [55] Matrix_1.7-0 jsonlite_1.8.8
#> [57] bookdown_0.39 IRanges_2.38.0
#> [59] S4Vectors_0.42.0 bit64_4.0.5
#> [61] strawr_0.0.91 jquerylib_0.1.4
#> [63] glue_1.7.0 codetools_0.2-20
#> [65] GenomeInfoDb_1.40.0 GenomicRanges_1.56.0
#> [67] BiocIO_1.14.0 UCSC.utils_1.0.0
#> [69] tibble_3.2.1 pillar_1.9.0
#> [71] htmltools_0.5.8.1 rhdf5filters_1.16.0
#> [73] GenomeInfoDbData_1.2.12 R6_2.5.1
#> [75] dbplyr_2.5.0 vroom_1.6.5
#> [77] evaluate_0.23 lattice_0.22-6
#> [79] Biobase_2.64.0 Rsamtools_2.20.0
#> [81] memoise_2.0.1 bslib_0.7.0
#> [83] Rcpp_1.0.12 InteractionSet_1.32.0
#> [85] SparseArray_1.4.0 xfun_0.43
#> [87] MatrixGenerics_1.16.0 pkgconfig_2.0.3