1 Introduction

The 4D Nucleome Data Coordination and Integration Center (DCIC) has developed and actively maintains a data portal providing public access to a wealth of resources for investigating 3D chromatin architecture. Notably, 3D chromatin conformation libraries generated by 50+ collaborating labs with different technologies (“in situ” or “dilution” Hi-C, Capture Hi-C, Micro-C, DNase Hi-C, …) were homogeneously processed, yielding more than 350 sets of processed files.

fourDNData (read 4DN-Data) is a package giving programmatic access to these uniformly processed Hi-C contact files.

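The fourDNData() function provides a gateway to 4DN-hosted Hi-C files, including contact matrices (in .hic or .mcool format) and other Hi-C-derived files such as annotated compartments, domains, insulation scores, or .pairs files. Called without arguments, it returns a table listing every available file; called with an experimentSet accession ID, it returns only the files associated with that accession.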

library(fourDNData)
head(fourDNData())
#>   experimentSetAccession     fileType     size organism experimentType details
#> 1           4DNES18BMU79        pairs 10151.53    mouse   in situ Hi-C   DpnII
#> 3           4DNES18BMU79          hic  5285.82    mouse   in situ Hi-C   DpnII
#> 4           4DNES18BMU79        mcool  6110.75    mouse   in situ Hi-C   DpnII
#> 5           4DNES18BMU79   boundaries     0.12    mouse   in situ Hi-C   DpnII
#> 6           4DNES18BMU79   insulation     7.18    mouse   in situ Hi-C   DpnII
#> 7           4DNES18BMU79 compartments     0.18    mouse   in situ Hi-C   DpnII
#>                                dataset
#> 1 Hi-C on Mouse Olfactory System cells
#> 3 Hi-C on Mouse Olfactory System cells
#> 4 Hi-C on Mouse Olfactory System cells
#> 5 Hi-C on Mouse Olfactory System cells
#> 6 Hi-C on Mouse Olfactory System cells
#> 7 Hi-C on Mouse Olfactory System cells
#>                                                         condition
#> 1 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 3 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 4 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 5 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 6 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 7 Mature olfactory sensory neurons with conditional Ldb1 knockout
#>                 biosource biosourceType             publication
#> 1 olfactory receptor cell  primary cell Monahan K et al. (2019)
#> 3 olfactory receptor cell  primary cell Monahan K et al. (2019)
#> 4 olfactory receptor cell  primary cell Monahan K et al. (2019)
#> 5 olfactory receptor cell  primary cell Monahan K et al. (2019)
#> 6 olfactory receptor cell  primary cell Monahan K et al. (2019)
#> 7 olfactory receptor cell  primary cell Monahan K et al. (2019)
#>                                                                                                                                   URL
#> 1 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/49504f97-904e-48c1-8c20-1033680b66da/4DNFIC5AHBPV.pairs.gz
#> 3      https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/6cd4378a-8f51-4e65-99eb-15f5c80abf8d/4DNFIT4I5C6Z.hic
#> 4    https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/01fb704f-2fd7-48c6-91af-c5f4584529ed/4DNFIVPAXJO8.mcool
#> 5   https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/5c07cdee-53e2-43e0-8853-cfe5f057b3f1/4DNFIR3XCIMA.bed.gz
#> 6       https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/d1f4beb9-701f-4188-abe2-6271fe658770/4DNFIXKKNMS7.bw
#> 7       https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/3d429647-51c8-4e3a-a18b-eec0b1480905/4DNFIN13N8C1.bw
cool_file <- fourDNData('4DNESDP9ECMN')
cool_file
#>      experimentSetAccession     fileType   size organism experimentType details
#> 1067           4DNESDP9ECMN        pairs  14.77    human   in situ Hi-C    MboI
#> 1069           4DNESDP9ECMN          hic 197.60    human   in situ Hi-C    MboI
#> 1070           4DNESDP9ECMN        mcool  48.27    human   in situ Hi-C    MboI
#> 1071           4DNESDP9ECMN compartments   0.20    human   in situ Hi-C    MboI
#>                                          dataset
#> 1067 Hi-C on GM12878 cells - protocol variations
#> 1069 Hi-C on GM12878 cells - protocol variations
#> 1070 Hi-C on GM12878 cells - protocol variations
#> 1071 Hi-C on GM12878 cells - protocol variations
#>                                                              condition
#> 1067 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> 1069 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> 1070 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> 1071 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#>      biosource          biosourceType              publication
#> 1067   GM12878 immortalized cell line Sanborn AL et al. (2015)
#> 1069   GM12878 immortalized cell line Sanborn AL et al. (2015)
#> 1070   GM12878 immortalized cell line Sanborn AL et al. (2015)
#> 1071   GM12878 immortalized cell line Sanborn AL et al. (2015)
#>                                                                                                                                      URL
#> 1067 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/c2ae7404-501a-4d80-957b-cd677e2bd38a/4DNFIU5XG6TN.pairs.gz
#> 1069      https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/70c1472d-cf3a-41d7-8682-cd03b7cc978d/4DNFI2AGEBE5.hic
#> 1070    https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/c81d77c0-b57e-4a29-80ac-ec6ab0714f57/4DNFI4988896.mcool
#> 1071       https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/dc07042c-62d5-46ae-905d-8ec99b10cf9a/4DNFIDO8B3C6.bw
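The catalogue printed above behaves like a regular table, so it can be narrowed down with ordinary subsetting before any file is requested. The sketch below is package-free: it rebuilds four rows by hand from the two outputs above, keeping only four of the columns, and then selects the human .mcool entries. The object names `catalogue` and `human_mcool` are illustrative and not part of fourDNData.

```r
# Hand-built stand-in for a few rows of the fourDNData() table shown above.
# Values are copied from the printed output; only four columns are kept.
catalogue <- data.frame(
    experimentSetAccession = c(
        "4DNES18BMU79", "4DNES18BMU79", "4DNESDP9ECMN", "4DNESDP9ECMN"
    ),
    fileType = c("hic", "mcool", "hic", "mcool"),
    size = c(5285.82, 6110.75, 197.60, 48.27),
    organism = c("mouse", "mouse", "human", "human"),
    stringsAsFactors = FALSE
)

# Keep only .mcool files from human samples
human_mcool <- catalogue[
    catalogue$fileType == "mcool" & catalogue$organism == "human",
]
print(human_mcool)
```

With the real table, the same logical subsetting should apply to the full set of columns, assuming the column names match those printed above.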

2 Installation

The fourDNData package can be installed from Bioconductor using the following commands:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("fourDNData")

3 HiCExperiment and fourDNData

The HiCExperiment package can be used to import .mcool files provided by fourDNData. Refer to the HiCExperiment package documentation for further information.

library(HiCExperiment)
#> Consider using the `HiContacts` package to perform advanced genomic operations 
#> on `HiCExperiment` objects.
#> 
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
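ID <- '4DNESDP9ECMN'
# Retrieve the catalogue once and reuse it to build the metadata
catalogue <- fourDNData()
cf <- CoolFile(
    # Path to the .mcool file for this accession
    path = fourDNData(ID, type = 'mcool'), 
    # All catalogue rows for this accession, stored as a list
    metadata = as.list(catalogue[catalogue$experimentSetAccession == ID,])
)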
x <- import(cf, resolution = 250000, focus = 'chr5:10000000-50000000')
x
#> `HiCExperiment` object with 7,466 contacts over 161 regions 
#> -------
#> fileName: "/home/biocbuild/.cache/R/fourDNData/13cb6412776b37_4DNFI4988896.mcool" 
#> focus: "chr5:10,000,000-50,000,000" 
#> resolutions(13): 1000 2000 ... 5000000 10000000
#> active resolution: 250000 
#> interactions: 2158 
#> scores(2): count balanced 
#> topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) 
#> pairsFile: N/A 
#> metadata(12): experimentSetAccession fileType ... publication URL
interactions(x)
#> GInteractions object with 2158 interactions and 4 metadata columns:
#>          seqnames1           ranges1     seqnames2           ranges2 |
#>              <Rle>         <IRanges>         <Rle>         <IRanges> |
#>      [1]      chr5 10000001-10250000 ---      chr5 10000001-10250000 |
#>      [2]      chr5 10000001-10250000 ---      chr5 10250001-10500000 |
#>      [3]      chr5 10000001-10250000 ---      chr5 10500001-10750000 |
#>      [4]      chr5 10000001-10250000 ---      chr5 10750001-11000000 |
#>      [5]      chr5 10000001-10250000 ---      chr5 11250001-11500000 |
#>      ...       ...               ... ...       ...               ... .
#>   [2154]      chr5 46000001-46250000 ---      chr5 46250001-46500000 |
#>   [2155]      chr5 46250001-46500000 ---      chr5 46250001-46500000 |
#>   [2156]      chr5 46250001-46500000 ---      chr5 47000001-47250000 |
#>   [2157]      chr5 49500001-49750000 ---      chr5 49500001-49750000 |
#>   [2158]      chr5 49750001-50000000 ---      chr5 49750001-50000000 |
#>            bin_id1   bin_id2     count  balanced
#>          <numeric> <numeric> <numeric> <numeric>
#>      [1]      3560      3560        30 0.3097516
#>      [2]      3560      3561         7 0.0574021
#>      [3]      3560      3562         2 0.0187244
#>      [4]      3560      3563         6 0.0567218
#>      [5]      3560      3565         1 0.0108409
#>      ...       ...       ...       ...       ...
#>   [2154]      3704      3705         2       NaN
#>   [2155]      3705      3705         5       NaN
#>   [2156]      3705      3708         1       NaN
#>   [2157]      3718      3718        11  0.320998
#>   [2158]      3719      3719         1       NaN
#>   -------
#>   regions: 161 ranges and 4 metadata columns
#>   seqinfo: 24 sequences from an unspecified genome
as(x, 'ContactMatrix')
#> class: ContactMatrix 
#> dim: 161 161 
#> type: dgCMatrix 
#> rownames: NULL
#> colnames: NULL
#> metadata(0):
#> regions: 161
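Several of the interactions printed above have a `balanced` score of NaN even though their raw `count` is defined. In balanced contact matrices this commonly happens when one of the two bins was excluded during balancing, although the output here does not state the cause. The sketch below is a minimal, package-free illustration: it copies five rows of the interactions output into a plain data.frame and drops the NaN entries before any downstream summary. The names `scores` and `usable` are illustrative.

```r
# Five interactions copied from the printed output above
# (rows [1], [2], [2154], [2157] and [2158]).
scores <- data.frame(
    bin_id1 = c(3560, 3560, 3704, 3718, 3719),
    bin_id2 = c(3560, 3561, 3705, 3718, 3719),
    count = c(30, 7, 2, 11, 1),
    balanced = c(0.3097516, 0.0574021, NaN, 0.320998, NaN)
)

# Keep only interactions with a defined balanced score
usable <- scores[!is.nan(scores$balanced), ]
print(usable)

# Number of interactions dropped because their balanced score is NaN
n_dropped <- nrow(scores) - nrow(usable)
print(n_dropped)
```

Whether to drop such interactions or fall back on raw counts depends on the analysis; the point is only that NaN values should be handled explicitly.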

Rather than importing the files associated with a single experimentSet accession ID one by one, all the available files for that accession can be imported into a single HiCExperiment object with the fourDNHiCExperiment() function.

library(HiCExperiment)
x <- fourDNHiCExperiment('4DNESDP9ECMN')
#> Fetching local Hi-C contact map from Bioc cache
#> Fetching local compartments bigwig file from Bioc cache
#> Insulation not found for the provided experimentSet accession.
#> Borders not found for the provided experimentSet accession.
#> Importing contacts in memory
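The messages above report that insulation and borders were not found for this accession. This is consistent with the table printed in the introduction, where '4DNESDP9ECMN' lists only pairs, hic, mcool and compartments files. The sketch below reproduces that check without calling the package, using file types copied from the two printed tables. The variable names are illustrative, and this is not a description of how fourDNHiCExperiment() is implemented.

```r
# File types listed for '4DNESDP9ECMN' in the table shown in the introduction
available <- c("pairs", "hic", "mcool", "compartments")

# Derived file types seen for the mouse accession '4DNES18BMU79'
derived <- c("compartments", "insulation", "boundaries")

# Derived file types that are absent for '4DNESDP9ECMN'
missing_types <- setdiff(derived, available)
print(missing_types)
```

Running a check like this on the real catalogue before importing makes it clear in advance which features of the resulting object will be empty.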

4 Session info

sessionInfo()
#> R version 4.4.0 beta (2024-04-15 r86425)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] HiCExperiment_1.4.0 fourDNData_1.4.0    BiocStyle_2.32.0   
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.2.1            dplyr_1.1.4                
#>  [3] blob_1.2.4                  Biostrings_2.72.0          
#>  [5] filelock_1.0.3              bitops_1.0-7               
#>  [7] fastmap_1.1.1               RCurl_1.98-1.14            
#>  [9] BiocFileCache_2.12.0        GenomicAlignments_1.40.0   
#> [11] XML_3.99-0.16.1             digest_0.6.35              
#> [13] lifecycle_1.0.4             RSQLite_2.3.6              
#> [15] magrittr_2.0.3              compiler_4.4.0             
#> [17] rlang_1.1.3                 sass_0.4.9                 
#> [19] tools_4.4.0                 utf8_1.2.4                 
#> [21] yaml_2.3.8                  rtracklayer_1.64.0         
#> [23] knitr_1.46                  S4Arrays_1.4.0             
#> [25] bit_4.0.5                   curl_5.2.1                 
#> [27] DelayedArray_0.30.0         abind_1.4-5                
#> [29] BiocParallel_1.38.0         withr_3.0.0                
#> [31] purrr_1.0.2                 BiocGenerics_0.50.0        
#> [33] grid_4.4.0                  stats4_4.4.0               
#> [35] fansi_1.0.6                 Rhdf5lib_1.26.0            
#> [37] SummarizedExperiment_1.34.0 cli_3.6.2                  
#> [39] rmarkdown_2.26              crayon_1.5.2               
#> [41] generics_0.1.3              rjson_0.2.21               
#> [43] httr_1.4.7                  tzdb_0.4.0                 
#> [45] DBI_1.2.2                   cachem_1.0.8               
#> [47] rhdf5_2.48.0                zlibbioc_1.50.0            
#> [49] parallel_4.4.0              BiocManager_1.30.22        
#> [51] XVector_0.44.0              restfulr_0.0.15            
#> [53] matrixStats_1.3.0           vctrs_0.6.5                
#> [55] Matrix_1.7-0                jsonlite_1.8.8             
#> [57] bookdown_0.39               IRanges_2.38.0             
#> [59] S4Vectors_0.42.0            bit64_4.0.5                
#> [61] strawr_0.0.91               jquerylib_0.1.4            
#> [63] glue_1.7.0                  codetools_0.2-20           
#> [65] GenomeInfoDb_1.40.0         GenomicRanges_1.56.0       
#> [67] BiocIO_1.14.0               UCSC.utils_1.0.0           
#> [69] tibble_3.2.1                pillar_1.9.0               
#> [71] htmltools_0.5.8.1           rhdf5filters_1.16.0        
#> [73] GenomeInfoDbData_1.2.12     R6_2.5.1                   
#> [75] dbplyr_2.5.0                vroom_1.6.5                
#> [77] evaluate_0.23               lattice_0.22-6             
#> [79] Biobase_2.64.0              Rsamtools_2.20.0           
#> [81] memoise_2.0.1               bslib_0.7.0                
#> [83] Rcpp_1.0.12                 InteractionSet_1.32.0      
#> [85] SparseArray_1.4.0           xfun_0.43                  
#> [87] MatrixGenerics_1.16.0       pkgconfig_2.0.3