Contents

1 Getting started

1.1 The Contacts class

HiContacts package implements the new Contacts S4 class. It is build on pre-existing Bioconductor classes, namely InteractionSet, GenomicInterations and ContactMatrix (Lun, Perry & Ing-Simmons, F1000Research 2016), and leverages them to import locally stored .(m)cool files. It further provides analytical and visualization tools to investigate contact maps directly in R.

showClass("Contacts")
#> Class "Contacts" [package "HiContacts"]
#> 
#> Slots:
#>                                                                   
#> Name:             fileName               focus         resolutions
#> Class:           character     characterOrNULL             numeric
#>                                                                   
#> Name:           resolution        interactions              scores
#> Class:             numeric       GInteractions          SimpleList
#>                                                                   
#> Name:  topologicalFeatures           pairsFile            metadata
#> Class:          SimpleList     characterOrNULL                list
#> 
#> Extends: "Annotated"
contacts <- contacts_yeast()
#> snapshotDate(): 2022-10-24
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
contacts
#> `Contacts` object with 74,360 interactions over 802 regions 
#> -------
#> fileName: "/home/biocbuild/.cache/R/ExperimentHub/37cabfdcee0b5_7752" 
#> focus: "II" 
#> resolutions(5): 1000 2000 4000 8000 16000
#> current resolution: 1000 
#> interactions: 74360 
#> scores(2): raw balanced 
#> topologicalFeatures: loops(0) borders(0) compartments(0) viewpoints(0) 
#> pairsFile: N/A 
#> metadata(0):
citation('HiContacts')
#> 
#> To cite package 'HiContacts' in publications use:
#> 
#>   Serizay J (2022). _HiContacts: HiContacts: R interface to cool
#>   files_. R package version 1.0.0,
#>   <https://github.com/js2264/HiContacts>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {HiContacts: HiContacts: R interface to cool files},
#>     author = {Jacques Serizay},
#>     year = {2022},
#>     note = {R package version 1.0.0},
#>     url = {https://github.com/js2264/HiContacts},
#>   }

1.2 Basics: importing .(m)/cool files as Contacts objects

The HiContactsData package gives access to a range of toy datasets stored by Bioconductor in the ExperimentHub.

library(HiContacts)
library(HiContactsData)
cool_file <- HiContactsData('yeast_wt', format = 'cool')
#> snapshotDate(): 2022-10-24
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
cool_file
#>                                                       EH7701 
#> "/home/biocbuild/.cache/R/ExperimentHub/3508b72098fa84_7751"

The Contacts() function can be used to import a Hi-C matrix locally stored as a cool/mcool file. It creates a Contacts object.

range <- 'I:20000-80000' # range of interest
contacts <- Contacts(cool_file, focus = range)
summary(contacts)
#> `Contacts` object with 1,653 interactions over 59 regions
contacts
#> `Contacts` object with 1,653 interactions over 59 regions 
#> -------
#> fileName: "/home/biocbuild/.cache/R/ExperimentHub/3508b72098fa84_7751" 
#> focus: "I:20,000-80,000" 
#> resolutions(1): 1000
#> current resolution: 1000 
#> interactions: 1653 
#> scores(2): raw balanced 
#> topologicalFeatures: loops(0) borders(0) compartments(0) viewpoints(0) 
#> pairsFile: N/A 
#> metadata(0):

Contacts() works with .mcool files as well: in this case, the user needs to specify the resolution at which count values are recovered.

mcool_file <- HiContactsData('yeast_wt', format = 'mcool')
#> snapshotDate(): 2022-10-24
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
range <- 'II:0-800000' # range of interest
lsCoolResolutions(mcool_file, verbose = TRUE)
#> resolutions(5): 1000 2000 4000 8000 16000
#> 
# contacts <- Contacts(mcool_file, focus = range) # This would throw an error!
contacts <- Contacts(mcool_file, focus = range, res = 1000)
contacts
#> `Contacts` object with 73,257 interactions over 790 regions 
#> -------
#> fileName: "/home/biocbuild/.cache/R/ExperimentHub/37cabfdcee0b5_7752" 
#> focus: "II:0-800,000" 
#> resolutions(5): 1000 2000 4000 8000 16000
#> current resolution: 1000 
#> interactions: 73257 
#> scores(2): raw balanced 
#> topologicalFeatures: loops(0) borders(0) compartments(0) viewpoints(0) 
#> pairsFile: N/A 
#> metadata(0):

One can also extract a count matrix from a .(m)cool file that is not centered at the diagonal. To do this, specify a couple of coordinates in the focus argument using a character string formatted as "... x ...":

contacts <- Contacts(mcool_file, focus = 'II:0-500000 x II:100000-600000', res = 1000)
is_symmetrical(contacts)
#> [1] FALSE

1.3 Slots

Slots for a Contacts object can be accessed using the following getters:

fileName(contacts)
#> [1] "/home/biocbuild/.cache/R/ExperimentHub/37cabfdcee0b5_7752"
focus(contacts)
#> [1] "II:0-500000 x II:100000-600000"
resolutions(contacts)
#> [1]  1000  2000  4000  8000 16000
resolution(contacts)
#> [1] 1000
interactions(contacts)
#> GInteractions object with 41468 interactions and 2 metadata columns:
#>           seqnames1       ranges1     seqnames2       ranges2 |   bin_id1
#>               <Rle>     <IRanges>         <Rle>     <IRanges> | <numeric>
#>       [1]        II        1-1000 ---        II 105001-106000 |       232
#>       [2]        II        1-1000 ---        II 119001-120000 |       232
#>       [3]        II        1-1000 ---        II 126001-127000 |       232
#>       [4]        II        1-1000 ---        II 136001-137000 |       232
#>       [5]        II        1-1000 ---        II 255001-256000 |       232
#>       ...       ...           ... ...       ...           ... .       ...
#>   [41464]        II 498001-499000 ---        II 586001-587000 |       730
#>   [41465]        II 498001-499000 ---        II 587001-588000 |       730
#>   [41466]        II 498001-499000 ---        II 591001-592000 |       730
#>   [41467]        II 498001-499000 ---        II 594001-595000 |       730
#>   [41468]        II 499001-500000 ---        II 499001-500000 |       731
#>             bin_id2
#>           <numeric>
#>       [1]       337
#>       [2]       351
#>       [3]       358
#>       [4]       368
#>       [5]       487
#>       ...       ...
#>   [41464]       818
#>   [41465]       819
#>   [41466]       823
#>   [41467]       826
#>   [41468]       731
#>   -------
#>   regions: 588 ranges and 3 metadata columns
#>   seqinfo: 16 sequences from an unspecified genome
scores(contacts)
#> List of length 2
#> names(2): raw balanced
tail(scores(contacts, 1))
#> [1]  1  1  2  1  1 31
tail(scores(contacts, 'balanced'))
#> [1] 0.0006993195 0.0007046892 0.0012934069 0.0017022105 0.0008050573
#> [6] 0.0141594924
topologicalFeatures(contacts)
#> List of length 4
#> names(4): loops borders compartments viewpoints
pairsFile(contacts)
#> NULL
metadata(contacts)
#> list()

Several extra functions are available as well:

seqinfo(contacts) ## To recover the `Seqinfo` object from the `.(m)cool` file
#> Seqinfo object with 16 sequences from an unspecified genome:
#>   seqnames seqlengths isCircular genome
#>   I            230218       <NA>   <NA>
#>   II           813184       <NA>   <NA>
#>   III          316620       <NA>   <NA>
#>   IV          1531933       <NA>   <NA>
#>   V            576874       <NA>   <NA>
#>   ...             ...        ...    ...
#>   XII         1078177       <NA>   <NA>
#>   XIII         924431       <NA>   <NA>
#>   XIV          784333       <NA>   <NA>
#>   XV          1091291       <NA>   <NA>
#>   XVI          948066       <NA>   <NA>
bins(contacts) ## To bin the genome at the current resolution
#> GRanges object with 12079 ranges and 1 metadata column:
#>                     seqnames        ranges strand |    bin_id
#>                        <Rle>     <IRanges>  <Rle> | <integer>
#>            I_1_1000        I        1-1000      * |         1
#>         I_1001_2000        I     1001-2000      * |         2
#>         I_2001_3000        I     2001-3000      * |         3
#>         I_3001_4000        I     3001-4000      * |         4
#>         I_4001_5000        I     4001-5000      * |         5
#>                 ...      ...           ...    ... .       ...
#>   XVI_944001_945000      XVI 944001-945000      * |     12075
#>   XVI_945001_946000      XVI 945001-946000      * |     12076
#>   XVI_946001_947000      XVI 946001-947000      * |     12077
#>   XVI_947001_948000      XVI 947001-948000      * |     12078
#>   XVI_948001_948066      XVI 948001-948066      * |     12079
#>   -------
#>   seqinfo: 16 sequences from an unspecified genome
regions(contacts) ## To extract unique regions of the contact matrix
#> GRanges object with 588 ranges and 3 metadata columns:
#>                    seqnames        ranges strand |   chr    center    bin_id
#>                       <Rle>     <IRanges>  <Rle> | <Rle> <integer> <integer>
#>          II_1_1000       II        1-1000      * |    II       500       232
#>       II_1001_2000       II     1001-2000      * |    II      1500       233
#>       II_5001_6000       II     5001-6000      * |    II      5500       237
#>       II_6001_7000       II     6001-7000      * |    II      6500       238
#>       II_7001_8000       II     7001-8000      * |    II      7500       239
#>                ...      ...           ...    ... .   ...       ...       ...
#>   II_593001_594000       II 593001-594000      * |    II    593500       825
#>   II_594001_595000       II 594001-595000      * |    II    594500       826
#>   II_595001_596000       II 595001-596000      * |    II    595500       827
#>   II_596001_597000       II 596001-597000      * |    II    596500       828
#>   II_597001_598000       II 597001-598000      * |    II    597500       829
#>   -------
#>   seqinfo: 16 sequences from an unspecified genome
anchors(contacts) ## To extract "first" and "second" anchors for each interaction
#> $first
#> GRanges object with 41468 ranges and 3 metadata columns:
#>           seqnames        ranges strand |   chr    center    bin_id
#>              <Rle>     <IRanges>  <Rle> | <Rle> <integer> <integer>
#>       [1]       II        1-1000      * |    II       500       232
#>       [2]       II        1-1000      * |    II       500       232
#>       [3]       II        1-1000      * |    II       500       232
#>       [4]       II        1-1000      * |    II       500       232
#>       [5]       II        1-1000      * |    II       500       232
#>       ...      ...           ...    ... .   ...       ...       ...
#>   [41464]       II 498001-499000      * |    II    498500       730
#>   [41465]       II 498001-499000      * |    II    498500       730
#>   [41466]       II 498001-499000      * |    II    498500       730
#>   [41467]       II 498001-499000      * |    II    498500       730
#>   [41468]       II 499001-500000      * |    II    499500       731
#>   -------
#>   seqinfo: 16 sequences from an unspecified genome
#> 
#> $second
#> GRanges object with 41468 ranges and 3 metadata columns:
#>           seqnames        ranges strand |   chr    center    bin_id
#>              <Rle>     <IRanges>  <Rle> | <Rle> <integer> <integer>
#>       [1]       II 105001-106000      * |    II    105500       337
#>       [2]       II 119001-120000      * |    II    119500       351
#>       [3]       II 126001-127000      * |    II    126500       358
#>       [4]       II 136001-137000      * |    II    136500       368
#>       [5]       II 255001-256000      * |    II    255500       487
#>       ...      ...           ...    ... .   ...       ...       ...
#>   [41464]       II 586001-587000      * |    II    586500       818
#>   [41465]       II 587001-588000      * |    II    587500       819
#>   [41466]       II 591001-592000      * |    II    591500       823
#>   [41467]       II 594001-595000      * |    II    594500       826
#>   [41468]       II 499001-500000      * |    II    499500       731
#>   -------
#>   seqinfo: 16 sequences from an unspecified genome

1.4 Slot setters

1.4.1 Scores

Add any scores metric using a numerical vector.

scores(contacts, 'random') <- runif(length(contacts))
scores(contacts)
#> List of length 3
#> names(3): raw balanced random
tail(scores(contacts, 'random'))
#> [1] 0.4665496 0.8316451 0.5366689 0.2861325 0.7997940 0.2052648

1.4.2 Features

Add topologicalFeatures using GRanges or Pairs.

topologicalFeatures(contacts, 'viewpoints') <- GRanges("II:300000-320000")
topologicalFeatures(contacts)
#> List of length 4
#> names(4): loops borders compartments viewpoints
topologicalFeatures(contacts, 'viewpoints')
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames        ranges strand
#>          <Rle>     <IRanges>  <Rle>
#>   [1]       II 300000-320000      *
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

1.5 Coercing Contacts

Using the as() function, Contacts can be coerced in GInteractions, ContactMatrix and matrix seamlessly.

as(contacts, "GInteractions")
#> GInteractions object with 41468 interactions and 2 metadata columns:
#>           seqnames1       ranges1     seqnames2       ranges2 |   bin_id1
#>               <Rle>     <IRanges>         <Rle>     <IRanges> | <numeric>
#>       [1]        II        1-1000 ---        II 105001-106000 |       232
#>       [2]        II        1-1000 ---        II 119001-120000 |       232
#>       [3]        II        1-1000 ---        II 126001-127000 |       232
#>       [4]        II        1-1000 ---        II 136001-137000 |       232
#>       [5]        II        1-1000 ---        II 255001-256000 |       232
#>       ...       ...           ... ...       ...           ... .       ...
#>   [41464]        II 498001-499000 ---        II 586001-587000 |       730
#>   [41465]        II 498001-499000 ---        II 587001-588000 |       730
#>   [41466]        II 498001-499000 ---        II 591001-592000 |       730
#>   [41467]        II 498001-499000 ---        II 594001-595000 |       730
#>   [41468]        II 499001-500000 ---        II 499001-500000 |       731
#>             bin_id2
#>           <numeric>
#>       [1]       337
#>       [2]       351
#>       [3]       358
#>       [4]       368
#>       [5]       487
#>       ...       ...
#>   [41464]       818
#>   [41465]       819
#>   [41466]       823
#>   [41467]       826
#>   [41468]       731
#>   -------
#>   regions: 588 ranges and 3 metadata columns
#>   seqinfo: 16 sequences from an unspecified genome
as(contacts, "ContactMatrix")
#> class: ContactMatrix 
#> dim: 588 588 
#> type: matrix 
#> rownames: NULL
#> colnames: NULL
#> metadata(0):
#> regions: 588
as(contacts, "matrix")[1:10, 1:10]
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [7,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [8,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [9,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#> [10,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA

2 Plotting matrices

2.1 Plot matrix heatmaps

The plotMatrix function takes a Contacts object and plots it as a heatmap.
Use the use.scores argument to specify which type of interaction scores to use in the contact maps (e.g. raw, balanced, …). By default, plotMatrix() looks for balanced scores. If they are not stored in the original .(m)/cool file, plotMatrix() simply takes the first scores available.

plotMatrix(contacts, use.scores = 'balanced', limits = c(-4, -1), dpi = 120)