The aim of HiCDOC is to detect significant A/B compartment changes, using Hi-C data with replicates.
HiCDOC normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experiment conditions.
It provides a collection of functions assembled into a pipeline:
HiCDOC can be installed from Bioconductor:
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("HiCDOC")
The package can then be loaded:
library(HiCDOC)
#> Loading required package: InteractionSet
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
#> tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
HiCDOC can import Hi-C data sets in various different formats:
- Tabular .tsv
files.
- Cooler .cool
or .mcool
files.
- Juicer .hic
files.
- HiC-Pro .matrix
and .bed
files.
A tabular file is a tab-separated multi-replicate sparse matrix with a header:
chromosome position 1 position 2 C1.R1 C1.R2 C2.R1 … Y 1500000 7500000 145 184 72 … …
The number of interactions between position 1
and position 2
of
chromosome
are reported in each condition.replicate
column. There is no
limit to the number of conditions and replicates.
To load Hi-C data in this format:
hic.experiment <- HiCDOCDataSetFromTabular('path/to/data.tsv')
To load .cool
or .mcool
files generated by Cooler
(Abdennur and Mirny 2019):
# Path to each file
paths = c(
'path/to/condition-1.replicate-1.cool',
'path/to/condition-1.replicate-2.cool',
'path/to/condition-2.replicate-1.cool',
'path/to/condition-2.replicate-2.cool',
'path/to/condition-3.replicate-1.cool'
)
# Replicate and condition of each file. Can be names instead of numbers.
replicates <- c(1, 2, 1, 2, 1)
conditions <- c(1, 1, 2, 2, 3)
# Resolution to select in .mcool files
binSize = 500000
# Instantiation of data set
hic.experiment <- HiCDOCDataSetFromCool(
paths,
replicates = replicates,
conditions = conditions,
binSize = binSize # Specified for .mcool files.
)
To load .hic
files generated by Juicer (Durand 2016):
# Path to each file
paths = c(
'path/to/condition-1.replicate-1.hic',
'path/to/condition-1.replicate-2.hic',
'path/to/condition-2.replicate-1.hic',
'path/to/condition-2.replicate-2.hic',
'path/to/condition-3.replicate-1.hic'
)
# Replicate and condition of each file. Can be names instead of numbers.
replicates <- c(1, 2, 1, 2, 1)
conditions <- c(1, 1, 2, 2, 3)
# Resolution to select
binSize <- 500000
# Instantiation of data set
hic.experiment <- HiCDOCDataSetFromHiC(
paths,
replicates = replicates,
conditions = conditions,
binSize = binSize
)
To load .matrix
and .bed
files generated by HiC-Pro
(Servant 2015):
# Path to each matrix file
matrixPaths = c(
'path/to/condition-1.replicate-1.matrix',
'path/to/condition-1.replicate-2.matrix',
'path/to/condition-2.replicate-1.matrix',
'path/to/condition-2.replicate-2.matrix',
'path/to/condition-3.replicate-1.matrix'
)
# Path to each bed file
bedPaths = c(
'path/to/condition-1.replicate-1.bed',
'path/to/condition-1.replicate-2.bed',
'path/to/condition-2.replicate-1.bed',
'path/to/condition-2.replicate-2.bed',
'path/to/condition-3.replicate-1.bed'
)
# Replicate and condition of each file. Can be names instead of numbers.
replicates <- c(1, 2, 1, 2, 1)
conditions <- c(1, 1, 2, 2, 3)
# Instantiation of data set
hic.experiment <- HiCDOCDataSetFromHiCPro(
matrixPaths = matrixPaths,
bedPaths = bedPaths,
replicates = replicates,
conditions = conditions
)
An example dataset can be loaded from the HiCDOC package:
data(exampleHiCDOCDataSet)
Once your data is loaded, you can run all the filtering, normalization, and
prediction steps with the command : HiCDOC(exampleHiCDOCDataSet)
.
This one-liner runs all the steps detailed below.
Remove small chromosomes of length smaller than 100 positions (100 is the default value):
hic.experiment <- filterSmallChromosomes(exampleHiCDOCDataSet, threshold = 100)
#> Keeping chromosomes with at least 100 positions.
#> Kept 3 chromosomes: X, Y, Z
#> Removed 1 chromosome: W
Remove sparse replicates filled with less than 30% non-zero interactions (30% is the default value):
hic.experiment <- filterSparseReplicates(hic.experiment, threshold = 0.3)
#> Keeping replicates filled with at least 30% non-zero interactions.
#>
#> Removed interactions matrix of chromosome X, condition 1, replicate R2 filled at 2.347%.
#> Removed 1 replicate in total.
Remove weak positions with less than 1 interaction in average (1 is the default value):
hic.experiment <- filterWeakPositions(hic.experiment, threshold = 1)
#> Keeping positions with interactions average greater or equal to 1.
#> Chromosome X: 2 positions removed, 118 positions remaining.
#> Chromosome Y: 3 positions removed, 157 positions remaining.
#> Chromosome Z: 0 positions removed, 200 positions remaining.
#> Removed 5 positions in total.
Normalize technical biases such as sequencing depth (inter-matrix normalization) so that matrices are comparable :
suppressWarnings(hic.experiment <- normalizeTechnicalBiases(hic.experiment))
#> Normalizing technical biases.
This normalization uses uses cyclic loess normalization from [multiHiCcompare package] (Stansfield, Cresswell, and Dozmorov 2019).
Note : For large dataset, it is highly recommended to set a value for
cycleLoessSpan
parameter to reduce computing time and necessary memory. See
?HiCDOC::normalizeTechnicalBiases
Normalize biological biases, such as GC content, number of restriction sites, etc. (intra-matrix normalization):
hic.experiment <- normalizeBiologicalBiases(hic.experiment)
#> Chromosome X: normalizing biological biases.
#> Chromosome Y: normalizing biological biases.
#> Chromosome Z: normalizing biological biases.
Normalize the linear distance effect resulting from more interactions between
closer genomic regions (20000 is the default value for loessSampleSize
):
hic.experiment <-
normalizeDistanceEffect(hic.experiment, loessSampleSize = 20000)
#> Chromosome X: normalizing distance effect.
#> Chromosome Y: normalizing distance effect.
#> Chromosome Z: normalizing distance effect.
Predict A and B compartments and detect significant differences, here using the default values as parameters:
hic.experiment <- detectCompartments(hic.experiment)
#> Clustering genomic positions.
#> Predicting A/B compartments.
#> Detecting significant differences.
Plot the interaction matrix of each replicate:
p <- plotInteractions(hic.experiment, chromosome = "Y")
p