Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals over specific target regions. It is broadly used to visualize e.g. how histone modifications are enriched at transcription start sites.

There are already several tools that are able to make such heatmap (e.g. ngs.plot, deepTools or genomation). Here we implement enriched heatmap by ComplexHeatmap package. Since the enriched heatmap is essentially a normal heatmap but with some special settings, with the functionality of ComplexHeatmap, it would be much easier to customize the heatmaps as well as concatenating to a list of heatmaps to show correspondance between different data sources.

Basic usages

library(EnrichedHeatmap)

First let’s load the example dataset that we will use for demonstration. The data for human lung tissue is from Roadmap dataset.

set.seed(123)
load(system.file("extdata", "chr21_test_data.RData", package = "EnrichedHeatmap"))
ls()
## [1] "H3K4me3" "cgi"     "genes"   "meth"    "rpkm"

There are following R objects:

  • H3K4me3: coverage for H3K4me3 histone modification from the ChIP-seq data,
  • cgi: CpG islands,
  • genes: genes,
  • meth: methylation for CpG sites from WGBS,
  • rpkm: gene expression from RNASeq.

In order to build the vignette fast, the data only includes chromosome 21. Also we downsampled 100000 CpG sites for methylation data.

We first visualize how H3K4me3 histone modification is enriched around gene TSS. We extract TSS of genes (note tss has strand information):

tss = promoters(genes, upstream = 0, downstream = 1)
tss[1:5]
## GRanges object with 5 ranges and 0 metadata columns:
##                      seqnames    ranges strand
##                         <Rle> <IRanges>  <Rle>
##    ENSG00000141956.9    chr21  43299591      -
##   ENSG00000141959.12    chr21  45719934      +
##    ENSG00000142149.4    chr21  33245628      +
##   ENSG00000142156.10    chr21  47401651      +
##    ENSG00000142166.8    chr21  34696734      +
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
H3K4me3[1:5]
## GRanges object with 5 ranges and 1 metadata column:
##       seqnames          ranges strand |  coverage
##          <Rle>       <IRanges>  <Rle> | <integer>
##   [1]    chr21 9825468-9825470      * |        10
##   [2]    chr21 9825471-9825488      * |        13
##   [3]    chr21         9825489      * |        12
##   [4]    chr21         9825490      * |        13
##   [5]    chr21 9825491-9825493      * |        14
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Similar as other tools, the task of visualization are separated into two steps:

  1. get association between genomic signals and targets by normalizing into a matrix.
  2. visualize the matrix by heatmap.
mat1 = normalizeToMatrix(H3K4me3, tss, value_column = "coverage", 
    extend = 5000, mean_mode = "w0", w = 50)
mat1
## Normalize H3K4me3 to tss:
##   Upstream 5000 bp (100 windows)
##   Downstream 5000 bp (100 windows)
##   Include target regions (width = 1)
##   720 target regions
class(mat1)
## [1] "normalizedMatrix" "matrix"

normalizeToMatrix() converts the association between genomic signals (H3K4me3) and targets(tss) into a matrix (actually mat1 is just a normal matrix with several additional attributes). It first splits the extended targets regions (the extension to upstream and downstream is controlled by extend argument) into a list of small windows (the width of the windows is controlled by w), then overlaps genomic signals to these small windows and calculates the value for every small window which is the mean value of genomic signals that intersects with the window (the value corresponds to genomic signals are controlled by value_column and how to calcualte the mean value is controlled by mean_mode.).

There are several modes for mean_mode according to different types of genomic signals. It will be explained in later sections.

With mat1, we can visualize it as a heatmap:

EnrichedHeatmap(mat1, name = "H3K4me3")

By default, rows are ordered according to the enrichment to the target regions. On top of the heatmap there is a specific type of annotation which summarize the enrichment patterns as a line plot, implemented b anno_enriched().

EnrichedHeatmap() internally calls Heatmap() and returns a Heatmap class object, so parameters for Heatmap() can be directly applied to EnrichedHeatmap(). Users can go to the ComplexHeatmap package to get a more comprehensive help.

Colors

Similar as the normal heatmap, the simplest way to set colors is to provide a vector of colors.

EnrichedHeatmap(mat1, col = c("white", "red"), name = "H3K4me3")

You may wonder why the color looks so light. The reason is in coverage values in H3K4me3, there exist some extreme values, which results in extreme value in mat1.

quantile(H3K4me3$coverage, c(0, 0.25, 0.5, 0.75, 0.99, 1))
##   0%  25%  50%  75%  99% 100% 
##   10   18   29   43   87  293
quantile(mat1, c(0, 0.25, 0.5, 0.75, 0.99, 1))
##        0%       25%       50%       75%       99%      100% 
##   0.00000   0.00000   0.00000   0.00000  38.92176 174.78000

If a vector of colors is specified, sequential values from minimal to maximal are mapped to the colors, and other values are linearly interpolated. To get rid of such extreme values, there are two ways. The first is to specify keep option which trims extreme values both at lower and upper bounds. (In following, it means only to trim values larger than 99th percentile.)

mat1_trim = normalizeToMatrix(H3K4me3, tss, value_column = "coverage", 
    extend = 5000, mean_mode = "w0", w = 50, keep = c(0, 0.99))
EnrichedHeatmap(mat1_trim, col = c("white", "red"), name = "H3K4me3")