In this guide, we illustrate here two common downstream analysis workflows for ChIP-seq experiments, one is for comparing and combining peaks for single transcription factor (TF) with replicates, and the other is for comparing binding profiles from ChIP-seq experiments with multiple TFs.
This workflow shows how to convert BED/GFF files to GRanges, find overlapping peaks between two peak sets, and visualize the number of common and specific peaks with Venn diagram.
The input for ChIPpeakAnno2 is a list of called peaks
identified from ChIP-seq experiments or any other experiments that yield
a set of chromosome coordinates. Although peaks are represented as
GRanges in ChIPpeakAnno, other common peak formats such
as BED, GFF and MACS can be converted to GRanges easily using a
conversion toGRanges
method. For detailed information on
how to use this method, please type ?toGRanges
.
The following examples illustrate the usage of this method to convert
BED and GFF file to GRanges, add metadata from orignal peaks to the
overlap GRanges using function addMetadata
, and visualize
the overlapping using function makeVennDiagram
.
library(ChIPpeakAnno)
bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
gr1 <- toGRanges(bed, format="BED", header=FALSE)
## one can also try import from rtracklayer
gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
## must keep the class exactly same as gr1$score, i.e., numeric.
gr2$score <- as.numeric(gr2$score)
ol <- findOverlapsOfPeaks(gr1, gr2)
## add metadata (mean of score) to the overlapping peaks
ol <- addMetadata(ol, colNames="score", FUN=mean)
ol$peaklist[["gr1///gr2"]][1:2]
## GRanges object with 2 ranges and 2 metadata columns:
## seqnames ranges strand | peakNames
## <Rle> <IRanges> <Rle> | <CharacterList>
## [1] chr1 713791-715578 * | gr1__MACS_peak_13,gr2__001,gr2__002
## [2] chr1 724851-727191 * | gr2__003,gr1__MACS_peak_14
## score
## <numeric>
## [1] 850.203
## [2] 29.170
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
makeVennDiagram(ol, fill=c("#009E73", "#F0E442"), # circle fill color
col=c("#D55E00", "#0072B2"), #circle border color
cat.col=c("#D55E00", "#0072B2")) # label color, keep same as circle border color
Venn diagram of overlaps for replicated experiments
## $p.value
## gr1 gr2 pval
## [1,] 1 1 0
##
## $vennCounts
## gr1 gr2 Counts count.gr1 count.gr2
## [1,] 0 0 0 0 0
## [2,] 0 1 61 0 61
## [3,] 1 0 62 62 0
## [4,] 1 1 166 168 169
## attr(,"class")
## [1] "VennCounts"
Annotation data should be an object of GRanges. Same as import peaks,
we use the method toGRanges
, which can return an object of
GRanges, to represent the annotation data. An annotation data be
constructed from not only BED, GFF or user defined readable text files,
but also EnsDb or TxDb object, by calling the toGRanges
method. Please type ?toGRanges
for more information.
Note that the version of the annotation data must match with the genome used for mapping because the coordinates may differ for different genome releases. For example, if you are using Mus_musculus.v103 for mapping, you’d best also use EnsDb.Mmusculus.v103 for annotation. For more information about how to prepare the annotation data, please refer ?getAnnotation.
library(EnsDb.Hsapiens.v75) ##(hg19)
## create annotation file from EnsDb or TxDb
annoData <- toGRanges(EnsDb.Hsapiens.v75, feature="gene")
annoData[1:2]
## GRanges object with 2 ranges and 1 metadata column:
## seqnames ranges strand | gene_name
## <Rle> <IRanges> <Rle> | <character>
## ENSG00000223972 chr1 11869-14412 + | DDX11L1
## ENSG00000227232 chr1 14363-29806 - | WASH7P
## -------
## seqinfo: 273 sequences (1 circular) from 2 genomes (hg19, GRCh37)
After finding the overlapping peaks, the distribution of the distance
of overlapped peaks to the nearest feature such as the transcription
start sites (TSS) can be plotted by binOverFeature
function. The sample code here plots the distribution of peaks around
the TSS.
overlaps <- ol$peaklist[["gr1///gr2"]]
binOverFeature(overlaps, annotationData=annoData,
radius=5000, nbins=20, FUN=length, errFun=0,
xlab="distance from TSS (bp)", ylab="count",
main="Distribution of aggregated peak numbers around TSS")
Distribution of peaks around transcript start sites.
In addition, genomicElementDistribution
can be used to
summarize the distribution of peaks over different type of features such
as exon, intron, enhancer, proximal promoter, 5’ UTR and 3’ UTR. This
distribution can be summarized in peak centric or nucleotide centric
view using the function genomicElementDistribution
. Please
note that one peak might span multiple type of features, leading to the
number of annotated features greater than the total number of input
peaks. At the peak centric view, precedence will dictate the annotation
order when peaks span multiple type of features.
## check the genomic element distribution of the duplicates
## the genomic element distribution will indicates the
## the correlation between duplicates.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
peaks <- GRangesList(rep1=gr1,
rep2=gr2)
genomicElementDistribution(peaks,
TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene,
promoterRegion=c(upstream=2000, downstream=500),
geneDownstream=c(upstream=0, downstream=2000))
Peak distribution over different genomic features.
## check the genomic element distribution for the overlaps
## the genomic element distribution will indicates the
## the best methods for annotation.
## The percentages in the legend show the percentage of peaks in
## each category.
out <- genomicElementDistribution(overlaps,
TxDb = TxDb.Hsapiens.UCSC.hg19.knownGene,
promoterRegion=c(upstream=2000, downstream=500),
geneDownstream=c(upstream=0, downstream=2000),
promoterLevel=list(
# from 5' -> 3', fixed precedence 3' -> 5'
breaks = c(-2000, -1000, -500, 0, 500),
labels = c("upstream 1-2Kb", "upstream 0.5-1Kb",
"upstream <500b", "TSS - 500b"),
colors = c("#FFE5CC", "#FFCA99",
"#FFAD65", "#FF8E32")))
Peak distribution over different genomic features.
## check the genomic element distribution by upset plot.
## by function genomicElementUpSetR, no precedence will be considered.
library(UpSetR)
x <- genomicElementUpSetR(overlaps,
TxDb.Hsapiens.UCSC.hg19.knownGene)
upset(x$plotData, nsets=13, nintersects=NA)
Peak distribution over different genomic features.
Metagene plot may also provide information for annotation.
metagenePlot(peaks, TxDb.Hsapiens.UCSC.hg19.knownGene)
As shown from the distribution of aggregated peak numbers around TSS
and the distribution of peaks in different of chromosome regions, most
of the peaks locate around TSS. Therefore, it is reasonable to use
annotatePeakInBatch
or annoPeaks
to annotate
the peaks to the promoter regions of Hg19 genes. Promoters can be
specified with bindingRegion. For the following example, promoter region
is defined as upstream 2000 and downstream 500 from TSS
(bindingRegion=c(-2000, 500)).
overlaps.anno <- annotatePeakInBatch(overlaps,
AnnotationData=annoData,
output="nearestBiDirectionalPromoters",
bindingRegion=c(-2000, 500))
library(org.Hs.eg.db)
overlaps.anno <- addGeneIDs(overlaps.anno,
"org.Hs.eg.db",
IDs2Add = "entrez_id")
head(overlaps.anno)
## GRanges object with 6 ranges and 11 metadata columns:
## seqnames ranges strand | peakNames
## <Rle> <IRanges> <Rle> | <CharacterList>
## X1 chr1 713791-715578 * | gr1__MACS_peak_13,gr2__001,gr2__002
## X1 chr1 713791-715578 * | gr1__MACS_peak_13,gr2__001,gr2__002
## X3 chr1 839467-840090 * | gr1__MACS_peak_16,gr2__004
## X4 chr1 856361-856999 * | gr1__MACS_peak_17,gr2__005
## X5 chr1 859315-860144 * | gr2__006,gr1__MACS_peak_18
## X10 chr1 901118-902861 * | gr2__012,gr1__MACS_peak_23
## score peak feature feature.ranges feature.strand
## <numeric> <character> <character> <IRanges> <Rle>
## X1 850.203 X1 ENSG00000228327 700237-714006 -
## X1 850.203 X1 ENSG00000237491 714150-745440 +
## X3 73.120 X3 ENSG00000272438 840214-851356 +
## X4 54.690 X4 ENSG00000223764 852245-856396 -
## X5 81.485 X5 ENSG00000187634 860260-879955 +
## X10 119.910 X10 ENSG00000187583 901877-911245 +
## distance insideFeature distanceToSite gene_name entrez_id
## <integer> <character> <integer> <character> <character>
## X1 0 overlapStart 0 RP11-206L10.2 <NA>
## X1 0 overlapStart 0 RP11-206L10.9 105378580
## X3 123 upstream 123 RP11-54O7.16 <NA>
## X4 0 overlapStart 0 RP11-54O7.3 100130417
## X5 115 upstream 115 SAMD11 148398
## X10 0 overlapStart 0 PLEKHN1 84069
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
ol.anno.out <- unname(overlaps.anno)
ol.anno.out$peakNames <- NULL # remove the CharacterList to avoid error message.
write.csv(as.data.frame(ol.anno.out), "anno.csv")
The distribution of the common peaks around features can be visualized using a pie chart.
pie1(table(overlaps.anno$insideFeature))
Pie chart of the distribution of common peaks around features.
The following example shows how to use getEnrichedGO
to
obtain a list of enriched GO terms with annotated peaks. For pathway
analysis, please use function getEnrichedPATH
with reactome
or KEGG database. Please note that by default feature_id_type
is set as “ensembl_gene_id”. If you are using TxDb as
annotation data, please set it to “entrez_id”.
over <- getEnrichedGO(overlaps.anno, orgAnn="org.Hs.eg.db", condense=TRUE)
enrichmentPlot(over)
library(reactome.db)
path <- getEnrichedPATH(overlaps.anno, "org.Hs.eg.db", "reactome.db", maxP=.05)
enrichmentPlot(path)
Here is an example to get the sequences of the peaks plus 20 bp upstream and downstream for PCR validation or motif discovery.
library(BSgenome.Hsapiens.UCSC.hg19)
seq <- getAllPeakSequence(overlaps, upstream=20, downstream=20, genome=Hsapiens)
write2FASTA(seq, "test.fa")
Here is an example to get the Z-scores for short oligos4.
## summary of the short oligos
library(seqinr)
freqs <- oligoFrequency(Hsapiens$chr1, MarkovOrder=3)
os <- oligoSummary(seq, oligoLength=6, MarkovOrder=3,
quickMotif=FALSE, freqs=freqs)
## plot the results
zscore <- sort(os$zscore)
h <- hist(zscore, breaks=100, main="Histogram of Z-score")
text(zscore[length(zscore)], h$counts[length(h$counts)]+1,
labels=names(zscore[length(zscore)]), adj=0, srt=90)
Histogram of Z-score of 6-mer
## We can also try simulation data
seq.sim.motif <- list(c("t", "g", "c", "a", "t", "g"),
c("g", "c", "a", "t", "g", "c"))
set.seed(1)
seq.sim <- sapply(sample(c(2, 1, 0), 1000, replace=TRUE, prob=c(0.07, 0.1, 0.83)),
function(x){
s <- sample(c("a", "c", "g", "t"),
sample(100:1000, 1), replace=TRUE)
if(x>0){
si <- sample.int(length(s), 1)
if(si>length(s)-6) si <- length(s)-6
s[si:(si+5)] <- seq.sim.motif[[x]]
}
paste(s, collapse="")
})
os <- oligoSummary(seq.sim, oligoLength=6, MarkovOrder=3,
quickMotif=TRUE)
zscore <- sort(os$zscore, decreasing=TRUE)
h <- hist(zscore, breaks=100, main="Histogram of Z-score")
text(zscore[1:2], rep(5, 2),
labels=names(zscore[1:2]), adj=0, srt=90)
Histogram of Z-score of simulation data
## generate the motifs
library(motifStack)
pfms <- mapply(function(.ele, id)
new("pfm", mat=.ele, name=paste("SAMPLE motif", id)),
os$motifs, 1:length(os$motifs))
motifStack(pfms[[1]])
Motif of simulation data
Bidirectional promoters are the DNA regions located between TSS of two adjacent genes that are transcribed on opposite directions and often co-regulated by this shared promoter region5. Here is an example to find peaks near bi-directional promoters.
bdp <- peaksNearBDP(overlaps, annoData, maxgap=5000)
c(bdp$percentPeaksWithBDP,
bdp$n.peaks,
bdp$n.peaksWithBDP)
## [1] 0.1084337 166.0000000 18.0000000
bdp$peaksWithBDP[1:2]
## GRangesList object of length 2:
## $`1`
## GRanges object with 2 ranges and 11 metadata columns:
## seqnames ranges strand | peakNames
## <Rle> <IRanges> <Rle> | <CharacterList>
## X1 chr1 713791-715578 * | gr1__MACS_peak_13,gr2__001,gr2__002
## X1 chr1 713791-715578 * | gr1__MACS_peak_13,gr2__001,gr2__002
## score bdp_idx peak feature feature.ranges
## <numeric> <integer> <character> <character> <IRanges>
## X1 850.203 1 X1 ENSG00000228327 700237-714006
## X1 850.203 1 X1 ENSG00000237491 714150-745440
## feature.strand distance insideFeature distanceToSite gene_name
## <Rle> <integer> <character> <integer> <character>
## X1 - 0 overlapStart 0 RP11-206L10.2
## X1 + 0 overlapStart 0 RP11-206L10.9
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
##
## $`4`
## GRanges object with 2 ranges and 11 metadata columns:
## seqnames ranges strand | peakNames score
## <Rle> <IRanges> <Rle> | <CharacterList> <numeric>
## X4 chr1 856361-856999 * | gr1__MACS_peak_17,gr2__005 54.69
## X4 chr1 856361-856999 * | gr1__MACS_peak_17,gr2__005 54.69
## bdp_idx peak feature feature.ranges feature.strand
## <integer> <character> <character> <IRanges> <Rle>
## X4 4 X4 ENSG00000223764 852245-856396 -
## X4 4 X4 ENSG00000187634 860260-879955 +
## distance insideFeature distanceToSite gene_name
## <integer> <character> <integer> <character>
## X4 0 overlapStart 0 RP11-54O7.3
## X4 3260 upstream 3260 SAMD11
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
There are several techniques available to determine the spatial organization of chromosomes at high resolution such as 3C, 5C and HiC6. These techniques make it possible to search peaks binding to the potential enhancer regions. Here is an example to find peaks binding to the potential enhancer regions.
DNA5C <- system.file("extdata",
"wgEncodeUmassDekker5CGm12878PkV2.bed.gz",
package="ChIPpeakAnno")
DNAinteractiveData <- toGRanges(gzfile(DNA5C))
findEnhancers(overlaps, annoData, DNAinteractiveData)
## GRanges object with 5 ranges and 15 metadata columns:
## seqnames ranges strand | peakNames
## <Rle> <IRanges> <Rle> | <CharacterList>
## X1 chr1 151591700-151591800 * | gr2__229,gr1__MACS_peak_229
## X1 chr1 151591700-151591800 * | gr2__229,gr1__MACS_peak_229
## X1 chr1 151591700-151591800 * | gr2__229,gr1__MACS_peak_229
## X1 chr1 151591700-151591800 * | gr2__229,gr1__MACS_peak_229
## X1 chr1 151630186-151630286 * | gr2__230,gr1__MACS_peak_230
## score feature feature.ranges feature.strand
## <numeric> <character> <IRanges> <Rle>
## X1 78.675 ENSG00000207606 151518272-151518367 +
## X1 78.675 ENSG00000143390 151313116-151319833 -
## X1 78.675 ENSG00000143376 151584541-151671567 +
## X1 78.675 ENSG00000143367 151512781-151556059 +
## X1 78.675 ENSG00000143393 151264273-151300191 -
## feature.shift.ranges feature.shift.strand distance insideFeature
## <IRanges> <Rle> <integer> <character>
## X1 151594534-151594629 + 2733 upstream
## X1 151595209-151601927 + 3408 upstream
## X1 151500588-151587615 - 4084 upstream
## X1 151595902-151639180 + 4101 upstream
## X1 151594247-151630165 - 20 upstream
## distanceToSite gene_name DNAinteractive.idx peak
## <integer> <character> <integer> <character>
## X1 2733 MIR554 1077 X1
## X1 3408 RFX5 1143 X1
## X1 4084 SNX27 936 X1
## X1 4101 TUFT1 936 X1
## X1 20 PI4KB 814 X1
## DNAinteractive.ranges DNAinteractive.blocks
## <IRanges> <IRangesList>
## X1 151516086-151603110 1-19082,76263-87025
## X1 151309062-151603110 1-13633,283287-294049
## X1 151546428-151636526 1-6978,72324-90099
## X1 151546428-151636526 1-6978,72324-90099
## X1 151283079-151636526 1-5699,335673-353448
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Given two or more peak lists from different TFs, one may be interested in finding whether DNA binding profile of those TFs are correlated, and if correlated, what is the common binding pattern. The workflow here shows how to test the correlation of binding profiles of three TFs and how to discover the common binding pattern.
path <- system.file("extdata", package="ChIPpeakAnno")
files <- dir(path, "broadPeak")
data <- sapply(file.path(path, files), toGRanges, format="broadPeak")
names(data) <- gsub(".broadPeak", "", files)
When we test the association between two sets of data based on
hypergeometric distribution, the number of all potential binding sites
is required. The parameter totalTest in the function
makeVennDiagram
indicates how many potential peaks in total
will be used in the hypergeometric test. It should be larger than the
largest number of peaks in the peak list. The smaller it is set, the
more stringent the test is. The time used to calculate p-value does not
depend on the value of the totalTest. For practical guidance on
how to choose totalTest, please refer to the post.
The following example makes an assumption that there are 3% of coding
region plus promoter region. Because the sample data is only a subset of
chromosome 2, we estimate that the total binding sites is 1/24 of
possible binding region in the genome.
ol <- findOverlapsOfPeaks(data, connectedPeaks="keepAll")
averagePeakWidth <- mean(width(unlist(GRangesList(ol$peaklist))))
tot <- ceiling(3.3e+9 * .03 / averagePeakWidth / 24)
makeVennDiagram(ol, totalTest=tot, connectedPeaks="keepAll",
fill=c("#CC79A7", "#56B4E9", "#F0E442"), # circle fill color
col=c("#D55E00", "#0072B2", "#E69F00"), #circle border color
cat.col=c("#D55E00", "#0072B2", "#E69F00"))
Venn diagram of overlaps.
## $p.value
## TAF Tead4 YY1 pval
## [1,] 0 1 1 1.000000e+00
## [2,] 1 0 1 2.904297e-258
## [3,] 1 1 0 8.970986e-04
##
## $vennCounts
## TAF Tead4 YY1 Counts count.TAF count.Tead4 count.YY1
## [1,] 0 0 0 849 0 0 0
## [2,] 0 0 1 621 0 0 621
## [3,] 0 1 0 2097 0 2097 0
## [4,] 0 1 1 309 0 310 319
## [5,] 1 0 0 59 59 0 0
## [6,] 1 0 1 166 172 0 170
## [7,] 1 1 0 8 8 8 0
## [8,] 1 1 1 476 545 537 521
## attr(,"class")
## [1] "VennCounts"
## see the difference if we set connectedPeaks to "keepFirstListConsistent"
## set connectedPeaks to keepFirstListConsistent will show consistent total
## number of peaks for the first peak list.
makeVennDiagram(ol, totalTest=tot, connectedPeaks="keepFirstListConsistent",
fill=c("#CC79A7", "#56B4E9", "#F0E442"),
col=c("#D55E00", "#0072B2", "#E69F00"),
cat.col=c("#D55E00", "#0072B2", "#E69F00"))
Venn diagram of overlaps.
## $p.value
## TAF Tead4 YY1 pval
## [1,] 0 1 1 1.000000e+00
## [2,] 1 0 1 2.904297e-258
## [3,] 1 1 0 8.970986e-04
##
## $vennCounts
## TAF Tead4 YY1 Counts count.TAF count.Tead4 count.YY1
## [1,] 0 0 0 849 0 0 0
## [2,] 0 0 1 621 0 0 621
## [3,] 0 1 0 2097 0 2097 0
## [4,] 0 1 1 309 0 310 319
## [5,] 1 0 0 59 59 0 0
## [6,] 1 0 1 166 172 0 170
## [7,] 1 1 0 8 8 8 0
## [8,] 1 1 1 476 545 537 521
## attr(,"class")
## [1] "VennCounts"
The above hypergeometric test requires users to input an estimate of
the total potential binding sites for a given TF. To circumvent this
requirement, we implemented a permutation test called
peakPermTest
. Before performing a permutation test, users
need to generate random peaks using the distribution discovered from the
input peaks for a given feature type (transcripts or exons), to make
sure the binding positions relative to features, such as TSS and
geneEnd, and the width of the random peaks follow the distribution of
that of the input peaks.
Alternatively, a peak pool representing all potential binding sites
can be created with associated binding probabilities for random peak
sampling using preparePool
. Here is an example to build a
peak pool for human genome using the transcription factor binding site
clusters (V3) (see ?wgEncodeTfbsV3
) downloaded from ENCODE
with the HOT spots (?HOT.spots
) removed. HOT spots are the
genomic regions with high probability of being bound by many TFs in
ChIP-seq experiments7. We
suggest remove those HOT spots from the peak lists before performing
permutation test to avoid the overestimation of the association between
the two input peak lists. Users can also choose to remove ENCODE
blacklist for a given species. The blacklists were constructed by
identifying consistently problematic regions over independent cell lines
and types of experiments for each species in the ENCODE and modENCODE
datasets8. Please note that
some of the blacklists may need to be converted to the correct genome
assembly using liftover utility.
Following are the sample codes to do the permutation test using
permTest
:
data(HOT.spots)
data(wgEncodeTfbsV3)
hotGR <- reduce(unlist(HOT.spots))
removeOl <- function(.ele){
ol <- findOverlaps(.ele, hotGR)
if(length(ol)>0) .ele <- .ele[-unique(queryHits(ol))]
.ele
}
TAF <- removeOl(data[["TAF"]])
TEAD4 <- removeOl(data[["Tead4"]])
YY1 <- removeOl(data[["YY1"]])
# we subset the pool to save demo time
set.seed(1)
wgEncodeTfbsV3.subset <-
wgEncodeTfbsV3[sample.int(length(wgEncodeTfbsV3), 2000)]
pool <- new("permPool", grs=GRangesList(wgEncodeTfbsV3.subset), N=length(YY1))
pt1 <- peakPermTest(YY1, TEAD4, pool=pool, seed=1, force.parallel=FALSE)
plot(pt1)
permutation test for YY1 and TEAD4
pt2 <- peakPermTest(YY1, TAF, pool=pool, seed=1, force.parallel=FALSE)
plot(pt2)
permutation test for YY1 and TAF
The binding pattern around a genome feature could be visualized and compared using a side-by-side heatmap and density plot using the binding ranges of overlapping peaks.
features <- ol$peaklist[[length(ol$peaklist)]]
feature.recentered <- reCenterPeaks(features, width=4000)
## here we also suggest importData function in bioconductor trackViewer package
## to import the coverage.
## compare rtracklayer, it will save you time when handle huge dataset.
library(rtracklayer)
files <- dir(path, "bigWig")
if(.Platform$OS.type != "windows"){
cvglists <- sapply(file.path(path, files), import,
format="BigWig",
which=feature.recentered,
as="RleList")
}else{## rtracklayer can not import bigWig files on Windows
load(file.path(path, "cvglist.rds"))
}
names(cvglists) <- gsub(".bigWig", "", files)
feature.center <- reCenterPeaks(features, width=1)
sig <- featureAlignedSignal(cvglists, feature.center,
upstream=2000, downstream=2000)
##Because the bw file is only a subset of the original file,
##the signals are not exists for every peak.
keep <- rowSums(sig[[2]]) > 0
sig <- sapply(sig, function(.ele) .ele[keep, ], simplify = FALSE)
feature.center <- feature.center[keep]
heatmap <- featureAlignedHeatmap(sig, feature.center,
upstream=2000, downstream=2000,
upper.extreme=c(3,.5,4))
Heatmap of aligned features sorted by signal of TAF
sig.rowsums <- sapply(sig, rowSums, na.rm=TRUE)
d <- dist(sig.rowsums)
hc <- hclust(d)
feature.center$order <- hc$order
heatmap <- featureAlignedHeatmap(sig, feature.center,
upstream=2000, downstream=2000,
upper.extreme=c(3,.5,4),
sortBy="order")
Heatmap of aligned features sorted by hclut
featureAlignedDistribution(sig, feature.center,
upstream=2000, downstream=2000,
type="l")
Distribution of aligned features