Package version: derfinderPlot 1.4.1

Contents

1 Introduction to derfinderPlot

2 Overview

derfinderPlot (Collado-Torres, Jaffe, and Leek, 2015) is an addon package for derfinder (Collado-Torres, Frazee, Love, Irizarry, et al., 2015) with functions that allow you to visualize the results.

While the functions in derfinderPlot assume you generated the data with derfinder, they can be used with other GRanges objects properly formatted.

The functions in derfinderPlot are:

3 Example

As an example, we will analyze a small subset of the samples from the BrainSpan Atlas of the Human Brain (BrainSpan, 2011) publicly available data.

We first load the required packages.

## Load libraries
suppressPackageStartupMessages(library('derfinder'))
## Note: the specification for S3 class "AsIs" in package 'BiocGenerics' seems equivalent to one from package 'RJSONIO': not turning on duplicate class definitions for this class.
## Note: the specification for S3 class "connection" in package 'BiocGenerics' seems equivalent to one from package 'RJSONIO': not turning on duplicate class definitions for this class.
## Note: the specification for S3 class "file" in package 'BiocGenerics' seems equivalent to one from package 'RJSONIO': not turning on duplicate class definitions for this class.
## Note: the specification for S3 class "pipe" in package 'BiocGenerics' seems equivalent to one from package 'RJSONIO': not turning on duplicate class definitions for this class.
## Note: the specification for S3 class "textConnection" in package 'BiocGenerics' seems equivalent to one from package 'RJSONIO': not turning on duplicate class definitions for this class.
library('derfinderData')
library('derfinderPlot')

## For tMatrix() colors
library('grDevices')
library('RColorBrewer')

3.1 Analyze data

For this example, we created a small table with the relevant phenotype data for 12 samples: 6 from fetal samples and 6 from adult samples. We chose at random a brain region, in this case the primary auditory cortex (core) and for the example we will only look at data from chromosome 21. Other variables include the age in years and the gender. The data is shown below.

library('knitr')
## Get pheno table
pheno <- subset(brainspanPheno, structure_acronym == 'A1C')

## Display the main information
p <- pheno[, -which(colnames(pheno) %in% c('structure_acronym', 'structure_name', 'file'))]
rownames(p) <- NULL
kable(p, format = 'html', row.names = TRUE)
gender lab Age group
1 M HSB114.A1C -0.5192308 fetal
2 M HSB103.A1C -0.5192308 fetal
3 M HSB178.A1C -0.4615385 fetal
4 M HSB154.A1C -0.4615385 fetal
5 F HSB150.A1C -0.5384615 fetal
6 F HSB149.A1C -0.5192308 fetal
7 F HSB130.A1C 21.0000000 adult
8 M HSB136.A1C 23.0000000 adult
9 F HSB126.A1C 30.0000000 adult
10 M HSB145.A1C 36.0000000 adult
11 M HSB123.A1C 37.0000000 adult
12 F HSB135.A1C 40.0000000 adult

We can load the data from derfinderData (Collado-Torres, Jaffe, and Leek, 2015) by first identifying the paths to the BigWig files with derfinder::rawFiles() and then loading the data with derfinder::fullCoverage().

## Determine the files to use and fix the names
files <- rawFiles(system.file('extdata', 'A1C', package = 'derfinderData'), samplepatt = 'bw', fileterm = NULL)
names(files) <- gsub('.bw', '', names(files))

## Load the data from disk
system.time(fullCov <- fullCoverage(files = files, chrs = 'chr21'))
## 2015-11-02 20:40:21 fullCoverage: processing chromosome chr21
## 2015-11-02 20:40:21 loadCoverage: finding chromosome lengths
## 2015-11-02 20:40:21 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB103.bw
## 2015-11-02 20:40:21 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB114.bw
## 2015-11-02 20:40:21 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB123.bw
## 2015-11-02 20:40:21 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB126.bw
## 2015-11-02 20:40:22 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB130.bw
## 2015-11-02 20:40:22 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB135.bw
## 2015-11-02 20:40:22 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB136.bw
## 2015-11-02 20:40:22 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB145.bw
## 2015-11-02 20:40:22 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB149.bw
## 2015-11-02 20:40:22 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB150.bw
## 2015-11-02 20:40:23 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB154.bw
## 2015-11-02 20:40:23 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.2-bioc/R/library/derfinderData/extdata/A1C/HSB178.bw
## 2015-11-02 20:40:23 loadCoverage: applying the cutoff to the merged data
## 2015-11-02 20:40:23 filterData: originally there were 48129895 rows, now there are 48129895 rows. Meaning that 0 percent was filtered.
##    user  system elapsed 
##   2.117   0.020   2.219

Alternatively, since the BigWig files are publicly available from BrainSpan (see here), we can extract the relevant coverage data using derfinder::fullCoverage(). Note that as of rtracklayer 1.25.16 BigWig files are not supported on Windows: you can find the fullCov object inside derfinderData to follow the examples.

## Determine the files to use and fix the names
files <- pheno$file
names(files) <- gsub('.A1C', '', pheno$lab)

## Load the data from the web
system.time(fullCov <- fullCoverage(files = files, chrs = 'chr21'))

Once we have the base-level coverage data for all 12 samples, we can construct the models. In this case, we want to find differences between fetal and adult samples while adjusting for gender and a proxy of the library size.

## Get some idea of the library sizes
sampleDepths <- sampleDepth(collapseFullCoverage(fullCov), 1)
## 2015-11-02 20:40:23 sampleDepth: Calculating sample quantiles
## 2015-11-02 20:40:27 sampleDepth: Calculating sample adjustments
## Define models
models <- makeModels(sampleDepths, testvars = pheno$group, adjustvars = pheno[, c('gender')])

Next, we can find candidate differentially expressed regions (DERs) using as input the segments of the genome where at least one sample has coverage greater than 3. In this particular example, we chose a low theoretical F-statistic cutoff and used 20 permutations.

## Filter coverage
filteredCov <- lapply(fullCov, filterData, cutoff = 3)
## 2015-11-02 20:40:27 filterData: originally there were 48129895 rows, now there are 90023 rows. Meaning that 99.81 percent was filtered.
## Perform differential expression analysis
suppressPackageStartupMessages(library('bumphunter'))
system.time(results <- analyzeChr(chr = 'chr21', filteredCov$chr21, models, groupInfo = pheno$group, writeOutput = FALSE, cutoffFstat = 5e-02, nPermute = 20, seeds = 20140923 + seq_len(20)))
## 2015-11-02 20:40:28 analyzeChr: Pre-processing the coverage data
## 2015-11-02 20:40:31 analyzeChr: Calculating statistics
## 2015-11-02 20:40:31 calculateStats: calculating the F-statistics
## 2015-11-02 20:40:31 analyzeChr: Calculating pvalues
## 2015-11-02 20:40:31 analyzeChr: Using the following theoretical cutoff for the F-statistics 5.31765507157871
## 2015-11-02 20:40:31 calculatePvalues: identifying data segments
## 2015-11-02 20:40:31 findRegions: segmenting F-stats information
## 2015-11-02 20:40:31 findRegions: identifying candidate regions
## 2015-11-02 20:40:31 findRegions: identifying region clusters
## 2015-11-02 20:40:32 calculatePvalues: calculating F-statistics for permutation 1 and seed 20140924
## 2015-11-02 20:40:32 findRegions: segmenting F-stats information
## 2015-11-02 20:40:32 findRegions: identifying candidate regions
## 2015-11-02 20:40:32 calculatePvalues: calculating F-statistics for permutation 2 and seed 20140925
## 2015-11-02 20:40:32 findRegions: segmenting F-stats information
## 2015-11-02 20:40:32 findRegions: identifying candidate regions
## 2015-11-02 20:40:32 calculatePvalues: calculating F-statistics for permutation 3 and seed 20140926
## 2015-11-02 20:40:32 findRegions: segmenting F-stats information
## 2015-11-02 20:40:32 findRegions: identifying candidate regions
## 2015-11-02 20:40:32 calculatePvalues: calculating F-statistics for permutation 4 and seed 20140927
## 2015-11-02 20:40:33 findRegions: segmenting F-stats information
## 2015-11-02 20:40:33 findRegions: identifying candidate regions
## 2015-11-02 20:40:33 calculatePvalues: calculating F-statistics for permutation 5 and seed 20140928
## 2015-11-02 20:40:33 findRegions: segmenting F-stats information
## 2015-11-02 20:40:33 findRegions: identifying candidate regions
## 2015-11-02 20:40:33 calculatePvalues: calculating F-statistics for permutation 6 and seed 20140929
## 2015-11-02 20:40:33 findRegions: segmenting F-stats information
## 2015-11-02 20:40:33 findRegions: identifying candidate regions
## 2015-11-02 20:40:33 calculatePvalues: calculating F-statistics for permutation 7 and seed 20140930
## 2015-11-02 20:40:34 findRegions: segmenting F-stats information
## 2015-11-02 20:40:34 findRegions: identifying candidate regions
## 2015-11-02 20:40:34 calculatePvalues: calculating F-statistics for permutation 8 and seed 20140931
## 2015-11-02 20:40:34 findRegions: segmenting F-stats information
## 2015-11-02 20:40:34 findRegions: identifying candidate regions
## 2015-11-02 20:40:34 calculatePvalues: calculating F-statistics for permutation 9 and seed 20140932
## 2015-11-02 20:40:34 findRegions: segmenting F-stats information
## 2015-11-02 20:40:34 findRegions: identifying candidate regions
## 2015-11-02 20:40:34 calculatePvalues: calculating F-statistics for permutation 10 and seed 20140933
## 2015-11-02 20:40:34 findRegions: segmenting F-stats information
## 2015-11-02 20:40:34 findRegions: identifying candidate regions
## 2015-11-02 20:40:34 calculatePvalues: calculating F-statistics for permutation 11 and seed 20140934
## 2015-11-02 20:40:35 findRegions: segmenting F-stats information
## 2015-11-02 20:40:35 findRegions: identifying candidate regions
## 2015-11-02 20:40:35 calculatePvalues: calculating F-statistics for permutation 12 and seed 20140935
## 2015-11-02 20:40:35 findRegions: segmenting F-stats information
## 2015-11-02 20:40:35 findRegions: identifying candidate regions
## 2015-11-02 20:40:35 calculatePvalues: calculating F-statistics for permutation 13 and seed 20140936
## 2015-11-02 20:40:35 findRegions: segmenting F-stats information
## 2015-11-02 20:40:35 findRegions: identifying candidate regions
## 2015-11-02 20:40:35 calculatePvalues: calculating F-statistics for permutation 14 and seed 20140937
## 2015-11-02 20:40:35 findRegions: segmenting F-stats information
## 2015-11-02 20:40:35 findRegions: identifying candidate regions
## 2015-11-02 20:40:35 calculatePvalues: calculating F-statistics for permutation 15 and seed 20140938
## 2015-11-02 20:40:35 findRegions: segmenting F-stats information
## 2015-11-02 20:40:35 findRegions: identifying candidate regions
## 2015-11-02 20:40:35 calculatePvalues: calculating F-statistics for permutation 16 and seed 20140939
## 2015-11-02 20:40:36 findRegions: segmenting F-stats information
## 2015-11-02 20:40:36 findRegions: identifying candidate regions
## 2015-11-02 20:40:36 calculatePvalues: calculating F-statistics for permutation 17 and seed 20140940
## 2015-11-02 20:40:36 findRegions: segmenting F-stats information
## 2015-11-02 20:40:36 findRegions: identifying candidate regions
## 2015-11-02 20:40:36 calculatePvalues: calculating F-statistics for permutation 18 and seed 20140941
## 2015-11-02 20:40:36 findRegions: segmenting F-stats information
## 2015-11-02 20:40:36 findRegions: identifying candidate regions
## 2015-11-02 20:40:36 calculatePvalues: calculating F-statistics for permutation 19 and seed 20140942
## 2015-11-02 20:40:36 findRegions: segmenting F-stats information
## 2015-11-02 20:40:36 findRegions: identifying candidate regions
## 2015-11-02 20:40:36 calculatePvalues: calculating F-statistics for permutation 20 and seed 20140943
## 2015-11-02 20:40:37 findRegions: segmenting F-stats information
## 2015-11-02 20:40:37 findRegions: identifying candidate regions
## 2015-11-02 20:40:37 calculatePvalues: calculating the p-values
## 2015-11-02 20:40:37 analyzeChr: Annotating regions
## Loading required package: GenomicFeatures
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Warning:   Calling species() on a TxDb object is *deprecated*.
##   Please use organism() instead.
## No annotationPackage supplied. Trying org.Hs.eg.db.
## Loading required package: org.Hs.eg.db
## Loading required package: DBI
## 
## Getting TSS and TSE.
## Getting CSS and CSE.
## Getting exons.
## Annotating genes.
## ...
##    user  system elapsed 
##  67.225   0.488  67.752
## Quick access to the results
regions <- results$regions$regions

## Annotation database to use
suppressPackageStartupMessages(library('TxDb.Hsapiens.UCSC.hg19.knownGene'))
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

3.2 plotOverview()

Now that we have obtained the main results using derfinder, we can proceed to visualizing the results using derfinderPlot. The easiest to use of all the functions is plotOverview() which takes a set of regions and annotation information produced by bumphunter::annotateNearest().

The plot below shows the candidate DERs colored by whether their q-value was less than 0.10 or not.

## Q-values overview
plotOverview(regions = regions, annotation = results$annotation, type = 'qval')
## 2015-11-02 20:41:36 plotOverview: assigning chromosome lengths from hg19!!!
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.

The next plot shows the candidate DERs colored by the type of gene feature they are nearest too.

## Annotation overview
plotOverview(regions = regions, annotation = results$annotation, type = 'annotation')
## 2015-11-02 20:41:37 plotOverview: assigning chromosome lengths from hg19!!!
## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.

In this particular example, because we are only using data from one chromosome the above plot is not as informative as in a real case scenario. However, with this plot we can quickly observe that nearly all of the candidate DERs are inside an exon.

3.3 plotRegionCoverage()

The complete opposite of visualizing the candidate DERs at the genome level is to visualize them one region at a time. plotRegionCoverage() allows us to do this quickly for a large number of regions.

Before using this function, we need to process more detailed information using two derfinder functions: annotateRegions() and getRegionCoverage() as shown below.

## Get required information for the plots
annoRegs <- annotateRegions(regions, genomicState$fullGenome)
## 2015-11-02 20:41:37 annotateRegions: counting
## 2015-11-02 20:41:38 annotateRegions: annotating
regionCov <- getRegionCoverage(fullCov, regions)
## 2015-11-02 20:41:38 getRegionCoverage: processing chr21
## 2015-11-02 20:41:38 getRegionCoverage: done processing chr21

Once we have the relevant information we can proceed to plotting the first 10 regions. In this case, we will supply plotRegionCoverage() with the information it needs to plot transcripts overlapping these 10 regions.

## Plot top 10 regions
plotRegionCoverage(regions = regions, regionCoverage = regionCov, 
    groupInfo = pheno$group, nearestAnnotation = results$annotation, 
    annotatedRegions = annoRegs, whichRegions=1:10, txdb = txdb, scalefac = 1, ask = FALSE)
## 2015-11-02 20:41:38 plotRegionCoverage: extracting Tx info
## 2015-11-02 20:41:41 plotRegionCoverage: getting Tx plot info

The base level coverage is shown in a log2 scale with any overlapping exons shown in dark blue and known introns in light blue.

3.4 plotCluster()

In this example, we noticed with the plotRegionCoverage() plots that most of the candidate DERs are contained in known exons. Sometimes, the signal might be low or we might have used very stringent cutoffs in the derfinder analysis. One way we can observe this is by plotting clusters of regions where a cluster is defined as regions within 300 bp (default option) of each other.

To visualize the clusters, we can use plotCluster() which takes similar input to plotOverview() with the notable addition of the coverage information as well as the idx argument. This argument specifies which region to focus on: it will be plotted with a red bar and will determine the cluster to display.

In the plot below we observe one large candidate DER with other nearby ones that do not have a q-value less than 0.10. In a real analysis, we would probably discard this region as the coverage is fairly low.

## First cluster
plotCluster(idx = 1, regions = regions, annotation = results$annotation, coverageInfo = fullCov$chr21, txdb = txdb, groupInfo = pheno$group, titleUse = 'pval')
## Parsing transcripts...
## Parsing exons...
## Parsing cds...
## Parsing utrs...
## ------exons...
## ------cdss...
## ------introns...
## ------utr...
## aggregating...
## Done
## Constructing graphics...

The second cluster shows a larger number of potential DERs (again without q-values less than 0.10) in a segment of the genome where the coverage data is highly variable. This is a common occurrence with RNA-seq data.

## Second cluster
plotCluster(idx = 2, regions = regions, annotation = results$annotation, coverageInfo = fullCov$chr21, txdb = txdb, groupInfo = pheno$group, titleUse = 'pval')
## Parsing transcripts...
## Parsing exons...
## Parsing cds...
## Parsing utrs...
## ------exons...
## ------cdss...
## ------introns...
## ------utr...
## aggregating...
## Done
## Constructing graphics...