Package: metagene
Modified: 18 september, 2015
Compiled: Wed Dec 20 19:39:27 2017
License: Artistic-2.0 | file LICENSE
This package produces metagene-like plots to compare the behavior of DNA-interacting proteins at selected groups of features. A typical analysis can be done in viscinity of transcription start sites (TSS) of genes or at any regions of interest (such as enhancers). Multiple combinations of group of features and/or group of bam files can be compared in a single analysis. Bootstraping analysis is used to compare the groups and locate regions with statistically different enrichment profiles. In order to increase the sensitivity of the analysis, alignment data is used instead of peaks produced with peak callers (i.e.: MACS2 or PICS). The metagene package uses bootstrap to obtain a better estimation of the mean enrichment and the confidence interval for every group of samples.
This vignette will introduce all the main features of the metagene package.
library(metagene)
There is no hard limit in the number of BAM files that can be included in an analysis (but with too many BAM files, memory may become an issue). BAM files must be indexed. For instance, if you use a file names file.bam
, a file named file.bam.bai
or file.bai
must be present in the same directory.
The path (relative or absolute) to the BAM files must be in a vector:
bam_files <- get_demo_bam_files()
bam_files
## [1] "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align1_rep1.bam"
## [2] "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align1_rep2.bam"
## [3] "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align2_rep1.bam"
## [4] "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align2_rep2.bam"
## [5] "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/ctrl.bam"
For this demo, we have 2 samples (each with 2 replicates). It is also possible to use a named vector to add your own names to each BAM files:
named_bam_files <- bam_files
names(named_bam_files) <- letters[seq_along(bam_files)]
named_bam_files
## a
## "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align1_rep1.bam"
## b
## "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align1_rep2.bam"
## c
## "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align2_rep1.bam"
## d
## "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align2_rep2.bam"
## e
## "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/ctrl.bam"
Using named BAM files can simplify the use of the metagene helper functions and the creation of the design.
To compare custom regions of interest, it is possible to use a list of one or more BED files.
regions <- get_demo_regions()
regions
## [1] "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/list1.bed"
## [2] "/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/list2.bed"
The name of the files (without the extension) will be used to name each groups.
metagene
also support the narrowPeak and the broadPeak.
As an alternative to a list of BED files, GRanges
or GRangesList
objects can be used.
Some common datasets are already available with the metagene
package:
promoters_hg19
promoters_hg18
promoters_mm10
promoters_mm9
data(promoters_hg19)
promoters_hg19
## GRanges object with 23056 ranges and 1 metadata column:
## seqnames ranges strand | gene_id
## <Rle> <IRanges> <Rle> | <character>
## 1 chr19 [ 58873215, 58875214] - | 1
## 10 chr8 [ 18247755, 18249754] + | 10
## 100 chr20 [ 43279377, 43281376] - | 100
## 1000 chr18 [ 25756446, 25758445] - | 1000
## 10000 chr1 [244005887, 244007886] - | 10000
## ... ... ... ... . ...
## 9991 chr9 [115094945, 115096944] - | 9991
## 9992 chr21 [ 35735323, 35737322] + | 9992
## 9993 chr22 [ 19108968, 19110967] - | 9993
## 9994 chr6 [ 90538619, 90540618] + | 9994
## 9997 chr22 [ 50963906, 50965905] - | 9997
## -------
## seqinfo: 93 sequences (1 circular) from hg19 genome
For more details about each datasets, please refer to their documentation (i.e.:?promoters_hg19
).
A design group contains a set of BAM files that, when put together, represent a logical analysis. Furthermore, a design group contains the relationship between every BAM files present. Samples (with or without replicates) and controls can be assigned to a same design group. There can be as many groups as necessary. A BAM file can be assigned to more than one group.
To represent the relationship between every BAM files, design groups must have the following columns:
There is two possible way to create design groups, by reading a file or by directly creating a design object in R.
Design groups can be loaded into the metagene package by using a text file. As the relationship between BAM files as to be specified, the following columns are mandatory:
The file must also contain a header. It is recommanded to use Samples
for the name of the first column, but the value is not checked. The other columns in the design file will be used for naming design groups, and must be unique.
fileDesign <- system.file("extdata/design.txt", package="metagene")
design <- read.table(fileDesign, header=TRUE, stringsAsFactors=FALSE)
design$Samples <- paste(system.file("extdata", package="metagene"),
design$Samples, sep="/")
kable(design)
Samples | align1 | align2 |
---|---|---|
/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align1_rep1.bam | 1 | 0 |
/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align1_rep2.bam | 1 | 0 |
/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align2_rep1.bam | 0 | 1 |
/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align2_rep2.bam | 0 | 1 |
/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/ctrl.bam | 2 | 2 |
It is not obligatory to use a design file, you can create the design data.frame
using your prefered method (as long as the restrictions on the values mentioned previously are respected).
For instance, the previous design data.frame could have been create directly in R:
design <- data.frame(Samples = c("align1_rep1.bam", "align1_rep2.bam",
"align2_rep1.bam", "align2_rep2.bam", "ctrl.bam"),
align1 = c(1,1,0,0,2), align2 = c(0,0,1,1,2))
design$Samples <- paste0(system.file("extdata", package="metagene"), "/",
design$Samples)
kable(design)
Samples | align1 | align2 |
---|---|---|
/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align1_rep1.bam | 1 | 0 |
/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align1_rep2.bam | 1 | 0 |
/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align2_rep1.bam | 0 | 1 |
/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/align2_rep2.bam | 0 | 1 |
/tmp/Rtmp5YXNbI/Rinst4a2e231599bd/metagene/extdata/ctrl.bam | 2 | 2 |
A typical metagene analysis will consist steps:
A minimal metagene analysis can be performed in 2 steps:
new
function).plot
regions <- get_demo_regions()
bam_files <- get_demo_bam_files()
# Initialization
mg <- metagene$new(regions = regions, bam_files = bam_files)
# Plotting
mg$plot(title = "Demo metagene plot")
## produce data table : ChIP-Seq
## produce data frame : ChIP-Seq
## Plot : ChIP-Seq