Charles Joly Beauparlant, Fabien Claude Lamaze, Rawane Samb, Astrid Louise Deschenes and Arnaud Droit.
This package and the underlying metagene code are distributed under the Artistic license 2.0. You are free to use and redistribute this software.
This package produces Metagene-like plots to compare the behavior of DNA-interacting proteins at selected groups of features. A typical analysis can be done in viscinity of transcription start sites (TSS) of genes or at any regions of interest (such as enhancers). Multiple combinations of group of features and/or group of bam files can be compared in a single analysis. Bootstraping analysis is used to compare the groups and locate regions with statistically different enrichment profiles. In order to increase the sensitivity of the analysis, alignment data is used instead of peaks produced with peak callers (i.e.: MACS2 or PICS). The metagene package uses bootstrap to obtain a better estimation of the mean enrichment and the confidence interval for every group of samples.
This vignette will introduce all the main features of the metagene package.
suppressMessages(library(metagene))
There is no hard limit in the number of BAM files that can be included in an
analysis (but with too many BAM files, memory may become an issue). BAM files
must be indexed. For instance, if you use a file names file.bam
, a file
named file.bam.bai
must be present in the same directory.
The path (relative or absolute) to the BAM files must be in a vector:
bam_files <- get_demo_bam_files()
bam_files
## [1] "/tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align1_rep1.bam"
## [2] "/tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align1_rep2.bam"
## [3] "/tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align2_rep1.bam"
## [4] "/tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align2_rep2.bam"
For this demo, we have 2 samples (each with 2 replicates).
To compare custom regions of interest, it is possible to use a list of one or more BED files.
regions <- get_demo_regions()
regions
## [1] "/tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/list1.bed"
## [2] "/tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/list2.bed"
The name of the files (without the extension) will be used to name each groups.
As an alternative to a list of bed files, GRanges or GRangesList objects can be used.
A design group contains a set of BAM files that, when pull together, represent a logical analysis. Furthermore, a design group contains the relationship between every BAM files present. Samples (with or without replicates) and controls can be assigned to a same design group. There can be as many groups as necessary. A BAM file can be assigned to more than one group.
To represent the relationship between every BAM files, design groups must have the following columns:
There is two possible way to create design groups, by reading a file or by directly creating a design object in R.
Design groups can be loaded into the metagene package by using a text file. As the relationship between BAM files as to be specified, the following columns are mandatory:
The file must also contain a header. It is recommanded to use Samples for the name of the first column, but the value is not checked. The other columns in the design file will be used for naming design groups, and must be unique.
fileDesign <- system.file("extdata/design.txt", package="metagene")
design <- read.table(fileDesign, header=TRUE, stringsAsFactors=FALSE)
design$Samples <- paste(system.file("extdata", package="metagene"),
design$Samples, sep="/")
design
## Samples
## 1 /tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align1_rep1.bam
## 2 /tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align1_rep2.bam
## 3 /tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align2_rep1.bam
## 4 /tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align2_rep2.bam
## 5 /tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/ctrl.bam
## align1 align2
## 1 1 0
## 2 1 0
## 3 0 1
## 4 0 1
## 5 2 2
It is not obligatory to use a design file, you can create the design
data.frame
using your prefered method (as long as the restrictions on the
values mentioned previously are respected).
For instance, the previous design data.frame could have been create directly in R:
design <- data.frame(Samples = c("align1_rep1.bam", "align1_rep2.bam",
"align2_rep1.bam", "align2_rep2.bam", "ctrl.bam"),
align1 = c(1,1,0,0,2), align2 = c(0,0,1,1,2))
design$Samples <- paste0(system.file("extdata", package="metagene"), "/",
design$Samples)
design
## Samples
## 1 /tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align1_rep1.bam
## 2 /tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align1_rep2.bam
## 3 /tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align2_rep1.bam
## 4 /tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/align2_rep2.bam
## 5 /tmp/RtmprjWsYE/Rinst144f5d6d69fc/metagene/extdata/ctrl.bam
## align1 align2
## 1 1 0
## 2 1 0
## 3 0 1
## 4 0 1
## 5 2 2
A typical metagene analysis will consist steps:
To facilitate the analysis, all those steps managed by a metagene object. A typical analysis will require 2 steps:
To extract coverages, the user must initialize the metagene object. The minimal requirements for this step is a list of bam files and a list of regions:
mg <- metagene$new(regions, bam_files)
The statistical analysis and the production of the metagene plot are done
in a single step with the function plot
of the metagene object. In this
specific case, we know that all the regions have the same size and are all
centered on the same element. We can specify this information with the range
parameter.
results <- mg$plot(range = c(-1000,1000))
## [1] "align1_rep1_list1"
## [1] "align1_rep2_list1"
## [1] "align2_rep1_list1"
## [1] "align2_rep2_list1"
## [1] "align1_rep1_list2"
## [1] "align1_rep2_list2"
## [1] "align2_rep1_list2"
## [1] "align2_rep2_list2"
The plot function returns a list containing the data.frame
used by
ggplot2 to produce the results as well as the Friedman test results
(or NULL
if no test run).
class(results)
## [1] "list"
names(results)
## [1] "DF" "friedman_test" "graph"
head(results$DF)
## group position value qinf qsup
## 1 align1_rep1_list1 -1000.0000 119.6764 84.09493 164.6710
## 2 align1_rep1_list1 -894.7368 127.4002 83.48436 189.4498
## 3 align1_rep1_list1 -789.4737 138.6408 83.18878 229.0831
## 4 align1_rep1_list1 -684.2105 159.0291 121.46278 204.8910
## 5 align1_rep1_list1 -578.9474 181.8123 140.06257 234.4844
## 6 align1_rep1_list1 -473.6842 197.9288 132.28587 308.8047