Package: methylInheritance
Authors: Astrid Deschênes, Pascal Belleau and Arnaud Droit
Version: 1.0.0
Compiled date: 2017-04-24
License: Artistic-2.0
The methylInheritance package and the underlying methylInheritance code are distributed under the Artistic license 2.0. You are free to use and redistribute this software.
DNA methylation plays an important role in the biology of tissue development and diseases. High-throughput sequencing techniques enable genome-wide detection of differentially methylated elements (DME), commonly sites (DMS) or regions (DMR). The analysis of treatment effects on DNA methylation, from one generation to the next (inter-generational) and across generations that were not exposed to the initial environment (trans-generational) represent complex designs. Due to software design, the detection of DME is usually made on each generation separately. However, the common DME between generations due to randomness is not negligible when the number of DME detected in each generation is high. To judge the effect on DME that is inherited from a treatment in previous generation, the observed number of conserved DME must be compared to the randomly expected number.
We present a permutation analysis, based on Monte Carlo sampling, aim to infer a relation between the number of conserved DME from one generation to the next to the inheritance effect of treatment and to dismiss stochastic effect. It is used as a robust alternative to inference based on parametric assumptions.
The methylInheritance package can perform a permutation analysis on both differentially methylated sites (DMS) and differentially methylated tiles (DMT) using the methylKit package.
As with any R package, the methylInheritance package should first be loaded with the following command:
library(methylInheritance)
The permutation analysis is a statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating the values of the test statistic under rearrangement of the labels on the observed data points. The rearrangement of the labels is done through repeated random sampling (P. Legendre and Legendre 1998, 142–57).
Null Hypothesis: The number of conserved DME correspond to a number that can be obtained through a randomness analysis.
Alternative Hypothesis: The number of conserved DME do not correspond to a number that can be obtained through a randomness analysis.
A typical methylInheritance analysis consists of the following steps:
All those steps have been encoded in the methylInheritance package.
A dataset containing methylation data (6 cases and 6 controls) over three generations has been generated using the methylInheritanceSim package.
## Load dataset containing information over three generations
data(demoForTransgenerationalAnalysis)
## The length of the dataset corresponds to the number of generation
## The generations are stored in order (first entry = first generation,
## second entry = second generation, etc..)
length(demoForTransgenerationalAnalysis)
## [1] 3
## All samples related to one generation are contained in a methylRawList
## object.
## The methylRawList object contains two Slots:
## 1- treatment: a numeric vector denoting controls and cases.
## 2- .Data: a list of methylRaw objects. Each object stores the raw
## mehylation data of one sample.
## A section of the methylRaw object containing the information of the
## first sample from the second generation
head(demoForTransgenerationalAnalysis[[2]][[1]])
## chr start end strand coverage numCs numTs
## 1 S 1000 1000 + 88 87 1
## 2 S 1038 1038 + 77 76 1
## 3 S 1061 1061 + 84 84 0
## 4 S 1098 1098 + 100 46 54
## 5 S 1758 1758 + 93 93 0
## 6 S 1801 1801 + 77 76 1
## The treatment vector for each generation
## The number of treatments and controls is the same in each generation
## However, it could also be different.
## Beware that getTreatment() is a function from the methylKit package.
getTreatment(demoForTransgenerationalAnalysis[[1]])
## [1] 0 0 0 0 0 0 1 1 1 1 1 1
getTreatment(demoForTransgenerationalAnalysis[[2]])
## [1] 0 0 0 0 0 0 1 1 1 1 1 1
getTreatment(demoForTransgenerationalAnalysis[[3]])
## [1] 0 0 0 0 0 0 1 1 1 1 1 1
The observation analysis can be run on all generations using the runObservation() function.
The observation results can optionally be stored in a RDS file. To store the results into a RDS file, the outputDir parameter must be given a directory path.
## The observation analysis is only done on differentially methylated sites
observationResult <- runObservation(methylKitData =
demoForTransgenerationalAnalysis,
type = "sites", # Only sites
outputDir = NULL, # RDS files only created when a
# path is given
nbrCores = 1, # Number of cores used
minReads = 10, # Minimum read coverage
minMethDiff = 10, # Minimum difference in methylation
# to be considered DMS
qvalue = 0.01,
vSeed = 2101) # Ensure reproducible results
## Only one entry, containing all results, is created for the sites
observationResult$OBSERVATION$SITES
## $i2
## $i2$HYPER
## $i2$HYPER[[1]]
## [1] 16
##
## $i2$HYPER[[2]]
## [1] 16
##
##
## $i2$HYPO
## $i2$HYPO[[1]]
## [1] 2
##
## $i2$HYPO[[2]]
## [1] 4
##
##
##
## $iAll
## $iAll$HYPER
## $iAll$HYPER[[1]]
## [1] 14
##
##
## $iAll$HYPO
## $iAll$HYPO[[1]]
## [1] 2
The permutation analysis can be run on all generations using the runPermutation() function.
The observation and the permutation analysis can be run together by setting the runObservationAnalysis = TRUE in the runPermutation() function.
All permutations can also be, optionally, saved in RDS files. To store the results into RDS files, the outputDir parameter must be given a directory path.
At last, the name of the RDS file that contains the methylKit object can also be used as an argument to the runPermutation() function.
## The permutation analysis is only done on differentially methylated sites
permutationResult <- runPermutation(
methylKitData = demoForTransgenerationalAnalysis, # multi-generational dataset
type = "sites", # Only sites
outputDir = NULL, # RDS files only created when a
# path is given
runObservationAnalysis = FALSE,
nbrCores = 1, # Number of cores used
nbrPermutations = 2, # Should be much higher for a
# real analysis
minReads = 10, # Minimum read coverage
minMethDiff = 10, # Minimum difference in methylation
# to be considered DMS
qvalue = 0.01,
vSeed = 2101) # Ensure reproducible results
## The length of the PERMUTATION entry should correspond to the number
## of permutations
length(permutationResult$PERMUTATION)
## [1] 2
## Each entry in PERMUTATION correspond to the result of one permutation
## THe results of the second permutation
permutationResult$PERMUTATION[[2]]$SITES
## $i2
## $i2$HYPER
## $i2$HYPER[[1]]
## [1] 34
##
## $i2$HYPER[[2]]
## [1] 2
##
##
## $i2$HYPO
## $i2$HYPO[[1]]
## [1] 13
##
## $i2$HYPO[[2]]
## [1] 0
##
##
##
## $iAll
## $iAll$HYPER
## $iAll$HYPER[[1]]
## [1] 0
##
##
## $iAll$HYPO
## $iAll$HYPO[[1]]
## [1] 0
The observation and permutation results can be merged using the mergePermutationAndObservation() function.
## Merge observation and permutation results
allResults <- mergePermutationAndObservation(permutationResults =
permutationResult,
observationResults = observationResult)
allResults
## Permutation Analysis
##
## Number of Generations: 3
## Number of Permutations: 2
##
## Observation Results:
## SOURCE ELEMENT ANALYSIS TYPE RESULT
## 1 OBSERVATION SITES i2 HYPER1 16
## 2 OBSERVATION SITES i2 HYPER2 16
## 3 OBSERVATION SITES i2 HYPO1 2
## 4 OBSERVATION SITES i2 HYPO2 4
## 5 OBSERVATION SITES iAll HYPER 14
## 6 OBSERVATION SITES iAll HYPO 2
When observation and permutation analysis have been run together using the runPermutation() function, this step can be skipped.
The runPermutation() and runObservation() functions calculate the number of conserved differentially methylated elements between two consecutive generations (F1 and F2, F2 and F3, etc..). The number of conserved differentially methylated elements is also calculated for three or more consecutive generations, always starting with the first generation (F1 and F2 and F3, F1 and F2 and F3 and F4, etc..).
A specific analysis can be extracted from the results using extractInfo() function.
The type parameter can be set to extract one of those elements:
The inter parameter can be set to extract one of those analysis type:
## Conserved differentially methylated sites between F1 and F2 generations
F1_and_F2_results <- extractInfo(allResults = allResults, type = "sites",
inter = "i2", position = 1)
head(F1_and_F2_results)
## TYPE RESULT SOURCE
## 1 HYPO 2 OBSERVATION
## 2 HYPER 16 OBSERVATION
## 3 HYPO 3 PERMUTATION
## 4 HYPER 2 PERMUTATION
## 5 HYPO 13 PERMUTATION
## 6 HYPER 34 PERMUTATION
The permutation analysis has been run on the demoForTransgenerationalAnalysis dataset with 1000 permutations (nbrPermutation = 1000). The results of those permutations will be used to generate the significant levels and the visual representations.
## Differentially conserved sites between F1 and F2 generations
F1_and_F2 <- extractInfo(allResults = demoResults, type = "sites",
inter = "i2", position = 1)
## Differentially conserved sites between F2 and F3 generations
F2_and_F3 <- extractInfo(allResults = demoResults, type = "sites",
inter = "i2", position = 2)
## Differentially conserved sites between F1 and F2 and F3 generations
F2_and_F3 <- extractInfo(allResults = demoResults, type = "sites",
inter = "iAll", position = 1)
## Show graph and significant level for differentially conserved sites
## between F1 and F2
output <- plotGraph(F1_and_F2)
When a large number of permutations is processed, the time needed to process them all may be long (especially when the number of available CPU is limited). Furthermore, some permutations can fail due to parallelization problems.
The methylInheritance package offers the possibility to restart an analysis and run only missing permutation results. To take advantage of this option, the outputDir parameter must not be NULL so that permutation results are saved in RDS files. When the restartCalculation is set to TRUE, the method will load the permutation results present in RDS files (when available) and only rerun permutations that don’t have an associated RDS file.
## The permutation analysis is only done on differentially methylated tiles
## The "output" directory must be specified
## The "vSeed" must be specified to ensure reproducible results
## The "restartCalculation" is not important the first time the analysis is run
permutationResult <- runPermutation(
methylKitData = demoForTransgenerationalAnalysis, # multi-generational dataset
type = "tiles", # Only tiles
outputDir = "test_restart", # RDS files are created
runObservationAnalysis = TRUE,
nbrCores = 1, # Number of cores used
nbrPermutations = 2, # Should be much higher for a
# real analysis
vSeed = 212201, # Ensure reproducible results
restartCalculation = FALSE)
## Assume that the process was stopped before it has done all the permutations
## The process can be restarted
## All parameters must be identical to the first analysis except "restartCalculation"
## The "restartCalculation" must be set to TRUE
permutationResult <- runPermutation(
methylKitData = demoForTransgenerationalAnalysis, # multi-generational dataset
type = "tiles", # Only tiles
outputDir = "test_restart", # RDS files are created
runObservationAnalysis = TRUE,
nbrCores = 1, # Number of cores used
nbrPermutations = 2, # Should be much higher for a
# real analysis
vSeed = 212201, # Ensure reproducible results
restartCalculation = TRUE)
The permutation analysis needs a list of methylRawList objects as input. A methylRawList is a list of methylRaw objects. The methylRawList and methylRaw objects are defined in the methylKit package.
To create a methylRawList, all samples (cases and controls) from the same generation must be first separately transformed into a methylRaw object. The S4 methylRaw class extends data.frame class and has been created to store raw methylation data. The raw methylation is essentially percent methylation values and read coverage values per base or region.
Excluding the data.frame section, the slots present in the methylRaw class are:
## The list of methylRaw objects for all controls and cases related to F1
f1_list <- list()
f1_list[[1]] <- new("methylRaw",
data.frame(chr = c("chr21", "chr21"),
start = c(9764513, 9764542),
end = c(9764513, 9764542), strand = c("+", "-"),
coverage = c(100, 15), numCs = c(88, 2),
numTs = c(100, 15) - c(88, 2)),
sample.id = "F1_control_01", assembly = "hg19",
context = "CpG", resolution = 'base')
f1_list[[2]] <- new("methylRaw",
data.frame(chr = c("chr21", "chr21"),
start = c(9764513, 9764522),
end = c(9764513, 9764522), strand = c("-", "-"),
coverage = c(38, 21), numCs = c(12, 2),
numTs = c(38, 21) - c(12, 2)),
sample.id = "F1_case_02", assembly = "hg19",
context = "CpG", resolution = 'base')
## The list of methylRaw objects for all controls and cases related to F2
f2_list <- list()
f2_list[[1]] <- new("methylRaw",
data.frame(chr = c("chr21", "chr21"),
start = c(9764514, 9764522),
end = c(9764514, 9764522), strand = c("+", "+"),
coverage = c(40, 30), numCs = c(0, 2),
numTs = c(40, 30) - c(0, 2)),
sample.id = "F2_control_01", assembly = "hg19",
context = "CpG", resolution = 'base')
f2_list[[2]] <- new("methylRaw",
data.frame(chr = c("chr21", "chr21"),
start = c(9764513, 9764533),
end = c(9764513, 9764533), strand = c("+", "-"),
coverage = c(33, 23), numCs = c(12, 1),
numTs = c(33, 23) - c(12, 1)),
sample.id = "F2_case_01", assembly = "hg19",
context = "CpG", resolution = 'base')
## The list to use as input for methylInheritance
final_list <- list()
## The methylRawList for F1 - the first generation is on the first position
final_list[[1]] <- new("methylRawList", f1_list, treatment = c(0,1))
## The methylRawList for F2 - the second generation is on the second position
final_list[[2]] <- new("methylRawList", f2_list, treatment = c(0,1))
## A list of methylRawList ready for methylInheritance
final_list
## [[1]]
## methylRawList object with 2 methylRaw objects
##
## methylRaw object with 2 rows
## --------------
## chr start end strand coverage numCs numTs
## 1 chr21 9764513 9764513 + 100 88 12
## 2 chr21 9764542 9764542 - 15 2 13
## --------------
## sample.id: F1_control_01
## assembly: hg19
## context: CpG
## resolution: base
##
## methylRaw object with 2 rows
## --------------
## chr start end strand coverage numCs numTs
## 1 chr21 9764513 9764513 - 38 12 26
## 2 chr21 9764522 9764522 - 21 2 19
## --------------
## sample.id: F1_case_02
## assembly: hg19
## context: CpG
## resolution: base
##
## treament: 0 1
##
## [[2]]
## methylRawList object with 2 methylRaw objects
##
## methylRaw object with 2 rows
## --------------
## chr start end strand coverage numCs numTs
## 1 chr21 9764514 9764514 + 40 0 40
## 2 chr21 9764522 9764522 + 30 2 28
## --------------
## sample.id: F2_control_01
## assembly: hg19
## context: CpG
## resolution: base
##
## methylRaw object with 2 rows
## --------------
## chr start end strand coverage numCs numTs
## 1 chr21 9764513 9764513 + 33 12 21
## 2 chr21 9764533 9764533 - 23 1 22
## --------------
## sample.id: F2_case_01
## assembly: hg19
## context: CpG
## resolution: base
##
## treament: 0 1
Another approach is to transform the files that contain the raw methylation information into a format that can be read by the methylKit methRead function. The methRead function implements methods that enable the creation of methylRawList objects.
Here is one valid file format among many (tab separated):
chrBase chr base strand coverage freqC freqT
1.176367 1 176367 R 29 100.00 0.00
1.176392 1 176392 R 58 100.00 0.00
1.176422 1 176422 R 29 3.45 96.55
1.176552 1 176552 R 58 96.55 3.45
library(methylKit)
## The methylRawList for F1
generation_01 <- methRead(location = list("demo/F1_control_01.txt", "demo/F1_case_01.txt"),
sample.id = list("F1_control_01", "F1_case_01"),
assembly = "hg19", treatment = c(0, 1), context = "CpG")
## The methylRawList for F2
generation_02 <- methRead(location = list("demo/F2_control_01.txt", "demo/F2_case_01.txt"),
sample.id = list("F2_control_01", "F2_case_01"),
assembly = "hg19", treatment = c(0, 1), context = "CpG")
## A list of methylRawList ready for methylInheritance
final_list <- list(generation_01, generation_02)
final_list
## [[1]]
## methylRawList object with 2 methylRaw objects
##
## methylRaw object with 2 rows
## --------------
## chr start end strand coverage numCs numTs
## 1 21 9764513 9764513 + 100 88 12
## 2 21 9764542 9764542 - 15 2 13
## --------------
## sample.id: F1_control_01
## assembly: hg19
## context: CpG
## resolution: base
##
## methylRaw object with 2 rows
## --------------
## chr start end strand coverage numCs numTs
## 1 21 9764513 9764513 - 38 12 26
## 2 21 9764522 9764522 - 21 2 19
## --------------
## sample.id: F1_case_01
## assembly: hg19
## context: CpG
## resolution: base
##
## treament: 0 1
##
## [[2]]
## methylRawList object with 2 methylRaw objects
##
## methylRaw object with 2 rows
## --------------
## chr start end strand coverage numCs numTs
## 1 21 9764514 9764514 + 40 0 40
## 2 21 9764522 9764522 + 30 2 28
## --------------
## sample.id: F2_control_01
## assembly: hg19
## context: CpG
## resolution: base
##
## methylRaw object with 2 rows
## --------------
## chr start end strand coverage numCs numTs
## 1 21 9764513 9764513 + 33 12 21
## 2 21 9764533 9764533 - 23 1 22
## --------------
## sample.id: F2_case_01
## assembly: hg19
## context: CpG
## resolution: base
##
## treament: 0 1
More information about methRead function can be found in the documentation of the methylKit package.
We thank Marie Pier Scott-Boyer for her advice on the vignette content.
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] methylInheritance_1.0.0 methylKit_1.2.0 GenomicRanges_1.28.0
## [4] GenomeInfoDb_1.12.0 IRanges_2.10.0 S4Vectors_0.14.0
## [7] BiocGenerics_0.22.0 knitr_1.15.1 BiocStyle_2.4.0
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.6.0 qvalue_2.8.0
## [3] gtools_3.5.0 reshape2_1.4.2
## [5] splines_3.4.0 lattice_0.20-35
## [7] colorspace_1.3-2 snow_0.4-2
## [9] htmltools_0.3.5 rtracklayer_1.36.0
## [11] yaml_2.1.14 XML_3.98-1.6
## [13] R.oo_1.21.0 R.utils_2.5.0
## [15] BiocParallel_1.10.0 fastseg_1.22.0
## [17] matrixStats_0.52.2 GenomeInfoDbData_0.99.0
## [19] plyr_1.8.4 stringr_1.2.0
## [21] zlibbioc_1.22.0 Biostrings_2.44.0
## [23] munsell_0.4.3 gtable_0.2.0
## [25] R.methodsS3_1.7.1 codetools_0.2-15
## [27] coda_0.19-1 evaluate_0.10
## [29] labeling_0.3 Biobase_2.36.0
## [31] Rcpp_0.12.10 scales_0.4.1
## [33] backports_1.0.5 limma_3.32.0
## [35] DelayedArray_0.2.0 XVector_0.16.0
## [37] Rsamtools_1.28.0 gridExtra_2.2.1
## [39] ggplot2_2.2.1 digest_0.6.12
## [41] stringi_1.1.5 numDeriv_2016.8-1
## [43] grid_3.4.0 rprojroot_1.2
## [45] tools_3.4.0 bitops_1.0-6
## [47] bbmle_1.0.19 magrittr_1.5
## [49] lazyeval_0.2.0 RCurl_1.95-4.8
## [51] tibble_1.3.0 MASS_7.3-47
## [53] Matrix_1.2-9 data.table_1.10.4
## [55] rmarkdown_1.4 emdbook_1.3.9
## [57] mclust_5.2.3 GenomicAlignments_1.12.0
## [59] compiler_3.4.0
Legendre, Pierre, and Louis Legendre. 1998. Numerical ecology. Amsterdam: Elsevier Science BV.