1 Introduction

Illumina Infinium HumanMethylation 450K BeadChip assay has become a standard tool to analyse methylation in human samples. Developed in 2011, it has already been used in projects such as The Cancer Genome Atlas (TCGA). Their 450.000 probes provide a good overall image of the methylation state of the genome, being one of the reasons of its success.

Given its complex design¹, many Bioconductor packages have been developed to assess normalization and pre-processing issues (e.g. minfi (Aryee et al. 2014) or lumi (Du, Kibbe, and Lin 2008)). In addition, these packages can detect differentially methylated probes (DMPs) and differentially methylated regions (DMRs). However, the interfaces are not very intuitive and several scripting steps are usually required.

MEAL aims to facilitate the analysis of Illumina Methylation 450K chips. DMPs and DMRs detection algorithms are included, along with new plotting facilities. Besides, two wrapper functions allow performing whole methylome analysis or region analysis. Two additional features are adjustment of models for SNPs and tools to manage genomic-like variables.

1.1 Data types

MEAL implements three new classes: MethylationSet, AnalysisResults and AnalysisRegionResults. MethylationSet is a class derived from Bioconductor eSet. All the information of the experiment is contained in this class: a beta values matrix, phenotypic information and annotation.

AnalysisResults and AnalysisRegionResults contain the results of whole methylome analysis and region analysis respectively as well as the raw data of the most significant features. With this design, the analyses and the reporting are split. Given that all the plots and the results can be obtained from these objects, we can do the analyses, store the objects and then do the report.

1.2 Example data

minfiData and IlluminaHumanMethylation450kanno.ilmn12.hg19 packages are required in order to run the examples of this vignette. minfiData contains MsetEx, a MethylSet from the minfi package.

library(MEAL)
library(MultiDataSet)
library(minfiData)
library(GenomicRanges)

2 Read data

MEAL does not contain preprocessing capabilities so these steps should be done previously. As a result, before analysing MsetEx, probes that does not measure CpGs should be previously filtered. The next code uses a minfi function to remove not CpGs probes.

MsetExFilt <- dropMethylationLoci(MsetEx)

MethylationSet construction requires a beta values matrix and a data.frame with the phenotypes. The following code extract both objects from a minfi object. To speed up the example, only 30.000 probes will be used. In order to obtain reproducible results, the seed used for random number will be set to 0.

set.seed(0)
betas <- getBeta(MsetExFilt)
betas <- betas[sample(1:nrow(betas), 30000), ]
phenotypes <- pData(MsetExFilt)

With prepareMethylationSet, a MethylationSet is obtained. It contains only the samples having beta values and phenotypes and the probes containing annotation.

set <- prepareMethylationSet(matrix = betas, phenotypes = phenotypes, 
                                    annotation = "IlluminaHumanMethylation450kanno.ilmn12.hg19")
set

## MethylationSet (storageMode: lockedEnvironment)
## assayData: 29999 features, 6 samples 
##   element names: meth 
## protocolData: none
## phenoData
##   rowNames: 5723646052_R02C02 5723646052_R04C01 ...
##     5723646053_R06C02 (6 total)
##   varLabels: Sample_Name Sample_Well ... filenames (13 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: cg17501828 cg15560884 ... cg14273923 (29999 total)
##   fvarLabels: chromosome position ... DHS (17 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: IlluminaHumanMethylation450kanno.ilmn12.hg19

prepareMethylationSet is the main entry point to use the package and it is worth describing its arguments and functionalities. The first two arguments (matrix and phenotypes) are the matrix of beta values and the data.frame of phenotypes. It should be noticed that an AnnotatedDataFrame can be used as phenotypes (during the construction of the object, phenotypes data.frame is coerced to an AnnotatedDataFrame). In any case, beta values matrix contains CpGs at rows and samples at columns and both must be named. The same applies to phenotypes data.frame but rows must contain samples and columns variables.

annotation can be a character or a data.frame (or an AnnotatedDataFrame). If character, it loads an annotation package and uses it in the object. At the moment, only IlluminaHumanMethylation450kanno.ilmn12.hg19 is supported. The following parameters (chromosome, position, genes and group) allow the specification of a custom annotation. If annotation is a data.frame, these parameters should match column names of the annotation data.frame. Chromosome equals to chromosome name (e.g. chr1) and position to position coordinates. Genes and group are optional and equal to the genes near the probe and to the position of the probe in relation to the gene. Default argument values match names in IlluminaHumanMethylation450kanno.ilmn12.hg19 annotation package. Finally, if annotation is not specified, IlluminaHumanMethylation450kanno.ilmn12.hg19 package is used.

2.1 Compatibility with minfi

minfi objects can be directly used in prepareMethylationSet. If the object contains phenotypes, there is no need to supply it. Otherwise, it will be compulsory.

3 Probe analysis

This section (probe analysis) and the following (region analysis) describe the two main parts of the workflow. Despite they can be run as independent functions, it is recommended to use the general function DAPipeline, which is explained below. DAPipeline performs both analyses and it is designed in a more user-friendly way.

The initial approach when studying methylation was to find differentially methylated probes. The analysis chosen was to fit a linear model for each of the probes, taking into account the variable of interest as well as some covariates. Probes were usually sorted by statistical parameters such as coefficients p-value.

This approach has been implemented in MEAL but with some changes. As was proposed in (Du et al. 2010), M-values (logit2 transformation of beta values) are used to fit the model. In addition, a robust linear regression is used instead of a normal linear regression. This kind of regression is more robust to outliers and can obtain more accurate coefficients.

Before analysing the data, a quick look at the phenotypes will be done:

pData(set)

##                   Sample_Name Sample_Well Sample_Plate Sample_Group
## 5723646052_R02C02    GroupA_3          H5         <NA>       GroupA
## 5723646052_R04C01    GroupA_2          D5         <NA>       GroupA
## 5723646052_R05C02    GroupB_3          C6         <NA>       GroupB
## 5723646053_R04C02    GroupB_1          F7         <NA>       GroupB
## 5723646053_R05C02    GroupA_1          G7         <NA>       GroupA
## 5723646053_R06C02    GroupB_2          H7         <NA>       GroupB
##                   Pool_ID person age sex status  Array      Slide
## 5723646052_R02C02    <NA>    id3  83   M normal R02C02 5723646052
## 5723646052_R04C01    <NA>    id2  58   F normal R04C01 5723646052
## 5723646052_R05C02    <NA>    id3  83   M cancer R05C02 5723646052
## 5723646053_R04C02    <NA>    id1  75   F cancer R04C02 5723646053
## 5723646053_R05C02    <NA>    id1  75   F normal R05C02 5723646053
## 5723646053_R06C02    <NA>    id2  58   F cancer R06C02 5723646053
##                                                  Basename
## 5723646052_R02C02 ../extdata/5723646052/5723646052_R02C02
## 5723646052_R04C01 ../extdata/5723646052/5723646052_R04C01
## 5723646052_R05C02 ../extdata/5723646052/5723646052_R05C02
## 5723646053_R04C02 ../extdata/5723646053/5723646053_R04C02
## 5723646053_R05C02 ../extdata/5723646053/5723646053_R05C02
## 5723646053_R06C02 ../extdata/5723646053/5723646053_R06C02
##                                                 filenames
## 5723646052_R02C02 ../extdata/5723646052/5723646052_R02C02
## 5723646052_R04C01 ../extdata/5723646052/5723646052_R04C01
## 5723646052_R05C02 ../extdata/5723646052/5723646052_R05C02
## 5723646053_R04C02 ../extdata/5723646053/5723646053_R04C02
## 5723646053_R05C02 ../extdata/5723646053/5723646053_R05C02
## 5723646053_R06C02 ../extdata/5723646053/5723646053_R06C02

summary(pData(set))

##  Sample_Name        Sample_Well        Sample_Plate      
##  Length:6           Length:6           Length:6          
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  Sample_Group         Pool_ID             person               age       
##  Length:6           Length:6           Length:6           Min.   :58.00  
##  Class :character   Class :character   Class :character   1st Qu.:62.25  
##  Mode  :character   Mode  :character   Mode  :character   Median :75.00  
##                                                           Mean   :72.00  
##                                                           3rd Qu.:81.00  
##                                                           Max.   :83.00  
##      sex               status             Array          
##  Length:6           Length:6           Length:6          
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##     Slide             Basename          filenames        
##  Length:6           Length:6           Length:6          
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##

The most interesting variables are person, age, sex and status. Age is numeric but the other ones, that should be factors, are indeed characters. The following analysis will be performed using the variable status, which defines samples of normal or cancer tissues.

Probe analysis can be done with DAProbe, which needs a MethylationSet and a matrix model.

mod <- model.matrix(~as.factor(status), data = pData(set))
proberes <- DAProbe(set = set, model = mod, method = "ls", coefficient = 2)
head(proberes)

##            intercept as.factor(status)normal         sd         t
## cg25937714 0.7303222              -0.5906359 0.01588890 -37.17286
## cg19333963 0.7359011              -0.6532176 0.02438088 -26.79220
## cg25840208 0.6630196              -0.6002307 0.02262368 -26.53108
## cg15246805 0.4097909               0.3790472 0.01433704  26.43831
## cg21502551 0.2843468               0.4908527 0.01866499  26.29804
## cg20450979 0.4797766              -0.4149029 0.01607160 -25.81590
##                 P.Value   adj.P.Val chromosome  position
## cg25937714 1.358420e-07 0.004075123       chr3  44036492
## cg19333963 7.594706e-07 0.004213720      chr19   1467979
## cg25840208 7.995178e-07 0.004213720       chr2  31457043
## cg15246805 8.143456e-07 0.004213720       chr6 133584189
## cg21502551 8.373910e-07 0.004213720      chr13  97153748
## cg20450979 9.227396e-07 0.004213720       chr4 176923542
##                              genes                       group
## cg25937714                                                    
## cg19333963                    APC2                        Body
## cg25840208               EHD3;EHD3               1stExon;5'UTR
## cg15246805          EYA4;EYA4;EYA4           5'UTR;5'UTR;5'UTR
## cg21502551                  HS6ST3                        Body
## cg20450979 GPM6A;GPM6A;GPM6A;GPM6A 5'UTR;5'UTR;1stExon;1stExon

DAProbe returns a data.frame with the results of the linear analysis. In the example, method is set to “ls” (normal linear regression) in order to speed the tutorial, but default is robust regression. Coefficient indicates the coefficients of the model matrix whose results will be returned. If coefficient is a vector, a list of data.frames is returned.

4 Region analysis

MEAL includes three region analysis algorithms: bumphunter (Jaffe et al. 2012), DMRcate (Peters et al. 2015) and blockFinder. More information about these methods can be found at the corresponding packages.

DARegion needs a MethylationSet and a matrix model (the same that DAProbe):

regionres <- DARegion(set = set, model = mod, coefficient = 2)
names(regionres)

## [1] "bumphunter"  "blockFinder" "DMRcate"

head(regionres$bumphunter)

##         chr     start       end      value     area cluster indexStart
## 12037  chr6  29521023  29521756 -0.3571560 2.142936   21494      23809
## 12206  chr6 118228078 118228871 -0.4881543 1.464463   22512      25102
## 11196 chr20  61340107  61340542 -0.4514794 1.354438   16372      18133
## 11914  chr5 168727752 168728076 -0.4453066 1.335920   20872      23101
## 11893  chr5 146258181 146258546 -0.3237553 1.295021   20710      22928
## 9311  chr11  30607068  30607360 -0.6022631 1.204526    4439       4905
##       indexEnd L clusterL
## 12037    23814 6        6
## 12206    25104 3        3
## 11196    18135 3        3
## 11914    23103 3        3
## 11893    22931 4        4
## 9311      4906 2        2

head(regionres$blockFinder)

##       chr     start       end      value     area cluster indexStart
## 210  chr6  31904709  32411670  0.1379759 6.484868    1207      21656
## 207  chr6  28962770  30017803  0.1132343 4.982309    1207      21373
## 715 chr12 113345598 113417284 -1.3327353 3.998206    2228       6498
## 490 chr15  25230200  25513277  0.1477724 3.546538    2466       8428
## 35   chr1 152538857 153068236  0.2331429 3.497144     165       1690
## 212  chr6  33037444  33232191  0.1478054 3.103914    1207      21783
##     indexEnd  L clusterL
## 210    21734 47      250
## 207    21447 44      250
## 715     6500  3       12
## 490     8451 24       27
## 35      1704 15       33
## 212    21822 21      250

head(regionres$DMRcate)

##                          coord no.cpgs       minfdr     Stouffer
## 1369    chr6:29521023-29521756       6 2.087206e-41 2.858691e-07
## 208  chr10:118031870-118033115       4 3.848536e-54 8.255219e-07
## 1450  chr6:118228078-118228871       3 2.120233e-80 5.655719e-06
## 1324  chr5:168727752-168728076       3 5.861509e-53 8.606307e-06
## 612    chr16:51184205-51185110       3 2.247997e-25 4.416790e-05
## 750    chr19:12305854-12306303       3 9.694360e-34 5.004852e-05
##       maxbetafc meanbetafc
## 1369 -0.5047504 -0.3571560
## 208  -0.6028006 -0.5332195
## 1450 -0.6021535 -0.4881543
## 1324 -0.4638198 -0.4453066
## 612  -0.5757717 -0.3976473
## 750  -0.5706350 -0.3859342

DARegion returns a list of data.frames with the results of the different methods. methods parameter can be used to select the methods desired. If a method is not chosen, a NA value is returned in this position. Because DMRcate uses results from a linear model regression, these results can be computed using DAProbe and passed in the argument proberes.

Bumphunter and blockFinder can calculate p-values for the regions differentially detected using bootstraping. num_permutations arguments sets the number of permutations that will be done to estimate these p-values. By default it is set to 0, because its computation requires a lot of memory and it is time consuming.

Finally, it should be said that blockFinder method has been adapted from minfi package and it needs its annotation package (IlluminaHumanMethylation450kanno.ilmn12.hg19). If another annotation is used, blockFinder use is discouraged.

5 Whole methylome analysis

The first approach when studying methylation changes is to get an overall picture of the methylation state throughout the genome. This kind of analysis is performed by the function DAPipeline. The first analysis will be to evaluate the effect of cancer in methylation. Given that this variable is a character, it should be converted to factor using variable_types:

res <- DAPipeline(set = set, variable_names = "status", variable_types = "categorical", 
                      probe_method = "ls", verbose = FALSE)

## Your contrast returned 2135 individually significant probes. We recommend the default setting of pcutoff in dmrcate().

## Fitting chr1...

## Fitting chr10...

## Fitting chr11...

## Fitting chr12...

## Fitting chr13...

## Fitting chr14...

## Fitting chr15...

## Fitting chr16...

## Fitting chr17...

## Fitting chr18...

## Fitting chr19...

## Fitting chr2...

## Fitting chr20...

## Fitting chr21...

## Fitting chr22...

## Fitting chr3...

## Fitting chr4...

## Fitting chr5...

## Fitting chr6...

## Fitting chr7...

## Fitting chr8...

## Fitting chr9...

## Fitting chrX...

## Fitting chrY...

## Demarcating regions...

## Done!

res

## class: AnalysisResults 
## original class: MethylationSet 
## features(2069): cg25937714 cg19333963 ... cg19712291 cg02219949
## samples(6): 5723646052_R02C02 5723646052_R04C01 ...
##   5723646053_R05C02 5723646053_R06C02
## variables:  status 
## model variables names:  status 
## covariables: none 
## Number of bumps: 12959 
## Number of blocks: NULL 
## Number of regions: 314 
## Number of differentially methylated probes: 2069

DAPipeline generates a AnalysisResults objects containing probe and region analysis results. Most of the parameters of this function are arguments of DAProbe and DARegion.

On the other hand, there are four important parameters in this function: variable_names, variable_types, covariable_names and covariable_types. These parameters define the variables that will be used as active variable (for which the results will be presented) and the variables used as covariates (variables that will enter in the model but not in the results). Class of the variables in set can be changed using variable_types and covariable_types. Available types are categorical (factor), continuous (numeric) and the three genetic models (dominant, recessive and additive).

When variables are defined in this way, the linear model created is additive: all the variables are summed. It is also possible to use a linear model containing interaction and other more complex features using the equation parameter:

complexres <- DAPipeline(set = set, variable_names = c("status", "sex"),
                             variable_types = c("categorical", "categorical"), 
                             probe_method = "ls", num_var = 3, verbose = FALSE,
                             equation = "~ status:sex + status + sex")

## Your contrast returned no individually significant probes. Set pcutoff manually in dmrcate() to return DMRs, but be warned there is an increased risk of Type I errors.

## Fitting chr1...

## Fitting chr10...

## Fitting chr11...

## Fitting chr12...

## Fitting chr13...

## Fitting chr14...

## Fitting chr15...

## Fitting chr16...

## Fitting chr17...

## Fitting chr18...

## Fitting chr19...

## Fitting chr2...

## Fitting chr20...

## Fitting chr21...

## Fitting chr22...

## Fitting chr3...

## Fitting chr4...

## Fitting chr5...

## Fitting chr6...

## Fitting chr7...

## Fitting chr8...

## Fitting chr9...

## Fitting chrX...

## Fitting chrY...

## Your contrast returned 152 individually significant probes. We recommend the default setting of pcutoff in dmrcate().

## Fitting chr1...

## Fitting chr10...

## Fitting chr11...

## Fitting chr12...

## Fitting chr13...

## Fitting chr14...

## Fitting chr15...

## Fitting chr16...

## Fitting chr17...

## Fitting chr18...

## Fitting chr19...

## Fitting chr2...

## Fitting chr20...

## Fitting chr21...

## Fitting chr22...

## Fitting chr3...

## Fitting chr4...

## Fitting chr5...

## Fitting chr6...

## Fitting chr7...

## Fitting chr8...

## Fitting chr9...

## Fitting chrX...

## Fitting chrY...

## Demarcating regions...

## Done!

## Your contrast returned no individually significant probes. Set pcutoff manually in dmrcate() to return DMRs, but be warned there is an increased risk of Type I errors.

## Fitting chr1...

## Fitting chr10...

## Fitting chr11...

## Fitting chr12...

## Fitting chr13...

## Fitting chr14...

## Fitting chr15...

## Fitting chr16...

## Fitting chr17...

## Fitting chr18...

## Fitting chr19...

## Fitting chr2...

## Fitting chr20...

## Fitting chr21...

## Fitting chr22...

## Fitting chr3...

## Fitting chr4...

## Fitting chr5...

## Fitting chr6...

## Fitting chr7...

## Fitting chr8...

## Fitting chr9...

## Fitting chrX...

## Fitting chrY...

complexres

## class: AnalysisResults 
## original class: MethylationSet 
## features(50): cg06340552 cg25840208 ... cg16580737 cg02330121
## samples(6): 5723646052_R02C02 5723646052_R04C01 ...
##   5723646053_R05C02 5723646053_R06C02
## variables:  status, sex 
## model variables names:  statusnormal, sexM, statusnormal:sexM 
## covariables: none 
## Number of bumps: 12742, 12783, 13567 
## Number of blocks: NULL, NULL, NULL 
## Number of regions: 0, 37, 0 
## Number of differentially methylated probes: 0, 0, 0

When using equation, number of active variables must be explicit. They will be selected from the left of the equation to the right. Model matrices are created using model.matrix function, so if we introduce interactions, they will be after the single variables. In our example, status and sex both had two levels. Therefore, a dummy variable is created for each of the samples and the interaction is the third column. Consequently, num_var must be 3.

In addition, it should be noticed that variables used in interactions must be included alone in the equation and must be present in the variable_names arguments. Covariables can still be used by setting them in covariable_names, being added to the linear model.

5.1 Results access

There are several functions that allow to access analysis results:

#Bumphunter
head(bumps(res)[[1]])

##         chr     start       end      value     area cluster indexStart
## 12037  chr6  29521023  29521756 -0.3571560 2.142936   21494      23809
## 12206  chr6 118228078 118228871 -0.4881543 1.464463   22512      25102
## 11196 chr20  61340107  61340542 -0.4514794 1.354438   16372      18133
## 11914  chr5 168727752 168728076 -0.4453066 1.335920   20872      23101
## 11893  chr5 146258181 146258546 -0.3237553 1.295021   20710      22928
## 9311  chr11  30607068  30607360 -0.6022631 1.204526    4439       4905
##       indexEnd L clusterL
## 12037    23814 6        6
## 12206    25104 3        3
## 11196    18135 3        3
## 11914    23103 3        3
## 11893    22931 4        4
## 9311      4906 2        2

#BlockFinder
head(blocks(res)[[1]])

## [1] NA

#DMRcate
head(dmrCate(res)[[1]])

##                          coord no.cpgs       minfdr     Stouffer
## 1369    chr6:29521023-29521756       6 2.087206e-41 2.858691e-07
## 208  chr10:118031870-118033115       4 3.848536e-54 8.255219e-07
## 1450  chr6:118228078-118228871       3 2.120233e-80 5.655719e-06
## 1324  chr5:168727752-168728076       3 5.861509e-53 8.606307e-06
## 612    chr16:51184205-51185110       3 2.247997e-25 4.416790e-05
## 750    chr19:12305854-12306303       3 9.694360e-34 5.004852e-05
##       maxbetafc meanbetafc
## 1369 -0.5047504 -0.3571560
## 208  -0.6028006 -0.5332195
## 1450 -0.6021535 -0.4881543
## 1324 -0.4638198 -0.4453066
## 612  -0.5757717 -0.3976473
## 750  -0.5706350 -0.3859342

#Probe
head(probeResults(res)[[1]])

## [1] 0.7303222 0.7359011 0.6630196 0.4097909 0.2843468 0.4797766

#Region
names(regionResults(res))

## [1] "DMRCate"     "Bumphunter"  "BlockFinder"

All these functions return a list, even if it contains only one data.frame. regionResults contains a list with the results of the three per region methods. Export of the results is simplified with the function exportResults.

exportResults(res, dir = "./results")

This function creates csv files with all the results. If more than one variable is present, a subfolder for each variable is created.

5.2 Plotting

AnalysisResults incorporates many plotting facilities. plotFeature plots the beta values distribution of a CpG. plotFeature accepts a number with the CpG index or character with the name of a CpG.

plotFeature(res, 1)

## Warning: Ignoring unknown aesthetics: y

Figure 1: Beta values of cg25937714 split by cancer status

Probe results can be used to plot a QQ-plot and a Manhattan plot. In the second one, CpGs inside a range can be highlighted by passing a GenomicRanges object with the range of interest.

plotQQ(res)

Figure 2: QQplot of the analysis.

range <- GRanges(seqnames = Rle("chr1"), 
                                ranges = IRanges(1000000, end = 10000000))
plotEWAS(res, range = range)

Figure 3: Manhattan plot with the CpGs of the range highlighted

In figure 3, the red line of the Manhattan plot is the significance threshold using Bonferroni.

6 Range analysis

A study of a region can be performed with DARegionAnalysis. Options are very similar to that of DAPipeline. In this situation, status will be the active variable and age will be a covariable. A GenomicRange must be supplied in order to delimit the region.

range <- GRanges(seqnames = Rle("chr12"), 
                                ranges = IRanges(70000000, end = 80000000))
region <- DARegionAnalysis(set = set, variable_names = "status", 
                               variable_types = "categorical", 
                               covariable_names = "age", range = range, 
                               verbose = FALSE)

## Your contrast returned 6 individually significant probes; a small but real effect. Consider manually setting the value of pcutoff to return more DMRs, but be warned that doing this increases the risk of Type I errors.

## Fitting chr12...

## Demarcating regions...

## Done!

## 'nperm' >= set of all permutations: complete enumeration.

## Set of permutations < 'minperm'. Generating entire set.

region

## class: AnalysisRegionResults 
## original class: MethylationSet 
## features(44): cg11054631 cg04609875 ... cg01585322 cg17278249
## samples(6): 5723646052_R02C02 5723646052_R04C01 ...
##   5723646053_R05C02 5723646053_R06C02
## variables:  status 
## model variables names:  status 
## covariables: age 
## Number of bumps: 25 
## Number of blocks: 0 
## Number of regions: 0 
## Number of differentially methylated probes: 6 
## Relevant snps(0):
## Snps Variance:  NA 
## Range:
##  Chr:  chr12     start:  70000000    end:  80000000 
## R2:  0.471 
## RDA P-value:  0.1 
## Region P-value:  0.353 
## Global R2:  0.419

DARegionAnalysis generates a AnalysisRegionResults, an heir of AnalysisResults, so they share getter functions.

#Bumphunter
head(bumps(region)[[1]])

##      chr    start      end      value      area cluster indexStart
## 25 chr12 72667493 72667493 -0.4726047 0.4726047      20         20
## 18 chr12 77756101 77756101  0.3099037 0.3099037      37         37
## 19 chr12 78411737 78411737  0.2162121 0.2162121      38         38
## 22 chr12 78883559 78883559  0.2057113 0.2057113      41         41
## 20 chr12 78637464 78637464  0.2038255 0.2038255      39         39
## 10 chr12 72335228 72335228  0.1988141 0.1988141      17         17
##    indexEnd L clusterL
## 25       20 1        1
## 18       37 1        1
## 19       38 1        1
## 22       41 1        1
## 20       39 1        1
## 10       17 1        1

#BlockFinder
head(blocks(region)[[1]])

## data frame with 0 columns and 0 rows

#DMRcate
head(dmrCate(region)[[1]])

## [1] coord      no.cpgs    minfdr     Stouffer   maxbetafc  meanbetafc
## <0 rows> (or 0-length row.names)

#Probe
head(probeResults(region)[[1]])

## [1]  0.79738780 -0.05254833  0.38373112  0.79841610  0.10409316  0.46457824

#Region
names(regionResults(region))

## [1] "DMRCate"     "Bumphunter"  "BlockFinder"

6.1 Plotting

Distribution of coefficients of the region can be plotted using the plotRegion function. .

plotRegion(region)

Figure 4: Plot of the differential methylation for each CpG of the region.

In figure 4, green points are significant probes (p-value smaller than 0.05) while red points are non significant probes (p-value greater than 0.05). Blue lines are set at 0.05, a minimal difference to be considered differentially methylated.

6.2 Redundancy analysis

To evaluate the correlation of the methylation probes with the phenotype of interest, a redundancy analysis (RDA) is run. RDA is a multivariate analogue of linear regression. In our case, we will fit our matrix of methylation probes to a matrix of variables: \[ Y = A*X + \epsilon \] where Y are the methylation values of the probes of our region, A the matrix used to fit X to Y and X the matrix with the variables to fit. We can transform this equation to: \[ Y = \hat{Y} + Y_r \] where \(\hat{Y}\) is the fitted matrix Y, or the part of Y that can be explained by X and \(Y_r\) is the residual Y, or the part of Y that cannot be explained by X. The variance of the matrix Y can be split between \(\hat{Y}\) and \(Y_r\), and an R² can be computed, representing the amount of variance of Y that is explained by X.

In the print of the results, the R² and the p-value from the RDA analysis are returned. A plot can be done with plotRDA:

plotRDA(region)

Figure 5: RDA plot of the region.

In figure 5, points represent samples grouped by the active variables. Red crosses are the CpGs and CpG names are the CpGs most correlated to the RDA axes.

7 SNPs Data

In our example, SNPs data was not available. An example with SNPs can be found in caseExample vignette.

References

Aryee, Martin J, Andrew E Jaffe, Hector Corrada-Bravo, Christine Ladd-Acosta, Andrew P Feinberg, Kasper D Hansen, and Rafael A Irizarry. 2014. “Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays.” Bioinformatics (Oxford, England) 30 (10): 1363–9. doi:10.1093/bioinformatics/btu049.

Du, Pan, Warren A Kibbe, and Simon M Lin. 2008. “lumi: a pipeline for processing Illumina microarray.” Bioinformatics (Oxford, England) 24 (13): 1547–8. doi:10.1093/bioinformatics/btn224.

Du, Pan, Xiao Zhang, Chiang-Ching Huang, Nadereh Jafari, Warren A Kibbe, Lifang Hou, and Simon M Lin. 2010. “Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis.” BMC Bioinformatics 11 (January): 587. doi:10.1186/1471-2105-11-587.

Jaffe, Andrew E, Peter Murakami, Hwajin Lee, Jeffrey T Leek, M Daniele Fallin, Andrew P Feinberg, and Rafael A Irizarry. 2012. “Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies.” International Journal of Epidemiology 41 (1): 200–209. doi:10.1093/ije/dyr238.

Peters, Timothy, Michael Buckley, Aaron Statham, Ruth Pidsley, Katherine Samaras, Reginald Lord, Susan Clark, and Peter Molloy. 2015. “De novo identification of differentially methylated regions in the human genome.” Epigenetics & Chromatin 8 (1): 6. doi:10.1186/1756-8935-8-6.

Introduction to MEAL

Carlos Ruiz, Carles Hernández, Juan R. González

Center for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
Bioinformatics Research Group in Epidemiolgy
(http://www.creal.cat/brge.htm)

2016-11-16

Package version: MEAL 1.4.2

Contents

1 Introduction

1.1 Data types

1.2 Example data

2 Read data

2.1 Compatibility with minfi

3 Probe analysis

4 Region analysis

5 Whole methylome analysis

5.1 Results access

5.2 Plotting

6 Range analysis

6.1 Plotting

6.2 Redundancy analysis

7 SNPs Data

References

Introduction to MEAL

Carlos Ruiz, Carles Hernández, Juan R. González

Center for Research in Environmental Epidemiology (CREAL), Barcelona, Spain Bioinformatics Research Group in Epidemiolgy (http://www.creal.cat/brge.htm)

2016-11-16

Package version: MEAL 1.4.2

Contents

1 Introduction

1.1 Data types

1.2 Example data

2 Read data

2.1 Compatibility with minfi

3 Probe analysis

4 Region analysis

5 Whole methylome analysis

5.1 Results access

5.2 Plotting

6 Range analysis

6.1 Plotting

6.2 Redundancy analysis

7 SNPs Data

References

Center for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
Bioinformatics Research Group in Epidemiolgy
(http://www.creal.cat/brge.htm)