Package version: MEAL 1.0.3

Contents

1 Introduction

In this vignette, the workflow to analyse methylation and expression data will be presented. The package also allows considering SNP data since it has been demonstrated the existence of meQTL and eQTL that could confound the results. The example data will be taken from MEALData package (available at BRGE web), a data package that contains a subset from GEO GSE53261. This dataset contains data from untransformed human fibroblasts. The script with the processing steps from the GEO to the final datasets can be found in the folder scripts of MEALData package.

Let’s start by loading the required packages.

library(MEAL)
library(MEALData)
library(GenomicRanges)

There are four objects in MEALData: betavals, pheno, eset and snps. Let’s take a look to each of these objects.

betavals and pheno belongs to the methylation experiment. betavals is a matrix of beta values corresponding to an Infinium HumanMethylation 450K BeadChip and pheno a data.frame with the phenotypes.

betavals[1:5, 1:5]
##              GM02704   GM02706   GM01650   GM01653   GM02640
## cg00000029 0.2864929 0.2069152 0.5525335 0.5176491 0.2791274
## cg00000108 0.9293251 0.9429289 0.8961195 0.9325227 0.9286574
## cg00000109 0.7168331 0.7419590 0.7320755 0.5347986 0.8196660
## cg00000165 0.2935756 0.2475510 0.4311649 0.3937953 0.3349943
## cg00000236 0.9026300 0.9034649 0.8755394 0.8710790 0.8867925
dim(betavals)
## [1] 451448     62
summary(pheno)
##     gender                 source  
##  female:31   Coriell cell line:26  
##  male  :31   other            :36

betavals contains data from 451448 probes and 62 samples. Pheno contains the same number of samples and two variables: gender and source. There is the same number of cells coming from male and female donnors. Source is composed of Coriell cell lines and cells with unknown source.

eset is an ExpressionSet with the expression data corresponding to an Illumina HumanRef-8 v3.0 expression beadchip.

eset
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 21916 features, 64 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GM00016 GM00022 ... WG3466 (64 total)
##   varLabels: gender
##   varMetadata: labelDescription
## featureData
##   featureNames: ILMN_1343291 ILMN_1651209 ... ILMN_2415949 (21916
##     total)
##   fvarLabels: chr start end genes
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL6883
summary(pData(eset))
##     gender  
##  female:32  
##  male  :32

In this case, there are 21916 expression probes and 64 samples. As it can be seen, there are two more samples in the expression set. This is very common in datasets downloaded from GEO and it is not a problem if the analysis is going to be performed separately. On the other hand, the phenotype data only contains the variable gender and the number of female and male donnors is balanced.

Finally, let’s take a look at snps data. snps is a list containing two objects, the genotypes and the mapping.

str(snps)
## List of 2
##  $ genotypes: num [1:100000, 1:98] 0 0.0952 0 0 0 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:100000] "rs2895914-126_B_R_1514138430" "rs9531337-119_T_F_1502299533" "rs8125266-126_B_F_1502281523" "rs7236498-126_B_R_1502260358" ...
##   .. ..$ : chr [1:98] "WG2275" "WG2276" "WG2274" "WG2066" ...
##  $ map      :'data.frame':   100000 obs. of  5 variables:
##   ..$ Chromosome: chr [1:100000] "12" "13" "20" "18" ...
##   ..$ snp.name  : Factor w/ 1199188 levels "200003","200006",..: 655945 1144767 1094637 986877 392450 931588 678765 822223 980058 429353 ...
##   ..$ position  : num [1:100000] 81254953 35479506 59293703 21765775 7472441 ...
##   ..$ SNP       : Factor w/ 12 levels "[A/C]","[A/G]",..: 7 1 7 7 7 2 1 2 2 7 ...
##   ..$ chromosome: chr [1:100000] "chr12" "chr13" "chr20" "chr18" ...

Genotypes are enclosed in a matrix and can be retrieved by:

snps$genotypes[1:5, 1:5]
##                                 WG2275 WG2276 WG2274 WG2066 WG2065
## rs2895914-126_B_R_1514138430 0.0000000      2      0      2      2
## rs9531337-119_T_F_1502299533 0.0952381      2      2      2      2
## rs8125266-126_B_F_1502281523 0.0000000      0      1      2      2
## rs7236498-126_B_R_1502260358 0.0000000      1      1      1      1
## rs1593933-127_B_R_1501677465 0.0000000      0      0      0      1

Mapping of the SNP probes is can be retrieved by typing snps$map. Here, the original object has been modified to fit the requirements of MEAL objects. Snps annotation data.frame must contain a column called position and a column called chromosome with the name of the chromosome.

head(snps$map)
##                              Chromosome  snp.name  position   SNP
## rs2895914-126_B_R_1514138430         12 rs2895914  81254953 [T/C]
## rs9531337-119_T_F_1502299533         13 rs9531337  35479506 [A/C]
## rs8125266-126_B_F_1502281523         20 rs8125266  59293703 [T/C]
## rs7236498-126_B_R_1502260358         18 rs7236498  21765775 [T/C]
## rs1593933-127_B_R_1501677465          5 rs1593933   7472441 [T/C]
## rs6845251-127_T_R_1501827315          4 rs6845251 184359014 [A/G]
##                              chromosome
## rs2895914-126_B_R_1514138430      chr12
## rs9531337-119_T_F_1502299533      chr13
## rs8125266-126_B_F_1502281523      chr20
## rs7236498-126_B_R_1502260358      chr18
## rs1593933-127_B_R_1501677465       chr5
## rs6845251-127_T_R_1501827315       chr4

All in all, this list can be reused for other analyses, just changing the SNP data but maintaining the structure.

Let’s illustrate the use of this package taking the gender of the donnor as the outcome of interest. Methylation and expression analyses will be described including hypothesis testing and visualitzation.

2 Methylation Analysis

The first analysis will be the methylation analysis. This demonstration will be focused on the analysis workflow while a more extended description of the functions and their arguments can be found at MEAL vignette.

Let’s start the analysis. The first step is to create the MethylationSet with the betas, the phenotype and the snps.

methSet <- prepareMethylationSet(matrix = betavals, phenotypes = pheno)
table(fData(methSet)$chr)
## 
##  chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19  chr2 
## 43479 22631 26764 22703 11338 13942 14145 20605 25956  5515 23896 32322 
## chr20 chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrX  chrY 
##  9674  3976  8044 23469 19036 22727 33557 27898 19596  9211 10558   341

As it can be seen, more than 10000 probes are placed in sexual chromosomes and they will be very affected by the gender of the sample. These probes, that we know that will be differentiated and that don’t provide any interesting information, can confound the multitesting analysis by adding a huge ammount of really small p-values that complicate the adjustement. For these reasons, it is very recommended to remove them, specially in this situation that we want to evaluate the effect of gender on methylation.

Rigth now, MEAL doesn’t include a function to automatically perform this filtering. However, it can be easily done with the following steps:

annot <- fData(methSet)
autosomiccpgs <- rownames(annot)[!annot$chr %in% c("chrX", "chrY")]
methSet <- methSet[autosomiccpgs, ]
methSet
## MethylationSet (storageMode: lockedEnvironment)
## assayData: 440484 features, 62 samples 
##   element names: meth 
## protocolData: none
## phenoData
##   rowNames: GM02704 GM02706 ... WG1983 (62 total)
##   varLabels: gender source
##   varMetadata: labelDescription
## featureData
##   featureNames: cg13869341 cg14008030 ... cg02366642 (440484
##     total)
##   fvarLabels: chromosome position ... DHS (17 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: IlluminaHumanMethylation450kanno.ilmn12.hg19

Once the object is filtered, we are now ready to run the analysis. DAPipeline are the function in MEAL that performs the analysis. It can work with an ExpressionSet and a MethylationSet. In both cases, for each feature, it computes the effect of the variables set using the parameter variable_names, and it allows the inclusion of adjustment variables with the parameter covariable_names. In addition, if a MethylationSet is used, regional analysis will be performed (such as Bumphunter or DMRcate).

In our case, we will set variable_names to gender, but we could use a vector with all the variables to analyse. Finally, probe_method is set to “ls” (lineal regression) to speed up the demonstration. In order to use robust linear regression, the option is “robust”.

methRes <- DAPipeline(set = methSet, variable_names = "gender", probe_method = "ls")
methRes
## class: AnalysisResults 
## original class: MethylationSet 
## features(251): cg04462931 cg27540865 ... cg08906898 cg20811988
## samples(62): GM02704 GM02706 ... WG1984 WG1983
## variables:  gender 
## model variables names:  gender 
## covariables: none 
## Number of bumps: 226774 
## Number of blocks: NULL 
## Number of regions: 0 
## Number of differentially methylated probes: 251

The analysis generates an AnalysisResults objects with all the results. 239 probes has been detected as differentially methylated. To get an overview of the distribution of these probes, a Manhattan plot can be done:

plotEWAS(methRes)
Figure 1: Manhattan plot of methylation results.

Figure 1: Manhattan plot of methylation results.

In figure 1, we can see that the differentially methylated probes are distributed all around the genome. A quick look to the QQplot can show us if there are problems in the model of the analysis.

plotQQ(methRes)
Figure 2: QQplot of methylation analysis

Figure 2: QQplot of methylation analysis

In figure 2, most of the points are on the theoretical line. The last points are quite separated but this can be due to the high number of differentially methylated probes. Consequently, no further adjustement seems to be required.

3 Integration of data

MEAL incorporates a new class MultiDataSet, designed to handle data from the same samples but from different omic experiments. This class works with eSet derived classes. In this tutorial, we will show how to create a MultiDataSet containing our methylation and SNPs data.

The first step is to create an empty MultiDataSet.

multi <- new("MultiDataSet")

MultiDataSet contains a generic function add.set which allows to add any kind of object to a labeled slot. In addition, there are three predifined functions (add.genexp, add.methy, add.snps) which adds a given object to a given slot. Rigth now, we will add methylation to this new MultiDataSet.

multi <- add.methy(multi, methSet)

add.methy has the same parameters as add.genexp or add.snps: the first parameter is the MultiDataSet where the new object will be added and the second is the object to be added. The difference between these funcions is that add.methy requires a MethylationSet, add.genexp an ExpressionSet and add.snps a SnpSet. Rigth now, snps data is in the form of a list with a SnpMatrix and a annotation, so we should convert it to a SnpSet prior to including in the MultiDataSet.

SnpSet <- new("SnpSet", call = snps$genotypes)
fData(SnpSet) <- snps$map
multi <- add.snps(multi, SnpSet)
## Warning in add.set(object, snpSet, "snps"): The samples of multiDataSet
## are not the same that samples in the new set. Only common samples will be
## retained.

3.1 Region Analysis and use of SNPs

One of the most interesting features of MEAL is the deep analysis of a region of the genome with DARegionAnalysis. This function evaluates the effect of a variable on the pattern of methylation of the whole region (using a redundancy analysis). In addition, if SNPs are included, meQTLs are detected and used to adjust the model.

To show these features, we will look for a region enriched in differentially methylated probes. Thus, we will take a glance to the most significant probes:

head(probeResults(methRes)[[1]], 10)
##             intercept  gendermale         sd         t      P.Value
## cg04462931 0.88654958 -0.32827193 0.07044937 -37.30759 1.526961e-44
## cg27540865 0.01320814  0.30640554 0.18312809  28.03077 3.426455e-37
## cg20926353 0.57644894 -0.38817128 0.09900076 -25.78547 4.273411e-35
## cg03911306 0.93468579 -0.25549244 0.10692549 -25.78332 4.293921e-35
## cg11643285 0.94724727 -0.12214849 0.08543173 -22.57237 8.107946e-32
## cg14095100 0.31521330 -0.19734347 0.08786228 -20.30986 2.788190e-29
## cg08656326 0.33178828 -0.18272313 0.07811972 -19.24051 5.227778e-28
## cg25568337 0.16694882 -0.07327334 0.05241064 -18.22693 9.386919e-27
## cg03691818 0.08622923 -0.05777246 0.09324709 -18.10084 1.354923e-26
## cg09118400 0.79983697 -0.15652139 0.06864569 -16.71849 8.518652e-25
##               adj.P.Val chromosome  position                genes
## cg04462931 6.726017e-39       chr7  64300061                     
## cg27540865 7.546494e-32       chr1 243053934                     
## cg20926353 4.728509e-30       chr9  84303358            TLE1;TLE1
## cg03911306 4.728509e-30       chr3  16648294                 DAZL
## cg11643285 7.142841e-27       chr3  16411667                RFTN1
## cg14095100 2.046922e-24       chr9  84303413            TLE1;TLE1
## cg08656326 3.289647e-23       chr9  84304414                 TLE1
## cg25568337 5.168484e-22       chr6 157098338 ARID1B;ARID1B;ARID1B
## cg03691818 6.631354e-22      chr12  53085038                KRT77
## cg09118400 3.752330e-20       chr7  17402192                     
##                              group
## cg04462931                        
## cg27540865                        
## cg20926353           5'UTR;1stExon
## cg03911306                 TSS1500
## cg11643285                    Body
## cg14095100           5'UTR;1stExon
## cg08656326                 TSS1500
## cg25568337 TSS1500;TSS1500;TSS1500
## cg03691818                    Body
## cg09118400

From the top 10 probes, 2 are placed in chromosome 3 around position 16500000. Therefore, we will analyse a region of 1Mb enclosing these probes.

Before starting the analysis, we will repeat the Manhattan plot but now highlighting the probes of the range. It should be noticed that MEAL uses GRanges objects to define the genomic ranges:

range <- GRanges(seqnames = Rle("chr3"),
                 ranges = IRanges(16000000, end = 17000000))
plotEWAS(methRes, range = range)
Figure 3: Manhattan plot with the cpgs of the range highlighted.

Figure 3: Manhattan plot with the cpgs of the range highlighted.

In figure 3, the most significant cpgs of chromosome 9 are highlighted. There are more highlighted points in chromosome 9 but there are covered by the others.

Now we are ready to perform the analysis. DARegionAnalysis works very simmilar to DAPipeline with the difference that a range is needed. DARegionAnalysis can work with a MethylationSet / ExpressionSet or with a MultiDataSet. If a MultiDataSetis used, SNPs are required and they are included in the analysis. This will be our case:

methRegion <- DARegionAnalysis(set = multi, range = range, variable_names = "gender", probe_method = "ls")
## coercing object of mode  numeric  to SnpMatrix
methRegion
## class: AnalysisRegionResults 
## original class: MethylationSet 
## features(156): cg03624438 cg20216309 ... cg24159436 cg18422055
## samples(61): WG2275 WG2276 ... GM03073 GM00439
## variables:  gender 
## model variables names:  gender 
## covariables: snpPC1, snpPC2, snpPC3, snpPC4, snpPC5, snpPC6 
## Number of bumps: 90 
## Number of blocks: 0 
## Number of regions: 0 
## Number of differentially methylated probes: 7 
## Relevant snps(17): rs4643692-121_B_R_1502228399
##   rs9880307-127_B_R_1502398918 ... rs9825009-127_T_R_1513914912
##   rs11710881-126_T_R_1502098290
## Snps Variance:  0.6982974 
## Range:
##  Chr:  chr3  start:  16000000    end:  17000000 
## Rsquared:  0.075 
## P-value:  0.001

When SNPs are used, first SNPs that are significantly associated to a cpg are selected. On one hand, these SNPs will be used to compute the correlation with the cpgs. On the other, a PCA with the significant SNPs will be calculated and the six first components used as covariables.

The R2 between the cpgs and the SNPs can be retrieved using regionLM and plotted using plotRegionR2. In this example, we will use cpg number 18 in order to show the plot.

regionLM(methRegion)[["cg11610140"]]
##                       gender rs4643692.121_B_R_1502228399 
##                    0.5335796                    0.1062914
plotRegionR2(methRegion, "cg11610140")
Figure 4: R2 of gender and the significant SNP for cpg cg14154784

Figure 4: R2 of gender and the significant SNP for cpg cg14154784

In figure 4, the plot of the R2 shows that gender explains more than 50% of the variability of the cpg while the SNP doesn’t explain almost anything of the variability.

plotRDA(methRegion)
Figure 5: RDA plot for the range analysis

Figure 5: RDA plot for the range analysis

In figure 5, in the RDA plot, male and female samples are clearly separated so gender changes the methylation pattern of the region. The R2 and the p-value quantitave shows this fact.

4 Expression Analysis

The analysis of expression can be performed in the same way than methylation. The main difference is that an ExpressionSet will be used intead of a MethylationSet. The hypothesis to be tested will be the same, so the variable used will be source again.

In order to assure consistent results, ExpressionSet passed to DAPipeline needs to pass some constraints: its fData should contain columns chromosome, start and end. Let’s check out our object:

eset
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 21916 features, 64 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GM00016 GM00022 ... WG3466 (64 total)
##   varLabels: gender
##   varMetadata: labelDescription
## featureData
##   featureNames: ILMN_1343291 ILMN_1651209 ... ILMN_2415949 (21916
##     total)
##   fvarLabels: chr start end genes
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: GPL6883
fvarLabels(eset)
## [1] "chr"   "start" "end"   "genes"

As it can be seen, we have this data in our object. However, the column containing the chromosomes is called chr insted of chromosome. In the following lines, we will fix this and we will run the analysis.

fData(eset)$chromosome <- fData(eset)$chr
expRes <- DAPipeline(eset, variable_names = "gender", probe_method = "ls")
expRes
## class: AnalysisResults 
## original class: ExpressionSet 
## features(50): ILMN_2382974 ILMN_1789464 ... ILMN_1802923
##   ILMN_1775937
## samples(64): GM00016 GM00022 ... WG3465 WG3466
## variables:  gender 
## model variables names:  gender 
## covariables: none 
## Number of bumps: NULL 
## Number of blocks: NULL 
## Number of regions: NULL 
## Number of differentially expressed probes: 0
plotEWAS(expRes)
Figure 6: Manhattan plot of expression results.

Figure 6: Manhattan plot of expression results.

In figure 6, there are no significant probes after multiple testing. Let’s take a look at the QQplot.

plotQQ(expRes)
Figure 7: QQplot of expression results.

Figure 7: QQplot of expression results.

In figure 7, the QQplot shows a great deflation. This situation ussualy happens when there are hidden variables (such as batch) that we are not taking into account. Given that we don’t have any other variables, we could try surrogate variables analysis (SVA). SVA tries to infer these hidden variables, which we can include in our model. DAPipeline performs this process automatically only by setting sva = TRUE.

expResSVA <- DAPipeline(eset, variable_names = "gender", probe_method = "ls", sva = TRUE)
## Number of significant surrogate variables is:  12 
## Iteration (out of 5 ):1  2  3  4  5
expResSVA
## class: AnalysisResults 
## original class: ExpressionSet 
## features(50): ILMN_2305225 ILMN_1759495 ... ILMN_1700975
##   ILMN_1734194
## samples(64): GM00016 GM00022 ... WG3465 WG3466
## variables:  gender 
## model variables names:  gender 
## covariables: none 
## Number of bumps: NULL 
## Number of blocks: NULL 
## Number of regions: NULL 
## Number of differentially expressed probes: 1
plotEWAS(expResSVA)
Figure 8: Manhattan plot of expression results using SVA.

Figure 8: Manhattan plot of expression results using SVA.

In figure 8, we can see that an expression probe is signicant (after bonferroni correction).

plotQQ(expResSVA)
Figure 9: QQ plot of expression results using SVA.

Figure 9: QQ plot of expression results using SVA.

In figure 9, the QQ-plot show a bit of inflation but the points are close to the theoretical values. Consequently, SVA has improved the analysis and has allowed to detect a differentially expressed probe.

plotVolcano(expResSVA)
Figure 10: Volcano of expression results using SVA.

Figure 10: Volcano of expression results using SVA.

head(probeResults(expResSVA)[[1]])
##              intercept  gendermale         sd         t      P.Value
## ILMN_2305225  9.114120 -0.52570929 0.09640841 -5.452940 1.364261e-06
## ILMN_1759495  9.012309 -0.08977251 0.01873313 -4.792178 1.403132e-05
## ILMN_1789464  7.349925  0.04843598 0.01045058  4.634766 2.412246e-05
## ILMN_1796216  8.854934 -0.13788617 0.03019995 -4.565774 3.052922e-05
## ILMN_1804743  7.685935  0.05865438 0.01325459  4.425213 4.913782e-05
## ILMN_1720844  8.311187 -0.17429851 0.04046218 -4.307690 7.283973e-05
##               adj.P.Val   chr    start      end   genes chromosome
## ILMN_2305225 0.02989915 chr16 57104605 57104654   NDRG4      chr16
## ILMN_1759495 0.15375516  chr6 43598330 43598379    XPO5       chr6
## ILMN_1789464 0.16726960  chr5 79443077 79443126 SERINC5       chr5
## ILMN_1796216 0.16726960 chr14 76318923 76318972   VASH1      chr14
## ILMN_1804743 0.21538091 chr17 55655692 55658193   USP32      chr17
## ILMN_1720844 0.25095186  chr1 84882502 84882551  SSX2IP       chr1

In figure 10, a Volcano plot of the results is shown. Given that none of the points has a log fold change bigger than 1, none of the points is labeled.

It should be noticed that although an ExpressionSet can be the input for DAPipeline and a AnalysisResults object is generated, not all functions will be available. The two functions that might not work properly are plotRegionR2 and plotRegion.

5 Methylation and expression integration

In this last step, the correlation between expression and methylation will be studied. There are two functions in MEAL that can compute the relationship between this two omics data. correlationMethExprs makes pairs of cpg-expression probe and test if they are correlated. On the other hand, multiCorrMethExprs test if, in a chromosomic region, the methylation pattern can explain the expression pattern.

Both functions requires a MultiDataSet, so let’s create a new MultiDataSet with expression and methylation data.

multi2 <- new("MultiDataSet")
multi2 <- add.genexp(multi2, eset)
multi2 <- add.methy(multi2, methSet)
## Warning in add.set(object, methySet, "methylation"): The samples of
## multiDataSet are not the same that samples in the new set. Only common
## samples will be retained.

It should be noticed that add.genexp share the same constraints for ExpressionSet, so only objects with fData containing chromosomes, start and end columns will be accepted. Another interesting feature is that MultiDataSet adding functions ensure that sample names are the same for all sets. To do so, when a new set is added, samples of the new set and samples of the multiset are filtered to only contain the common samples.

5.1 Methylation expression correlation

The function correlationMethExprs performs a linear regression between the methylation and the expression values. First, cpgs are paired to expression probes if the expression probe is in a range of \(\pm\) 250kb from the cpg (this parameter is called flank). After that, for each pair, the following regression is performed: \[ e_{ij} = \beta_i m_{ij} + \epsilon \] where \(e_{ij}\) is the expression value of individual j on pair i expression probe, \(m_{ij}\) the methylation value of individual j on pair i methylation probe and \(\beta_i\) the change of expression produced by the methylation on pair i.

In addition, the effect of other variables can be substracted for the methylation or the expression data. To do this, a linear regression of the data versus the variables is performed: \[ x_{ij} = \sum_{k = 1}^p{\beta_{ik} z_{jk}} + r_{ij} \] where \(x_{ij}\) is the methylation value at cpg i for individual j, \(\beta_{ik}\) is the effect of variable z at cpg, \(z_{jk}\) is the value of variable z for individual j and \(r_{ij}\) is the residual value at cpg i for individual j. Then, the residuals are computed: \[ r_{ij} = x_{ij} - \sum_{k = 1}^p{\beta_{ik} z_{jk}} \] These residuals will be used as the methylation values in the first equation. The same equations can be applied with expression data.

This process is a bit computionally expensive. For this reason, we will use a MultiDataSet with a smaller methylation set with only the differentially methylated probes. Given that cpgs are ordered in the AnalysisResults, we will take the top 20 differentially methylated probes. Using add.methy in a MultiDataSet containing methylation, the methylation data is substituted.

topcpgs <- feats(methRes)[1:20]
minimeth <- methSet[topcpgs, ]
multi3 <- add.methy(multi2, minimeth)
## Warning in add.set(object, methySet, "methylation"): The samples of
## multiDataSet are not the same that samples in the new set. Only common
## samples will be retained.
## Warning in add.set(object, methySet, "methylation"): Slot 'methylation' is
## already set in 'MultiDataSet'. Previous content will be overwritten.
methExprs <- correlationMethExprs(multi3)
## Calculating cpg-expression probe pairs
## Computing residuals
## Computing correlation Methylation-Expression
head(methExprs)
##          cpg        exprs        Beta        se    P.Value adj.P.Val
## 1 cg27540865 ILMN_1660840  0.09878267 0.1438691 0.49515892 0.9509347
## 2 cg03691818 ILMN_1663447 -0.31654861 0.5708385 0.58142236 0.9509347
## 3 cg23778841 ILMN_1671742 -1.91368783 0.8485435 0.02804386 0.9509347
## 4 cg03691818 ILMN_1674250 -0.12242894 0.3048949 0.68954736 0.9509347
## 5 cg11643285 ILMN_1679912  0.48294784 0.4015975 0.23420407 0.9509347
## 6 cg23778841 ILMN_1705515 -0.94025857 0.9759049 0.33945209 0.9509347

The results shows the cpg name, the expression probe, the change of the relationship and se, and the p-value and adjusted p-value (using B&H). However, in our data there are no correlated cpgs-expression probes.

5.2 Methylation expression multivariate correlation

The function multiCorrMethExprs performs a redundancy analysis (RDA) between the methylation and the expression values. This analysis gives us an estimate of the proportion of variance of expression that methylation is able to explain. In our example, we will use the same range used in methylation region analysis.

methExprsRDA <- multiCorrMethExprs(multiset = multi2, range = range)
methExprsRDA
## Call: rda(X = exprsres, Y = methpc)
## 
##               Inertia Proportion Rank
## Total          0.4844     1.0000     
## Constrained    0.2631     0.5432    5
## Unconstrained  0.2213     0.4568    5
## Inertia is variance 
## 
## Eigenvalues for constrained axes:
##    RDA1    RDA2    RDA3    RDA4    RDA5 
## 0.14854 0.08426 0.02401 0.00528 0.00105 
## 
## Eigenvalues for unconstrained axes:
##     PC1     PC2     PC3     PC4     PC5 
## 0.14801 0.04747 0.02018 0.00441 0.00122

multiCorrMethExprs returns a RDA object from vegan package. In the next following lines, some useful functions related to this object will be explained while a more deep explanation can be found in the documentation of vegan package.

rda objects can directly generate a plot:

library(vegan)
## Loading required package: permute
## This is vegan 2.3-2
plot(methExprsRDA)
Figure 11: RDA plot of relationship between expression and methylation

Figure 11: RDA plot of relationship between expression and methylation

In figure 11, blue arrows are methylation probes, black labels expression probes and red points the samples. The closer the points the more simmilar they are. So, expression probe WG1977 and cg14199629 are very simmilar.

The following code gives us quantitative information about the global relathionship. The first code, performs an anove and returns a p.value. In this case, the relathionship is not significant. The other code gives us an estimate of the R square. Again, after taking into account the high number of variables used, the value is very small (0.0903).

anova.cca(methExprsRDA)
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(X = exprsres, Y = methpc)
##          Df Variance      F Pr(>F)  
## Model    23  0.26314 1.7579  0.019 *
## Residual 34  0.22128                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RsquareAdj(methExprsRDA)
## $r.squared
## [1] 0.5432093
## 
## $adj.r.squared
## [1] 0.2342038

All in all, this analysis shows us that in our region, the expression and methylation levels are not correlated.