Identifying differentially methylated probes

The first step is the identification of differentially methylated CpGs (DMCs), carried out by the function get.diff.meth.

In the Supervised mode, we compare the DNA methylation level of each distal CpG across all samples in Group 1 against all samples in Group 2, using an unpaired one-tailed t-test. In the Unsupervised mode, the samples of each group (Group 1 and Group 2) are ranked by their DNA methylation beta values for the given probe, and the samples in the lowest quintile of each group (the 20% of samples with the lowest methylation levels) are used to test whether the probe is hypomethylated in Group 1 compared to Group 2. The reverse applies for the identification of hypermethylated probes. It is important to highlight that in the Unsupervised mode, each selected probe may be based on a different subset of the samples, and thus probe sets from multiple molecular subtypes may be represented. In the Supervised mode, all tests are based on the same set of samples.

The 20% threshold is a parameter of the get.diff.meth function called minSubgroupFrac. For the unsupervised analysis, it is set to 20% as in Yao et al. (Yao and others 2015), because we wanted to be able to detect a specific molecular subtype among the samples, and such subtypes often make up only a minority of samples. The value of 20% was chosen as a lower bound for the purposes of statistical power: high enough to provide the sample numbers needed for t-test p-values to survive multiple hypothesis correction, yet low enough to capture changes in individual molecular subtypes occurring in 20% or more of the cases. This number can be set as an input to the get.diff.meth function and should be tuned based on the sample sizes of the individual study. In the Supervised mode, where the comparison groups are implicit in the sample set and labeled, the minSubgroupFrac parameter is set to 100%, so that all samples of each group are used. An example would be a cell culture experiment with 5 replicates of the untreated cell line and another 5 replicates that received an experimental treatment.
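The subgroup selection described above can be illustrated with a minimal Python sketch. It is not the package's implementation: the helper names (lowest_subgroup, mean), the rounding rule for the subgroup size, and the beta values are assumptions made purely for illustration.

```python
import math


def lowest_subgroup(betas, min_subgroup_frac=0.2):
    """Return the lowest-methylated share of samples for one probe.

    Assumption: the subgroup size is the fraction of the group rounded up,
    with at least one sample kept. The real package may round differently.
    """
    # The small epsilon guards against floating-point noise (e.g. 15 * 0.2).
    n_keep = max(1, math.ceil(len(betas) * min_subgroup_frac - 1e-9))
    return sorted(betas)[:n_keep]


def mean(values):
    return sum(values) / len(values)


# Hypothetical beta values for a single probe, ten samples per group.
group1 = [0.10, 0.85, 0.15, 0.90, 0.88, 0.92, 0.83, 0.87, 0.91, 0.86]
group2 = [0.85, 0.78, 0.90, 0.88, 0.80, 0.92, 0.86, 0.91, 0.89, 0.87]

# Unsupervised mode: only the lowest 20% of each group enters the test.
low1 = lowest_subgroup(group1, 0.2)
low2 = lowest_subgroup(group2, 0.2)
delta = mean(low1) - mean(low2)

# Supervised mode: a fraction of 100% keeps every sample of the group.
all1 = lowest_subgroup(group1, 1.0)

print(low1, low2, round(delta, 3), len(all1))
```

In this made-up example only two of the ten Group 1 samples are hypomethylated, so a comparison of full group means would dilute the signal, whereas the lowest-quintile comparison isolates it. Hypermethylated probes would use the highest share of each group instead.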

To identify hypomethylated DMCs, a one-tailed t-test is used to test the null hypothesis μgroup1 ≥ μgroup2, where μgroup1 is the mean methylation within the lowest group 1 quintile (or another percentile as specified by the minSubgroupFrac parameter) and μgroup2 is the mean within the lowest group 2 quintile. Raw p-values are adjusted for multiple hypothesis testing using the Benjamini-Hochberg method, and probes are selected if their adjusted p-value is less than 0.01 (which can be configured using the pvalue parameter). For additional stringency, probes are only selected if the absolute methylation difference |Δ| = |μgroup1 − μgroup2| is greater than 0.3. The same method is used to identify hypermethylated DMCs, except that the upper quintile is used and the opposite tail of the t-test is chosen.
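The selection step can be sketched as follows, again in plain Python and only as an illustration. The raw p-values are assumed to come from the one-tailed t-test, and the probe identifiers, p-values, and differences are invented. Because a hypomethylated probe has μgroup1 < μgroup2, the sketch treats the 0.3 threshold as a magnitude and requires Δ to be negative; the function names and the min_diff argument are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_hypo_probes(probes, pvalue=0.01, min_diff=0.3):
    """Keep probes passing both the adjusted p-value and the difference cutoff.

    `probes` maps a probe id to (raw one-tailed p-value, delta), where
    delta = mean(group 1 subgroup) - mean(group 2 subgroup).
    """
    ids = list(probes)
    adjusted = benjamini_hochberg([probes[p][0] for p in ids])
    return [
        probe_id
        for probe_id, adj_p in zip(ids, adjusted)
        if adj_p < pvalue and probes[probe_id][1] < -min_diff
    ]


# Hypothetical test results for four probes.
probes = {
    "cg01": (0.0001, -0.45),
    "cg02": (0.0004, -0.10),  # significant, but the difference is too small
    "cg03": (0.2, -0.50),     # large difference, but not significant
    "cg04": (0.003, -0.35),
}

selected = select_hypo_probes(probes)
print(selected)
```

Applying both filters together means a probe is kept only when the difference is statistically supported and large enough to matter; for hypermethylated probes the sign condition on Δ would be reversed.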