The DMRsegaldata package provides example DNA methylation data for use with the DMRsegal package. This data package contains preprocessed methylation beta values from Illumina HumanMethylation450K arrays, along with associated phenotype information, differentially methylated positions (DMPs), and array type metadata.
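The data consists of a beta value matrix covering 384,180 CpG sites in 20 samples (10 cancer, 10 normal), a phenotype table for those samples, a table of precomputed DMPs, and the array type.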
All data has been preprocessed and quality controlled, making it ready for downstream analysis with DMRsegal or other methylation analysis tools.
# Install from Bioconductor (once submitted)
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("DMRsegaldata")
library(DMRsegaldata)
#> Loading required package: ExperimentHub
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#> mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
#> Registered S3 method overwritten by 'bit64':
#> method from
#> print.bitstring tools
The package provides four main data objects: beta, pheno, dmps, and array_type.
The beta object contains DNA methylation beta values for CpG sites across all samples:
# Load the beta values matrix
library(ExperimentHub)
eh <- ExperimentHub()
beta <- eh[["EH10275"]]
#> see ?DMRsegaldata and browseVignettes('DMRsegaldata') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
# Examine the structure
dim(beta)
#> [1] 384180 20
class(beta)
#> [1] "matrix" "array"
# Preview the first few rows and columns
beta[1:5, 1:5]
#>                 T1      T2      T3      T4      T5
#> cg16619049 0.31135 0.47515 0.44109 0.55200 0.61299
#> cg23100540 0.15380 0.20779 0.19384 0.32562 0.38765
#> cg18147296 0.59040 0.63569 0.73986 0.76898 0.70022
#> cg13938959 0.76462 0.79294 0.42258 0.64547 0.62924
#> cg12445832 0.67436 0.65939 0.30524 0.48846 0.44575
# Summary statistics
summary(beta[1:100, 1])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.08348 0.30938 0.54843 0.52316 0.74111 0.94975
Beta values range from 0 to 1, representing the proportion of methylation at each CpG site.
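For statistical modeling, beta values are often converted to M-values, the base-2 logit M = log2(beta / (1 - beta)). The following is a minimal sketch on a toy vector rather than the package data; the helper names `beta_to_m` and `m_to_beta` are hypothetical and are not provided by DMRsegaldata.

```r
# Toy beta values (made up for illustration, not from the package data)
toy_beta <- c(0.1, 0.5, 0.9)

# Base-2 logit transform from beta values to M-values
beta_to_m <- function(b) log2(b / (1 - b))

# Inverse transform from M-values back to beta values
m_to_beta <- function(m) 2^m / (2^m + 1)

toy_m <- beta_to_m(toy_beta)
toy_m

# The round trip should recover the original beta values
all.equal(m_to_beta(toy_m), toy_beta)
```

A beta value of exactly 0 or 1 maps to an infinite M-value, so such values are usually clipped slightly away from the boundaries before transforming.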
The pheno object contains sample metadata including group assignment, age, and gender:
# Load phenotype data
pheno <- eh[["EH10276"]]
#> see ?DMRsegaldata and browseVignettes('DMRsegaldata') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
# View the structure
str(pheno)
#> 'data.frame': 20 obs. of 3 variables:
#> $ Sample_Group: chr "cancer" "cancer" "cancer" "cancer" ...
#> $ Age : int 65 48 75 70 55 73 73 61 55 71 ...
#> $ Gender : chr "m" "m" "m" "f" ...
head(pheno)
#>    Sample_Group Age Gender
#> T1       cancer  65      m
#> T2       cancer  48      m
#> T3       cancer  75      m
#> T4       cancer  70      f
#> T5       cancer  55      m
#> T6       cancer  73      m
# Summary of sample groups
table(pheno$Sample_Group)
#>
#> cancer normal
#>     10     10
# Summary of gender distribution
table(pheno$Gender)
#>
#>  f  m
#>  6 14
table(pheno$Sample_Group, pheno$Gender)
#>
#>          f m
#>   cancer 2 8
#>   normal 4 6
# Age distribution
summary(pheno$Age)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 43.0 55.0 69.0 64.6 73.0 89.0
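The phenotype table is keyed by sample ID in its row names, and those IDs correspond to the column names of the beta matrix. If the two ever fall out of order (for example after subsetting), the rows can be realigned with `match()`. The sketch below uses small made-up objects (`toy_beta`, `toy_pheno`) as stand-ins for the real data.

```r
# Toy stand-ins for the beta matrix and the phenotype table
toy_beta <- matrix(
  c(0.2, 0.8, 0.5, 0.4, 0.6, 0.1),
  nrow = 2,
  dimnames = list(c("cg_a", "cg_b"), c("S1", "S2", "S3"))
)
toy_pheno <- data.frame(
  Sample_Group = c("normal", "cancer", "cancer"),
  row.names = c("S3", "S1", "S2")
)

# Reorder phenotype rows to follow the column order of the matrix
idx <- match(colnames(toy_beta), rownames(toy_pheno))
stopifnot(!anyNA(idx)) # every sample needs a phenotype row
toy_pheno <- toy_pheno[idx, , drop = FALSE]

# TRUE when samples are in the same order in both objects
identical(colnames(toy_beta), rownames(toy_pheno))
```

Using `identical()` instead of `all(a == b)` also catches a length mismatch, which `==` would silently recycle over.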
The dmps object contains pre-calculated differentially methylated positions identified using limma:
# Load DMP results
dmps <- eh[["EH10277"]]
#> see ?DMRsegaldata and browseVignettes('DMRsegaldata') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
# Examine the structure
head(dmps)
#>              intercept         pval        f         qval     pval_adj
#> cg18146737 -0.34134982 1.536868e-18 31.22780 1.477001e-13 5.904339e-13
#> cg09975620 -0.64428883 8.313945e-18 29.85366 3.462748e-13 1.384240e-12
#> cg27344587  0.21875927 1.080931e-17 29.63565 3.462748e-13 1.384240e-12
#> cg18582992 -0.66539652 1.649412e-17 29.28230 3.784676e-13 1.512931e-12
#> cg24805759 -0.09979448 1.969040e-17 29.13335 3.784676e-13 1.512931e-12
#> cg19961545  0.27802221 2.840323e-17 28.82369 4.549469e-13 1.818659e-12
dim(dmps)
#> [1] 208205 5
# Summary of significance levels
summary(dmps$pval)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.000e+00 8.784e-06 3.253e-04 3.539e-03 3.991e-03 2.710e-02
summary(dmps$pval_adj)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.000e+00 6.483e-05 1.200e-03 7.543e-03 9.819e-03 5.000e-02
# Top 10 most significant DMPs
head(dmps, 10)
#>              intercept         pval        f         qval     pval_adj
#> cg18146737 -0.34134982 1.536868e-18 31.22780 1.477001e-13 5.904339e-13
#> cg09975620 -0.64428883 8.313945e-18 29.85366 3.462748e-13 1.384240e-12
#> cg27344587  0.21875927 1.080931e-17 29.63565 3.462748e-13 1.384240e-12
#> cg18582992 -0.66539652 1.649412e-17 29.28230 3.784676e-13 1.512931e-12
#> cg24805759 -0.09979448 1.969040e-17 29.13335 3.784676e-13 1.512931e-12
#> cg19961545  0.27802221 2.840323e-17 28.82369 4.549469e-13 1.818659e-12
#> cg00701946  0.36895208 8.706627e-17 27.86431 1.136722e-12 4.544066e-12
#> cg11092487  0.42765306 1.151106e-16 27.62231 1.136722e-12 4.544066e-12
#> cg00024086 -1.54977083 1.182216e-16 27.59914 1.136722e-12 4.544066e-12
#> cg04255230  0.63306256 1.182796e-16 27.59872 1.136722e-12 4.544066e-12
# Count of significant DMPs at different thresholds
sum(dmps$pval_adj < 0.05)
#> [1] 208205
sum(dmps$pval_adj < 0.01)
#> [1] 156698
sum(dmps$pval_adj < 0.001)
#> [1] 100220
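The thresholds above are applied to adjusted p-values. How the `pval_adj` and `qval` columns were derived is not stated here, so the following is only a generic sketch of multiple-testing adjustment, using the Benjamini-Hochberg method from base R's `p.adjust()` on made-up p-values (`toy_p` and `toy_fdr` are illustrative names). Note also that every row in the output above passes the 0.05 threshold, which suggests the table was already filtered at that level.

```r
# Toy raw p-values (made up for illustration, not from the package data)
toy_p <- c(0.001, 0.01, 0.02, 0.03, 0.5)

# Benjamini-Hochberg adjustment from the stats package
toy_fdr <- p.adjust(toy_p, method = "BH")
toy_fdr

# Number of tests passing each threshold
n_05 <- sum(toy_fdr < 0.05)
n_01 <- sum(toy_fdr < 0.01)
c(n_05 = n_05, n_01 = n_01)
```

Adjusted p-values are never smaller than the raw ones, so counts at a given threshold can only shrink after adjustment.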
The array_type object specifies the Illumina array platform used:
# Load array type information
array_type <- eh[["EH10278"]]
#> see ?DMRsegaldata and browseVignettes('DMRsegaldata') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
array_type
#> [1] "450K"
Here’s a simple workflow to explore the methylation data:
# Check sample consistency between beta and pheno
all(colnames(beta) == rownames(pheno))
#> [1] TRUE
# Calculate mean methylation per sample
mean_methylation <- colMeans(beta, na.rm = TRUE)
names(mean_methylation) <- colnames(beta)
# Compare mean methylation between groups
cancer_samples <- rownames(pheno)[pheno$Sample_Group == "cancer"]
normal_samples <- rownames(pheno)[pheno$Sample_Group == "normal"]
mean_cancer <- mean(mean_methylation[cancer_samples])
mean_normal <- mean(mean_methylation[normal_samples])
cat("Mean methylation in cancer samples:", round(mean_cancer, 4), "\n")
#> Mean methylation in cancer samples: 0.5199
cat("Mean methylation in normal samples:", round(mean_normal, 4), "\n")
#> Mean methylation in normal samples: 0.5181
# Create a data frame for plotting
plot_data <- data.frame(
  Sample = colnames(beta),
  MeanMethylation = mean_methylation,
  Group = pheno[colnames(beta), "Sample_Group"],
  Age = pheno[colnames(beta), "Age"],
  Gender = pheno[colnames(beta), "Gender"]
)
# Boxplot comparing groups
boxplot(MeanMethylation ~ Group,
  data = plot_data,
  main = "Mean Methylation by Sample Group",
  xlab = "Sample Group", ylab = "Mean Beta Value",
  col = c("cancer" = "lightcoral", "normal" = "lightblue")
)
# Add individual points
stripchart(MeanMethylation ~ Group,
  data = plot_data,
  vertical = TRUE, method = "jitter",
  add = TRUE, pch = 19, col = "black"
)
Figure 1: Distribution of mean methylation by sample group
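The group comparison can also be expressed numerically. As a minimal sketch, `tapply()` computes the per-group means in one call and `t.test()` runs a Welch two-sample test. The values below are made up (`toy_means`, `toy_group`) and only illustrate the calls; they are not results from the package data.

```r
# Toy per-sample mean methylation values and group labels
toy_means <- c(S1 = 0.52, S2 = 0.50, S3 = 0.54, S4 = 0.49, S5 = 0.51, S6 = 0.53)
toy_group <- factor(c("cancer", "cancer", "cancer", "normal", "normal", "normal"))

# Mean of the per-sample values within each group
group_means <- tapply(toy_means, toy_group, mean)
group_means

# Welch two-sample t-test for a difference between the group means
tt <- t.test(toy_means ~ toy_group)
tt$p.value
```

With real data the same calls would take `plot_data$MeanMethylation` and `plot_data$Group`. A global mean is a coarse summary, so a small difference here says little about site-level differences.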
Let’s look at the methylation levels of the most significant DMPs:
# Get top 3 DMPs
top_dmps <- head(rownames(dmps), 3)
# Set up plotting area
par(mfrow = c(1, 3))
# Plot each DMP
for (cpg in top_dmps) {
  if (cpg %in% rownames(beta)) {
    cpg_values <- beta[cpg, ]
    boxplot(cpg_values ~ pheno$Sample_Group,
      main = cpg,
      xlab = "Group", ylab = "Beta Value",
      col = c("lightcoral", "lightblue")
    )
    stripchart(cpg_values ~ pheno$Sample_Group,
      vertical = TRUE, method = "jitter",
      add = TRUE, pch = 19, col = "black"
    )
  }
}
Figure 2: Methylation levels at top 3 DMPs
par(mfrow = c(1, 1))
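# Calculate effect sizes for DMPs as the mean beta difference
# (cancer minus normal). To keep the plot light, use a random
# subset of at most 1000 DMPs; sample() is unseeded here, so the
# subset differs between runs.
dmps_subset <- dmps[sample(rownames(dmps), min(1000, nrow(dmps))), , drop = FALSE]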
cancer_beta <- beta[rownames(dmps_subset), cancer_samples]
normal_beta <- beta[rownames(dmps_subset), normal_samples]
cancer_beta_means <- rowMeans(cancer_beta, na.rm = TRUE)
normal_beta_means <- rowMeans(normal_beta, na.rm = TRUE)
dmps_subset$delta_beta <- cancer_beta_means - normal_beta_means
# Create volcano plot
plot(dmps_subset$delta_beta, -log10(dmps_subset$pval),
  xlab = "Delta Beta (Cancer - Normal)",
  ylab = "-log10(p-value)",
  main = "Volcano Plot of DMPs",
  pch = 19,
  col = ifelse(dmps_subset$pval_adj < 0.01 & abs(dmps_subset$delta_beta) > 0.2, "red", "grey50"),
  cex = 0.5
)
# Reference lines: nominal p = 0.05 (horizontal) and |delta beta| = 0.2 (vertical)
abline(h = -log10(0.05), lty = 2, col = "blue")
abline(v = c(-0.2, 0.2), lty = 2, col = "blue")
legend("topright",
  legend = c("FDR < 0.01 && abs(Delta Beta) > 0.2", "Not significant"),
  col = c("red", "grey50"), pch = 19
)
Figure 3: Volcano plot of differentially methylated positions
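The highlighting rule in the volcano plot combines an adjusted p-value cutoff with an effect-size cutoff. The sketch below reproduces that logic on a tiny made-up matrix so each step can be checked by hand; `toy_beta` and `toy_fdr` are illustrative stand-ins, not package data.

```r
# Toy beta matrix: 3 CpGs x 4 samples (2 cancer, 2 normal), filled by column
toy_beta <- matrix(
  c(0.80, 0.20, 0.50,  # sample C1
    0.90, 0.30, 0.55,  # sample C2
    0.30, 0.25, 0.50,  # sample N1
    0.40, 0.15, 0.45), # sample N2
  nrow = 3,
  dimnames = list(c("cg_a", "cg_b", "cg_c"), c("C1", "C2", "N1", "N2"))
)

# Toy adjusted p-values, one per CpG
toy_fdr <- c(cg_a = 0.001, cg_b = 0.002, cg_c = 0.2)

# Effect size: mean beta in cancer minus mean beta in normal
delta_beta <- rowMeans(toy_beta[, c("C1", "C2")]) -
  rowMeans(toy_beta[, c("N1", "N2")])

# Flag CpGs that pass both the significance and effect-size cutoffs
flagged <- toy_fdr < 0.01 & abs(delta_beta) > 0.2
names(flagged)[flagged]
```

Here `cg_b` is significant but has a small effect, and `cg_c` fails the significance cutoff, so only `cg_a` is flagged.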
sessionInfo()
#> R version 4.6.0 alpha (2026-04-05 r89794)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] DMRsegaldata_0.99.7 ExperimentHub_3.1.0 AnnotationHub_4.1.0
#> [4] BiocFileCache_3.1.0 dbplyr_2.5.2 BiocGenerics_0.57.0
#> [7] generics_0.1.4 BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] rappdirs_0.3.4 sass_0.4.10 BiocVersion_3.23.1
#> [4] RSQLite_2.4.6 digest_0.6.39 magrittr_2.0.5
#> [7] evaluate_1.0.5 bookdown_0.46 fastmap_1.2.0
#> [10] blob_1.3.0 jsonlite_2.0.0 AnnotationDbi_1.73.0
#> [13] DBI_1.3.0 tinytex_0.59 BiocManager_1.30.27
#> [16] httr_1.4.8 purrr_1.2.1 Biostrings_2.79.5
#> [19] httr2_1.2.2 jquerylib_0.1.4 cli_3.6.5
#> [22] crayon_1.5.3 rlang_1.2.0 XVector_0.51.0
#> [25] Biobase_2.71.0 bit64_4.6.0-1 withr_3.0.2
#> [28] cachem_1.1.0 yaml_2.3.12 otel_0.2.0
#> [31] tools_4.6.0 memoise_2.0.1 dplyr_1.2.1
#> [34] filelock_1.0.3 curl_7.0.0 vctrs_0.7.2
#> [37] R6_2.6.1 png_0.1-9 magick_2.9.1
#> [40] stats4_4.6.0 lifecycle_1.0.5 Seqinfo_1.1.0
#> [43] KEGGREST_1.51.1 S4Vectors_0.49.1 IRanges_2.45.0
#> [46] bit_4.6.0 pkgconfig_2.0.3 pillar_1.11.1
#> [49] bslib_0.10.0 Rcpp_1.1.1 glue_1.8.0
#> [52] xfun_0.57 tibble_3.3.1 tidyselect_1.2.1
#> [55] knitr_1.51 htmltools_0.5.9 rmarkdown_2.31
#> [58] compiler_4.6.0