library(TumourMethData)
#> Warning: replacing previous import 'HDF5Array::h5ls' by 'rhdf5::h5ls' when
#> loading 'TumourMethData'
DNA methylation is a repressive epigenetic modification involving the addition of methyl groups to DNA and occurs almost exclusively at CpG dinucleotides in mammals. Altered DNA methylation plays a profound role in the development and progression of cancer. However, much of our knowledge of DNA methylation in cancer has been garnered from methylation microarrays, which measure methylation at only a small subset (generally <1%) of the almost 30 million CpG sites in humans, mostly those located close to gene promoters. Thus, whole genome bisulfite sequencing (WGBS) studies in tumours, which measure DNA methylation across the entire genome, provide an invaluable resource for gaining a comprehensive understanding of DNA methylation changes in cancer, especially at regulatory regions located far from genes.
While packages such as curatedTCGAData provide DNA methylation data generated with microarrays for a range of different cancer types, TumourMethData provides a collection of whole genome DNA methylation datasets for several different cancers (primary prostate cancer, prostate cancer metastases, esophageal cancer and rhabdoid tumour at present), as well as matching normal samples where available.
These whole genome methylation datasets are provided as RangedSummarizedExperiments, facilitating easy download of the data and extraction of methylation values for regions of interest.
Furthermore, RNA-seq transcript counts are also provided for several of the datasets, enabling thorough analysis of how DNA methylation is associated with transcription and how this relationship is perturbed in cancer.
We can view the available datasets with TumourMethDatasets.
# Show available methylation datasets
data("TumourMethDatasets", package = "TumourMethData")
print(TumourMethDatasets)
#>                dataset_name cancer_type technology genome_build
#> 1           cpgea_wgbs_hg38    prostate       WGBS         hg38
#> 2            tcga_wgbs_hg38     various       WGBS         hg38
#> 3           mcrpc_wgbs_hg38    prostate       WGBS         hg38
#> 4     mcrpc_wgbs_hg38_chr11    prostate       WGBS         hg38
#> 5  cao_esophageal_wgbs_hg19  esophageal       WGBS         hg19
#> 6 target_rhabdoid_wgbs_hg19    rhabdoid       WGBS         hg19
#>   number_tumour_samples number_normal_samples wgbs_coverage_available
#> 1                   187                   187                   FALSE
#> 2                    39                     8                   FALSE
#> 3                   100                     0                    TRUE
#> 4                   100                     0                    TRUE
#> 5                    10                     9                   FALSE
#> 6                    69                     0                   FALSE
#>   dataset_size_gb transcript_counts_available
#> 1           40.00                        TRUE
#> 2            5.40                        TRUE
#> 3           16.00                        TRUE
#> 4            0.76                        TRUE
#> 5            2.00                        TRUE
#> 6            4.50                        TRUE
#> notes
#> 1
#> 2
#> 3
#> 4 This dataset is a subset of the data in mcrpc_wgbs_hg38 for example purposes
#> 5
#> 6 Methylation values are not as precise as in other datasets. The original \n methylation values were integers between 0 and 10 with separate values for the C and G positions of each CpG site.\n The mean of these values was divided by 10 to produce the methylation values here, \n with CpG sites missing methylation values for either to C or G given an NA value
#> original_publication
#> 1 A genomic and epigenomic atlas of prostate cancer in Asian populations; Nature; 2020
#> 2 DNA methylation loss in late-replicating domains is linked to mitotic cell division; Nature genetics; 2018
#> 3 The DNA methylation landscape of advanced prostate cancer; Nature genetics; 2020
#> 4 The DNA methylation landscape of advanced prostate cancer; Nature genetics; 2020
#> 5 Multi-faceted epigenetic dysregulation of gene expression promotes esophageal squamous cell carcinoma; Nature communications; 2020
#> 6 Genome-Wide Profiles of Extra-cranial Malignant Rhabdoid Tumors Reveal Heterogeneity and Dysregulated Developmental Pathways; Cancer Cell; 2016
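Since the dataset table is an ordinary data frame, it can be filtered before committing to a large download. The following is a minimal sketch rather than part of the package: it rebuilds a few columns by hand from the printed output above so that it runs without TumourMethData installed, and the names datasets, with_normals and small_enough, as well as the 5 GB threshold, are illustrative assumptions. With the package loaded, the same subset() calls could presumably be applied to TumourMethDatasets directly.

```r
# Hand-copied subset of the columns printed above (illustration only;
# with the package installed, filter TumourMethDatasets itself instead).
datasets <- data.frame(
  dataset_name = c("cpgea_wgbs_hg38", "tcga_wgbs_hg38", "mcrpc_wgbs_hg38",
                   "mcrpc_wgbs_hg38_chr11", "cao_esophageal_wgbs_hg19",
                   "target_rhabdoid_wgbs_hg19"),
  genome_build = c("hg38", "hg38", "hg38", "hg38", "hg19", "hg19"),
  number_normal_samples = c(187, 8, 0, 0, 9, 0),
  wgbs_coverage_available = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  dataset_size_gb = c(40, 5.4, 16, 0.76, 2, 4.5),
  stringsAsFactors = FALSE
)

# Datasets that include matching normal samples
with_normals <- subset(datasets, number_normal_samples > 0)
print(with_normals$dataset_name)

# Datasets under an arbitrary (hypothetical) 5 GB download budget
small_enough <- subset(datasets, dataset_size_gb < 5)
print(small_enough$dataset_name)
```

Filtering on number_normal_samples is useful when a tumour versus normal comparison is planned, while dataset_size_gb gives a rough idea of the disk space and download time to expect.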
We use download_meth_dataset to download the methylation dataset we are interested in, using mcrpc_wgbs_hg38_chr11 as an example.
# Download the chromosome 11 subset of the mCRPC WGBS data
mcrpc_wgbs_hg38_chr11 = download_meth_dataset(dataset = "mcrpc_wgbs_hg38_chr11")
#> see ?TumourMethData and browseVignettes('TumourMethData') for documentation
#> loading from cache
#> require("rhdf5")
print(mcrpc_wgbs_hg38_chr11)
#> class: RangedSummarizedExperiment
#> dim: 1333114 100
#> metadata(5): genome is_h5 ref_CpG chrom_sizes descriptive_stats
#> assays(2): beta cov
#> rownames: NULL
#> rowData names(0):
#> colnames(100): DTB_003 DTB_005 ... DTB_265 DTB_266
#> colData names(4): metastatis_site subtype age sex
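Because the data are returned as a RangedSummarizedExperiment, methylation values for a region of interest can be extracted with the standard SummarizedExperiment and GenomicRanges tools. The sketch below uses a small toy object (toy_rse) so that it runs without downloading anything; the CpG positions, sample names, beta values and the region coordinates are all invented for illustration. With real data, one would substitute the downloaded object for toy_rse, keeping in mind that its assays appear to be HDF5-backed (see is_h5 in the metadata), so assay() may return a disk-backed matrix rather than an ordinary one.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges and IRanges

# Toy stand-in for a downloaded dataset: 5 CpG sites x 3 samples.
# Positions, sample names and beta values are made up for illustration.
cpg_sites <- GRanges("chr11", IRanges(start = c(100, 250, 400, 900, 1500), width = 1))
beta <- matrix(
  c(0.9, 0.8, 0.1, 0.2, 0.7,   # sample_1
    0.8, 0.9, 0.2, 0.1, 0.6,   # sample_2
    1.0, 0.7, 0.0, 0.3, 0.8),  # sample_3
  nrow = 5,
  dimnames = list(NULL, c("sample_1", "sample_2", "sample_3"))
)
toy_rse <- SummarizedExperiment(assays = list(beta = beta), rowRanges = cpg_sites)

# Hypothetical region of interest
region <- GRanges("chr11", IRanges(start = 200, end = 1000))

# Keep only the CpG sites overlapping the region, then pull their beta values
region_rse <- subsetByOverlaps(toy_rse, region)
region_beta <- assay(region_rse, "beta")

# Mean methylation across the region for each sample
region_means <- colMeans(region_beta, na.rm = TRUE)
print(region_means)
```

Subsetting by overlap before calling assay() keeps the extracted matrix small, which matters for genome-wide objects with over a million rows such as the one printed above.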
sessionInfo()
#> R version 4.4.0 beta (2024-04-15 r86425)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] rhdf5_2.48.0 TumourMethData_1.2.0
#> [3] SummarizedExperiment_1.34.0 Biobase_2.64.0
#> [5] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
#> [7] IRanges_2.38.0 S4Vectors_0.42.0
#> [9] BiocGenerics_0.50.0 MatrixGenerics_1.16.0
#> [11] matrixStats_1.3.0
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.44.0 xfun_0.43 bslib_0.7.0
#> [4] lattice_0.22-6 rhdf5filters_1.16.0 vctrs_0.6.5
#> [7] tools_4.4.0 generics_0.1.3 curl_5.2.1
#> [10] AnnotationDbi_1.66.0 tibble_3.2.1 fansi_1.0.6
#> [13] RSQLite_2.3.6 blob_1.2.4 R.oo_1.26.0
#> [16] pkgconfig_2.0.3 Matrix_1.7-0 dbplyr_2.5.0
#> [19] lifecycle_1.0.4 GenomeInfoDbData_1.2.12 compiler_4.4.0
#> [22] Biostrings_2.72.0 htmltools_0.5.8.1 sass_0.4.9
#> [25] yaml_2.3.8 pillar_1.9.0 crayon_1.5.2
#> [28] jquerylib_0.1.4 R.utils_2.12.3 DelayedArray_0.30.0
#> [31] cachem_1.0.8 abind_1.4-5 mime_0.12
#> [34] ExperimentHub_2.12.0 AnnotationHub_3.12.0 tidyselect_1.2.1
#> [37] digest_0.6.35 purrr_1.0.2 dplyr_1.1.4
#> [40] BiocVersion_3.19.1 fastmap_1.1.1 grid_4.4.0
#> [43] cli_3.6.2 SparseArray_1.4.0 magrittr_2.0.3
#> [46] S4Arrays_1.4.0 utf8_1.2.4 withr_3.0.0
#> [49] rappdirs_0.3.3 filelock_1.0.3 UCSC.utils_1.0.0
#> [52] bit64_4.0.5 rmarkdown_2.26 XVector_0.44.0
#> [55] httr_1.4.7 bit_4.0.5 R.methodsS3_1.8.2
#> [58] png_0.1-8 HDF5Array_1.32.0 memoise_2.0.1
#> [61] evaluate_0.23 knitr_1.46 BiocFileCache_2.12.0
#> [64] rlang_1.1.3 glue_1.7.0 DBI_1.2.2
#> [67] BiocManager_1.30.22 jsonlite_1.8.8 Rhdf5lib_1.26.0
#> [70] R6_2.5.1 zlibbioc_1.50.0