Introduction

This document introduces the R/Bioconductor ELMER package, which combines DNA methylation and gene expression data from human tissues to infer multi-level cis-regulatory networks. ELMER uses DNA methylation to identify enhancers and correlates enhancer state with the expression of nearby genes to identify one or more transcriptional targets. Transcription factor (TF) binding site analysis of enhancers is coupled with expression analysis of all TFs to infer upstream regulators. The package can be applied to publicly available TCGA cancer data sets as well as to custom DNA methylation and gene expression data sets.

An ELMER analysis has five main steps:

  1. Identify distal probes on the HM450K array.
  2. Identify distal enhancer probes with significantly different DNA methylation levels between the experiment group (group 1) and the control group (group 2).
  3. Identify putative target genes for the differentially methylated distal enhancer probes.
  4. Identify enriched motifs among the distal enhancer probes that are significantly differentially methylated and linked to a putative target gene.
  5. Identify regulatory TFs whose expression is associated with DNA methylation at those motifs.
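
To make steps 2 and 3 more concrete, the following is a minimal sketch on simulated data using base R only. It does not call ELMER and is not ELMER's implementation: the probe and gene names, the choice of a t-test and a Spearman correlation, and all cutoffs are assumptions made purely for illustration.

```r
# Toy illustration of steps 2 and 3 on simulated data (base R only).
# NOT ELMER's implementation: tests, names and cutoffs are assumptions.
set.seed(1)

n_per_group <- 20
group <- rep(c("group1", "group2"), each = n_per_group)

# Simulated beta values for 3 hypothetical distal probes (rows) x 40 samples.
# probe_A is hypomethylated in group 1; the other two do not differ.
meth <- rbind(
  probe_A = c(rnorm(n_per_group, 0.3, 0.05), rnorm(n_per_group, 0.8, 0.05)),
  probe_B = rnorm(2 * n_per_group, 0.6, 0.05),
  probe_C = rnorm(2 * n_per_group, 0.5, 0.05)
)

# Simulated expression for 2 hypothetical nearby genes.
# gene_X rises as probe_A loses methylation; gene_Y is unrelated noise.
expr <- rbind(
  gene_X = 10 - 8 * meth["probe_A", ] + rnorm(2 * n_per_group, 0, 0.3),
  gene_Y = rnorm(2 * n_per_group, 5, 1)
)

# Step 2 (sketch): flag probes whose mean methylation differs between groups.
diff_meth <- t(apply(meth, 1, function(beta) {
  g1 <- beta[group == "group1"]
  g2 <- beta[group == "group2"]
  c(mean_diff = mean(g1) - mean(g2), p_value = t.test(g1, g2)$p.value)
}))
sig_probes <- rownames(diff_meth)[
  p.adjust(diff_meth[, "p_value"], method = "BH") < 0.01 &
    abs(diff_meth[, "mean_diff"]) > 0.3
]

# Step 3 (sketch): for each significant probe, test every candidate gene for
# an inverse association between methylation and expression.
pairs <- expand.grid(probe = sig_probes, gene = rownames(expr),
                     stringsAsFactors = FALSE)
pairs$rho <- mapply(function(p, g) {
  cor(meth[p, ], expr[g, ], method = "spearman")
}, pairs$probe, pairs$gene)
pairs$p_value <- mapply(function(p, g) {
  cor.test(meth[p, ], expr[g, ], method = "spearman",
           alternative = "less")$p.value
}, pairs$probe, pairs$gene)
putative_pairs <- pairs[pairs$p_value < 0.001, ]
print(putative_pairs)
```

In this simulation only probe_A passes the methylation filter, and gene_X is recovered as its putative target because its expression was constructed to fall as methylation rises.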

Package workflow

The package workflow is shown in the figure below:

ELMER workflow: ELMER takes as input a DNA methylation object, a gene expression object (each can be either a matrix or a SummarizedExperiment object), and a Genomic Ranges (GRanges) object with the distal probes to be used as a filter, which can be retrieved with the get.feature.probe function. The createMAE function creates a MultiAssayExperiment (MAE) object that keeps only samples with both DNA methylation and gene expression data. Genes are mapped to genomic positions and annotated using the ENSEMBL database, while probes are annotated from http://zwdzwd.github.io/InfiniumAnnotation. This MAE object is the input to the subsequent analysis functions. First, differentially methylated probes are identified with get.diff.meth, and their nearest genes (10 upstream and 10 downstream) are identified with GetNearGenes. For each probe, get.pair tests whether any of the nearby genes is affected by the change in DNA methylation level and outputs a list of gene-probe pairs. For the probes in those pairs, get.enriched.motif searches for enriched TF motifs. Finally, get.TFs correlates the enriched motifs with the expression level of the TFs. In the figure, green boxes represent user input data, blue boxes represent output objects, orange boxes represent auxiliary pre-computed data, and gray boxes are functions.
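
Two ideas in the workflow description lend themselves to a small illustration: keeping only samples present in both data types, and selecting 10 genes on each side of a probe. The sketch below uses base R on made-up data. It does not call or reproduce createMAE or GetNearGenes; the helper flanking_genes, all object names, and the simplification to one chromosome with strand ignored are assumptions for illustration only.

```r
# Toy sketch (base R only). All names and values are hypothetical; this does
# not call or reproduce ELMER's createMAE or GetNearGenes functions.
set.seed(42)

# 1. Keep only samples that have both methylation and expression data.
met <- matrix(runif(12), nrow = 3,
              dimnames = list(paste0("probe", 1:3), c("S1", "S2", "S3", "S4")))
expr_mat <- matrix(runif(8), nrow = 2,
                   dimnames = list(c("geneA", "geneB"),
                                   c("S2", "S3", "S4", "S5")))
shared <- intersect(colnames(met), colnames(expr_mat))
met_matched <- met[, shared, drop = FALSE]
expr_matched <- expr_mat[, shared, drop = FALSE]

# 2. Select the n closest genes on each side of a probe position.
# Simplification: a single chromosome, strand ignored, so "upstream" here
# just means a lower coordinate and "downstream" a higher one.
flanking_genes <- function(probe_pos, genes, n_each_side = 10) {
  genes <- genes[order(genes$start), ]
  upstream <- tail(genes[genes$start < probe_pos, ], n_each_side)
  downstream <- head(genes[genes$start >= probe_pos, ], n_each_side)
  rbind(upstream, downstream)
}

genes <- data.frame(symbol = paste0("gene", 1:30),
                    start = seq(1000, by = 5000, length.out = 30),
                    stringsAsFactors = FALSE)
near <- flanking_genes(probe_pos = 78500, genes = genes)

print(shared)
print(near$symbol)
```

Samples S1 and S5 are dropped because each has only one data type, and the probe at position 78500 is assigned the 10 genes starting below it and the 10 starting above it as candidate targets.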


Installing and loading ELMER

To install the development version of this package from GitHub, start R and enter:

devtools::install_github(repo = "tiagochst/ELMER.data")
devtools::install_github(repo = "tiagochst/ELMER")

To install this package from Bioconductor, start R and enter:

source("https://bioconductor.org/biocLite.R")
biocLite("ELMER")

Then, to load ELMER, enter:

library(ELMER)

Citing this work

If you use the ELMER package or its results, please cite:

  • Yao, L., Shen, H., Laird, P. W., Farnham, P. J., & Berman, B. P. “Inferring regulatory element landscapes and transcription factor networks from cancer methylomes.” Genome Biol 16 (2015): 105.
  • Yao, Lijing, Benjamin P. Berman, and Peggy J. Farnham. “Demystifying the secret mission of enhancers: linking distal regulatory elements to target genes.” Critical reviews in biochemistry and molecular biology 50.6 (2015): 550-573.
  • Chedraoui Silva, Tiago and Coetzee, Simon G. and Yao, Lijing and Hazelett, Dennis J. and Noushmehr, Houtan and Berman, Benjamin P. “Enhancer Linking by Methylation/Expression Relationships with the R package ELMER version 2” (bioRxiv 148726; doi: https://doi.org/10.1101/148726)

If you retrieve TCGA data using the getTCGA function, please cite the TCGAbiolinks package:

  • Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, Sabedot T, Malta TM, Pagnotta SM, Castiglioni I, Ceccarelli M, Bontempi G and Noushmehr H. “TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data.” Nucleic acids research (2015): gkv1507.
  • Silva, TC, A Colaprico, C Olsen, F D’Angelo, G Bontempi, M Ceccarelli, and H Noushmehr. 2016. “TCGA Workflow: Analyze Cancer Genomics and Epigenomics Data Using Bioconductor Packages [Version 2; Referees: 1 Approved, 1 Approved with Reservations].” F1000Research 5 (1542). doi:10.12688/f1000research.8923.2.

  • Grossman, Robert L., et al. “Toward a shared vision for cancer genomic data.” New England Journal of Medicine 375.12 (2016): 1109-1112.

If you use the graphical user interface, please cite the TCGAbiolinksGUI package:

  • Silva, Tiago C. and Colaprico, Antonio and Olsen, Catharina and Bontempi, Gianluca and Ceccarelli, Michele and Berman, Benjamin P. and Noushmehr, Houtan. “TCGAbiolinksGUI: A graphical user interface to analyze cancer molecular and clinical data” (bioRxiv 147496; doi: https://doi.org/10.1101/147496)

Session Info

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2               ELMER_2.2.7               
## [3] MultiAssayExperiment_1.4.4 BiocStyle_2.6.1           
## [5] dplyr_0.7.4                DT_0.2                    
## [7] ELMER.data_2.2.2          
## 
## loaded via a namespace (and not attached):
##   [1] shinydashboard_0.6.1          R.utils_2.6.0                
##   [3] RSQLite_2.0                   AnnotationDbi_1.40.0         
##   [5] htmlwidgets_0.9               grid_3.4.3                   
##   [7] BiocParallel_1.12.0           devtools_1.13.4              
##   [9] DESeq_1.30.0                  munsell_0.4.3                
##  [11] codetools_0.2-15              withr_2.1.1                  
##  [13] colorspace_1.3-2              BiocInstaller_1.28.0         
##  [15] Biobase_2.38.0                knitr_1.17                   
##  [17] rstudioapi_0.7                stats4_3.4.3                 
##  [19] labeling_0.3                  GenomeInfoDbData_1.0.0       
##  [21] mnormt_1.5-5                  hwriter_1.3.2                
##  [23] KMsurv_0.1-5                  bit64_0.9-7                  
##  [25] rprojroot_1.3-1               downloader_0.4               
##  [27] biovizBase_1.26.0             ggthemes_3.4.0               
##  [29] EDASeq_2.12.0                 R6_2.2.2                     
##  [31] doParallel_1.0.11             GenomeInfoDb_1.14.0          
##  [33] locfit_1.5-9.1                AnnotationFilter_1.2.0       
##  [35] bitops_1.0-6                  reshape_0.8.7                
##  [37] DelayedArray_0.4.1            assertthat_0.2.0             
##  [39] scales_0.5.0                  nnet_7.3-12                  
##  [41] gtable_0.2.0                  sva_3.26.0                   
##  [43] ensembldb_2.2.0               rlang_0.1.6                  
##  [45] genefilter_1.60.0             cmprsk_2.2-7                 
##  [47] GlobalOptions_0.0.12          splines_3.4.3                
##  [49] rtracklayer_1.38.2            lazyeval_0.2.1               
##  [51] acepack_1.4.1                 dichromat_2.0-0              
##  [53] selectr_0.3-1                 broom_0.4.3                  
##  [55] checkmate_1.8.5               yaml_2.1.16                  
##  [57] reshape2_1.4.3                GenomicFeatures_1.30.0       
##  [59] backports_1.1.2               httpuv_1.3.5                 
##  [61] Hmisc_4.1-0                   RMySQL_0.10.13               
##  [63] tools_3.4.3                   psych_1.7.8                  
##  [65] ggplot2_2.2.1                 RColorBrewer_1.1-2           
##  [67] BiocGenerics_0.24.0           Rcpp_0.12.14                 
##  [69] plyr_1.8.4                    base64enc_0.1-3              
##  [71] progress_1.1.2                zlibbioc_1.24.0              
##  [73] purrr_0.2.4                   RCurl_1.95-4.8               
##  [75] prettyunits_1.0.2             ggpubr_0.1.6                 
##  [77] rpart_4.1-11                  GetoptLong_0.1.6             
##  [79] S4Vectors_0.16.0              zoo_1.8-0                    
##  [81] SummarizedExperiment_1.8.1    ggrepel_0.7.0                
##  [83] cluster_2.0.6                 magrittr_1.5                 
##  [85] data.table_1.10.4-3           circlize_0.4.3               
##  [87] survminer_0.4.1               ProtGenerics_1.10.0          
##  [89] matrixStats_0.52.2            aroma.light_3.8.0            
##  [91] hms_0.4.0                     mime_0.5                     
##  [93] evaluate_0.10.1               xtable_1.8-2                 
##  [95] XML_3.98-1.9                  IRanges_2.12.0               
##  [97] gridExtra_2.3                 shape_1.4.3                  
##  [99] testthat_2.0.0                compiler_3.4.3               
## [101] biomaRt_2.34.1                tibble_1.4.1                 
## [103] R.oo_1.21.0                   htmltools_0.3.6              
## [105] mgcv_1.8-22                   Formula_1.2-2                
## [107] tidyr_0.7.2                   geneplotter_1.56.0           
## [109] DBI_0.7                       matlab_1.0.2                 
## [111] ComplexHeatmap_1.17.1         ShortRead_1.36.0             
## [113] Matrix_1.2-12                 readr_1.1.1                  
## [115] R.methodsS3_1.7.1             parallel_3.4.3               
## [117] Gviz_1.22.2                   bindr_0.1                    
## [119] GenomicRanges_1.30.1          pkgconfig_2.0.1              
## [121] km.ci_0.5-2                   GenomicAlignments_1.14.1     
## [123] foreign_0.8-69                plotly_4.7.1                 
## [125] xml2_1.1.1                    roxygen2_6.0.1               
## [127] foreach_1.4.4                 annotate_1.56.1              
## [129] XVector_0.18.0                rvest_0.3.2                  
## [131] stringr_1.2.0                 VariantAnnotation_1.24.4     
## [133] digest_0.6.13                 ConsensusClusterPlus_1.42.0  
## [135] Biostrings_2.46.0             rmarkdown_1.8                
## [137] TCGAbiolinks_2.6.7            survMisc_0.5.4               
## [139] htmlTable_1.11.0              edgeR_3.20.3                 
## [141] curl_3.1                      shiny_1.0.5                  
## [143] Rsamtools_1.30.0              commonmark_1.4               
## [145] rjson_0.2.15                  nlme_3.1-131                 
## [147] jsonlite_1.5                  viridisLite_0.2.0            
## [149] limma_3.34.5                  BSgenome_1.46.0              
## [151] pillar_1.0.1                  lattice_0.20-35              
## [153] httr_1.3.1                    survival_2.41-3              
## [155] interactiveDisplayBase_1.16.0 glue_1.2.0                   
## [157] iterators_1.0.9               bit_1.1-12                   
## [159] stringi_1.1.6                 blob_1.1.0                   
## [161] AnnotationHub_2.10.1          latticeExtra_0.6-28          
## [163] memoise_1.1.0
