GEOexplorer

Guy Hunt

August 27, 2021

Introduction

GEOexplorer is a Shiny app for performing exploratory data analysis and differential gene expression analysis on microarray gene expression GEO series datasets held in the GEO database. The outputs are interactive and non-interactive visualisations that let users explore the results. The development of GEOexplorer was made possible by the excellent code provided by GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/).

Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("GEOexplorer")

Alternatively, GEOexplorer can be installed from GitHub:

if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")

devtools::install_github("guypwhunt/GEOexplorer")

Getting Started using the GEOexplorer Shiny App

Getting started with the GEOexplorer Shiny App takes two steps.

Step 1: Load the package

library(GEOexplorer)
#> Loading required package: shiny
#> Loading required package: limma
#> Loading required package: Biobase
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following object is masked from 'package:limma':
#> 
#>     plotMA
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: plotly
#> Loading required package: ggplot2
#> 
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following object is masked from 'package:graphics':
#> 
#>     layout
#> Loading required package: shinyBS
#> Setting options('download.file.method.GEOquery'='auto')
#> Setting options('GEOquery.inmemory.gpl'=FALSE)

Step 2: Launch the Shiny App in a browser.

loadApp()

Performing Exploratory Data Analysis

Step 1: After loading the Shiny app, enter a microarray GEO series accession code (with the format GSExxxxx) into the “GEO accession code” field, as shown in Image 1.

Step 2: Select the platform you wish to analyse from the drop-down list, as shown in Image 1.

Step 3: Select whether log transformation should be applied to the expression data, not applied, or whether GEOexplorer should determine automatically if it is needed, as shown in Image 1.

Image 1:
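As a rough illustration of the automatic option in Step 3, the sketch below applies the quantile heuristic found in GEO2R-generated R scripts to decide whether a matrix still looks like raw intensities. Whether GEOexplorer uses exactly this rule is an assumption; the helper name `needs_log2` and the simulated matrix are hypothetical.

```r
# Hypothetical helper: TRUE when the values look un-logged
# (quantile heuristic as used in GEO2R-generated scripts).
needs_log2 <- function(ex) {
  qx <- as.numeric(quantile(ex, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

set.seed(1)
# Simulated raw intensities: 100 genes x 6 samples.
raw <- matrix(2^rnorm(600, mean = 8, sd = 2), nrow = 100)

ex <- raw
if (needs_log2(ex)) {
  ex[ex <= 0] <- NaN   # non-positive values cannot be log transformed
  ex <- log2(ex)
}
```

Calling `needs_log2()` again on the transformed matrix is a quick way to confirm that the data are now on a log scale.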

Step 4: Select whether missing data should be estimated using KNN imputation, as shown in Image 2.

Step 5: Click analyse to perform exploratory data analysis, as shown in Image 2.

Image 2:
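To show what the KNN imputation option in Step 4 does in principle, here is a minimal sketch using `impute.knn()` from the Bioconductor impute package on simulated data. The matrix, the number of missing values and the choice of `k = 10` are illustrative assumptions, not GEOexplorer's settings.

```r
library(impute)  # Bioconductor package providing impute.knn()

set.seed(1)
# Simulated log2 expression: 200 genes x 8 samples.
ex <- matrix(rnorm(200 * 8, mean = 8), nrow = 200,
             dimnames = list(paste0("gene", 1:200), paste0("sample", 1:8)))
ex[sample(length(ex), 20)] <- NA   # knock out 20 values at random

# Each missing value is estimated from the k genes (rows) with the most
# similar expression profiles.
imputed <- impute.knn(ex, k = 10)$data

sum(is.na(ex))       # number of missing values before imputation
sum(is.na(imputed))  # number of missing values after imputation
```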

Step 6: Explore the experiment and expression data in the Experiment Information, Column Details and Dataset tabs, as shown in Image 3.

Image 3:

Step 7: View the exploratory data analysis in the Exploratory Data Analysis tab, as shown in Image 4.

Step 8: Click on the tabs to view the interactive exploratory data analysis visualisations, as shown in Image 4. These visualisations reveal trends within the expression data, such as which experimental conditions have similar gene expression profiles. This information is useful when choosing groups for differential gene expression analysis.

Image 4:
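The idea of spotting experimental conditions with similar expression profiles can be sketched outside the app with base R. The example below uses simulated data, a sample correlation matrix and a principal component analysis; it does not reproduce GEOexplorer's own plots, and all object names are hypothetical.

```r
set.seed(1)
# Simulated log2 expression: 500 genes x 6 samples, two conditions.
baseline <- rnorm(500, mean = 8, sd = 2)              # gene-specific level
ex <- baseline + matrix(rnorm(500 * 6, sd = 0.5), nrow = 500)
colnames(ex) <- paste0(rep(c("ctrl", "treated"), each = 3), 1:3)
ex[1:100, 4:6] <- ex[1:100, 4:6] + 3   # shift 100 genes in treated samples

# Sample-to-sample correlation: samples from the same condition
# correlate more strongly than samples from different conditions.
round(cor(ex), 2)

# PCA with samples as rows; the first component separates the conditions.
pca <- prcomp(t(ex))
pca$x[, 1:2]
```

Samples that cluster together here are natural candidates for the same group when setting up a comparison.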

Performing Differential Gene Expression Analysis

Step 1: After performing exploratory data analysis, click on the Differential Gene Expression Analysis tab, as shown in Image 1.

Image 1:

Step 2: Click on the Set Parameters tab, as shown in Image 2.

Step 3: Assign each experimental condition to group 1, group 2 or N/A, as shown in Image 2. Experimental conditions assigned to N/A are excluded from differential gene expression analysis, whilst those assigned to group 2 are compared with those assigned to group 1.

Image 2:
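In limma terms, the group assignment in Step 3 corresponds to a design matrix and a contrast. The following is a minimal sketch under assumptions: the sample assignment is made up, `NA` stands in for N/A, and the sign convention (group 2 minus group 1) is an assumption rather than something confirmed for GEOexplorer.

```r
library(limma)

# Hypothetical assignment of six samples; NA stands for "N/A" (excluded).
assignment <- c("group1", "group1", "group2", "group2", NA, "group2")
keep <- !is.na(assignment)

groups <- factor(assignment[keep], levels = c("group1", "group2"))
design <- model.matrix(~ 0 + groups)
colnames(design) <- levels(groups)

# Group 2 is compared with group 1, so under this convention a positive
# log fold change means higher expression in group 2.
contrast <- makeContrasts(group2 - group1, levels = design)
contrast
```

The same `keep` vector would be used to drop the excluded columns from the expression matrix before fitting a model.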

Step 4: Select the P value adjustment method from the drop-down list, as shown in Image 3.

Step 5: Select whether to apply limma precision weights, as shown in Image 3. Precision weights improve the accuracy of differential gene expression analysis when a strong mean-variance trend is present, which can be checked in the Mean-Variance Plot tab.

Step 6: Select whether to force normalisation, as shown in Image 3. Forcing normalisation is advisable if the experimental conditions are not median centred, which can be checked in the Box-and-Whisker Plot, Expression Density Plot and 3D Expression Density Plot tabs.

Step 7: Select the significance cut-off, as shown in Image 3. The cut-off is used to identify the genes that are under-expressed and the genes that are over-expressed between the two groups.

Step 8: Click analyse to perform differential gene expression analysis, as shown in Image 3.

Image 3:
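The parameters in Steps 4 to 7 map onto a limma workflow similar to the scripts GEO2R generates. The sketch below runs on simulated data; the variable names are hypothetical, and the mapping of each option to a specific limma call is an assumption about how such a pipeline is typically written, not a description of GEOexplorer's internals.

```r
library(limma)

set.seed(1)
# Simulated log2 expression: 1000 genes x 6 samples (3 per group).
ex <- matrix(rnorm(1000 * 6, mean = 8), nrow = 1000,
             dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
ex[1:50, 4:6] <- ex[1:50, 4:6] + 4   # 50 genes over-expressed in group 2

groups <- factor(rep(c("group1", "group2"), each = 3))
design <- model.matrix(~ 0 + groups)
colnames(design) <- levels(groups)

adjustment          <- "BH"    # Step 4: any method accepted by p.adjust()
precision_weights   <- FALSE   # Step 5
force_normalisation <- TRUE    # Step 6
cut_off             <- 0.05    # Step 7

# Quantile normalisation makes the sample distributions comparable.
if (force_normalisation) ex <- normalizeBetweenArrays(ex)
# vooma() estimates the mean-variance trend and returns precision weights.
y <- if (precision_weights) vooma(ex, design, plot = FALSE) else ex

fit <- lmFit(y, design)
fit <- contrasts.fit(fit, makeContrasts(group2 - group1, levels = design))
fit <- eBayes(fit)

# -1 = under-expressed, 0 = not significant, 1 = over-expressed in group 2.
calls <- decideTests(fit, adjust.method = adjustment, p.value = cut_off)
summary(calls)
topTable(fit, adjust.method = adjustment, number = 5)
```

Setting `precision_weights <- TRUE` switches the fit to use the `vooma()` weights, which matters mainly when the variance changes markedly with mean expression.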

Step 9: Explore the results of differential gene expression analysis in the subsequent tabs, as shown in Image 4.

Image 4:

Video Demonstration of GEOexplorer

A video demonstrating how to use GEOexplorer’s user interface is available at https://youtu.be/8R8yqMlPCVM.

Conclusion

The GEOexplorer package provides an easy way to perform exploratory data analysis and differential gene expression analysis on microarray gene expression GEO series datasets, and presents the outputs as interactive and non-interactive visualisations.

Reporting problems or bugs

If you run into problems using GEOexplorer, the Bioconductor Support site is a good first place to ask for help. If you are convinced that there is a bug in GEOexplorer, feel free to submit an issue on the GEOexplorer GitHub site. Please include the GEO accession code that triggers the error and your operating system.
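One way to collect the operating system details for a report is sketched below using base R only; the output format is just a suggestion.

```r
# Operating system name and release, plus the R version, for a bug report.
os <- Sys.info()[c("sysname", "release")]
cat("OS:", paste(os, collapse = " "), "\n")
cat("R: ", R.version.string, "\n")
# With GEOexplorer installed, packageVersion("GEOexplorer") reports its version.
```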

Session info

The following packages and versions were used in the production of this vignette.

#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] GEOexplorer_1.0.0   shinyBS_0.61        plotly_4.10.0      
#> [4] ggplot2_3.3.5       Biobase_2.54.0      BiocGenerics_0.40.0
#> [7] limma_3.50.0        shiny_1.7.1        
#> 
#> loaded via a namespace (and not attached):
#>  [1] shinyHeatmaply_0.2.0 fontawesome_0.2.2    shinybusy_0.2.2     
#>  [4] webshot_0.5.2        RColorBrewer_1.1-2   httr_1.4.2          
#>  [7] tools_4.1.1          bslib_0.3.1          utf8_1.2.2          
#> [10] R6_2.5.1             DT_0.19              DBI_1.1.1           
#> [13] lazyeval_0.2.2       colorspace_2.0-2     withr_2.4.2         
#> [16] sp_1.4-5             tidyselect_1.1.1     gridExtra_2.3       
#> [19] compiler_4.1.1       factoextra_1.0.7     TSP_1.1-11          
#> [22] xml2_1.3.2           sass_0.4.0           scales_1.1.1        
#> [25] readr_2.0.2          askpass_1.1          stringr_1.4.0       
#> [28] digest_0.6.28        foreign_0.8-81       rmarkdown_2.11      
#> [31] GEOquery_2.62.0      pkgconfig_2.0.3      htmltools_0.5.2     
#> [34] umap_0.2.7.0         fastmap_1.1.0        htmlwidgets_1.5.4   
#> [37] rlang_0.4.12         readxl_1.3.1         impute_1.68.0       
#> [40] jquerylib_0.1.4      generics_0.1.1       jsonlite_1.7.2      
#> [43] crosstalk_1.1.1      dendextend_1.15.1    dplyr_1.0.7         
#> [46] magrittr_2.0.1       Matrix_1.3-4         Rcpp_1.0.7          
#> [49] munsell_0.5.0        fansi_0.5.0          reticulate_1.22     
#> [52] viridis_0.6.2        lifecycle_1.0.1      stringi_1.7.5       
#> [55] yaml_2.2.1           grid_4.1.1           maptools_1.1-2      
#> [58] promises_1.2.0.1     ggrepel_0.9.1        crayon_1.4.1        
#> [61] lattice_0.20-45      hms_1.1.1            knitr_1.36          
#> [64] pillar_1.6.4         codetools_0.2-18     glue_1.4.2          
#> [67] evaluate_0.14        data.table_1.14.2    vctrs_0.3.8         
#> [70] png_0.1-7            tzdb_0.1.2           httpuv_1.6.3        
#> [73] foreach_1.5.1        cellranger_1.1.0     gtable_0.3.0        
#> [76] openssl_1.4.5        purrr_0.3.4          tidyr_1.1.4         
#> [79] heatmaply_1.3.0      assertthat_0.2.1     xfun_0.27           
#> [82] mime_0.12            xtable_1.8-4         RSpectra_0.16-0     
#> [85] later_1.3.0          viridisLite_0.4.0    seriation_1.3.1     
#> [88] tibble_3.1.5         pheatmap_1.0.12      iterators_1.0.13    
#> [91] registry_0.5-1       ellipsis_0.3.2