library(healthyControlsPresenceChecker)
#> Setting options('download.file.method.GEOquery'='auto')
#> Setting options('GEOquery.inmemory.gpl'=FALSE)

0.1 Introduction

Bioinformatics projects that analyze data from patients with cancer or other diseases often need to compare the results obtained on patients’ data with the results obtained on healthy controls’ data. This step, although crucial, cannot be performed if the dataset contains no healthy control data. Looking for datasets that contain both kinds of data can be tedious, and checking a specific dataset can be time-consuming, too. Here we propose a software package that can immediately inform the user whether data of healthy controls are present in a specific dataset.

0.2 Description

healthyControlsPresenceChecker allows users to verify whether a specific GEO dataset contains data of healthy controls alongside data of patients.

0.3 Installation via Bioconductor

Once this package is available on Bioconductor, it will be possible to install it with the following commands.

Start R (version “4.1”) and enter:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("healthyControlsPresenceChecker")

It will be possible to load the package with the following command:

library("healthyControlsPresenceChecker")

0.4 Usage

Using healthyControlsPresenceChecker is straightforward. The main function, healthyControlsCheck(), takes two input arguments: the GEO accession code of the dataset in which the user wants to verify the presence of healthy controls, and a verbose flag. For example, to find out whether the GSE47407 dataset contains data of healthy controls, the user can type the following in an R session:

outcomeGSE47407 <- healthyControlsCheck("GSE47407", TRUE)
#> Processed URL: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE47nnn/GSE47407
#> Found 1 file(s)
#> GSE47407_series_matrix.txt.gz
#> === === === === === GSE47407 === === === === ===
#> :: The keyword "healthy" was NOT found among the annotations of this dataset (GSE47407)
#> :: The keyword "control" was NOT found among the annotations of this dataset (GSE47407)
#> === === === === === === === === === === === ===
#> 
#> healthyControlsCheck() call output: were healthy controls found in the GSE47407 dataset? FALSE

The function prints all the intermediate messages, and at the end the outcomeGSE47407 variable is TRUE if healthy controls were found, or FALSE otherwise.
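Because the outcome is a single logical value, it can be collected over several datasets and used to filter them. The following is only a minimal sketch: checkDatasets() and datasetsWithControls() are hypothetical helpers that are not part of the package, and the sketch assumes that healthyControlsCheck() accepts the accession code and the verbose flag as positional arguments and returns one logical value, as in the call shown above.

```r
# Hypothetical helper (not part of the package): run a checker function over
# several GEO accession codes and collect the outcomes in a named logical vector.
# The default checker is assumed to behave like the call shown in this vignette.
checkDatasets <- function(accessions,
                          checker = healthyControlsPresenceChecker::healthyControlsCheck,
                          verbose = FALSE) {
  outcomes <- vapply(
    accessions,
    # isTRUE() maps anything other than a single TRUE (e.g. NA or NULL) to FALSE
    function(accession) isTRUE(checker(accession, verbose)),
    logical(1)
  )
  names(outcomes) <- accessions
  outcomes
}

# Hypothetical helper: keep only the accession codes whose outcome is TRUE.
datasetsWithControls <- function(outcomes) {
  names(outcomes)[outcomes]
}
```

With the package installed and a network connection available, checkDatasets(c("GSE47407")) would be expected to return a vector whose single element, named GSE47407, is FALSE, in line with the output shown above. Passing the checker as an argument is a design choice that makes it possible to try the helpers with a stub function, without downloading anything from GEO.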

0.5 Contacts

This software was developed by Davide Chicco, who can be contacted via email at davidechicco(AT)davidechicco.it

Session Info

sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] healthyControlsPresenceChecker_1.6.0 BiocStyle_2.30.0                    
#> 
#> loaded via a namespace (and not attached):
#>  [1] limma_3.58.0              jsonlite_1.8.7           
#>  [3] dplyr_1.1.3               compiler_4.3.1           
#>  [5] BiocManager_1.30.22       tidyselect_1.2.0         
#>  [7] Biobase_2.62.0            xml2_1.3.5               
#>  [9] tidyr_1.3.0               jquerylib_0.1.4          
#> [11] geneExpressionFromGEO_0.9 statmod_1.5.0            
#> [13] yaml_2.3.7                fastmap_1.1.1            
#> [15] readr_2.1.4               R6_2.5.1                 
#> [17] generics_0.1.3            curl_5.1.0               
#> [19] GEOquery_2.70.0           knitr_1.44               
#> [21] BiocGenerics_0.48.0       tibble_3.2.1             
#> [23] bookdown_0.36             bslib_0.5.1              
#> [25] pillar_1.9.0              tzdb_0.4.0               
#> [27] R.utils_2.12.2            rlang_1.1.1              
#> [29] utf8_1.2.4                cachem_1.0.8             
#> [31] xfun_0.40                 sass_0.4.7               
#> [33] cli_3.6.1                 formatR_1.14             
#> [35] withr_2.5.1               magrittr_2.0.3           
#> [37] digest_0.6.33             hms_1.1.3                
#> [39] lifecycle_1.0.3           R.oo_1.25.0              
#> [41] R.methodsS3_1.8.2         vctrs_0.6.4              
#> [43] evaluate_0.22             glue_1.6.2               
#> [45] data.table_1.14.8         fansi_1.0.5              
#> [47] rmarkdown_2.25            purrr_1.0.2              
#> [49] tools_4.3.1               pkgconfig_2.0.3          
#> [51] htmltools_0.5.6.1