Supporting data for geneplast evolutionary analyses

Leonardo RS Campos, Danilo O Imparato, Mauro AA Castro, Rodrigo JS Dalmolin

2020-10-27

Overview

Geneplast is designed for evolutionary and plasticity analyses based on the distribution of orthologous groups (OGs) across a given species tree. This supporting package provides datasets obtained and processed from different orthology databases for use in geneplast evolutionary analyses.

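Currently, data from the following sources are available: STRING database v11.0, OMA Browser All.Jan2020, and OrthoDB v10.1.

Each dataset consists of four objects: cogids, sspids, cogdata, and phyloTree.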

Objects creation

The general procedure for creating the objects described above starts by selecting only the eukaryotic species from the orthology database, using the NCBI taxonomy classification.

We build a graph from the taxonomy nodes and locate the root of the eukaryotes. We then traverse this subgraph from the root to the leaves that correspond to the taxonomy identifiers of the species in the database. Selecting the leaves of the resulting subgraph yields the sspids object.
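
The traversal can be illustrated with a minimal base R sketch. It is not the code used to build the package data: the `nodes` table is a hand-made, simplified stand-in for the NCBI taxonomy nodes, and `descendants()`, `db_species`, and `sspids_toy` are hypothetical names introduced only for this example.

```r
# Toy child -> parent table; a simplified, hand-made stand-in for NCBI nodes.
# 2759 = Eukaryota, 2 = Bacteria; intermediate ranks are omitted.
nodes <- data.frame(
  id     = c("2759", "33208", "9606", "10090", "4751", "4932", "2", "562"),
  parent = c("131567", "2759", "33208", "33208", "2759", "4751", "131567", "2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: breadth-first traversal collecting a root and all
# nodes below it.
descendants <- function(nodes, root) {
  children <- split(nodes$id, nodes$parent)
  visited <- character(0)
  queue <- root
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    visited <- c(visited, current)
    # children[[current]] is NULL for leaves, so the queue is left unchanged
    queue <- c(queue, children[[current]])
  }
  visited
}

# Taxonomy identifiers of the species present in a (made-up) orthology database
db_species <- c("9606", "10090", "4932", "562")

# Keep only the database species that sit below the eukaryote root
eukaryote_nodes <- descendants(nodes, root = "2759")
sspids_toy <- intersect(db_species, eukaryote_nodes)
sspids_toy
```

In this toy example the bacterial identifier 562 is dropped because it is not reachable from the eukaryote root.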

Once the species of interest are selected, the orthology information for the corresponding proteins is filtered to obtain the cogdata object. The cogids object consists of the unique orthologous group identifiers found in cogdata.
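
As a rough illustration of this filtering step, the sketch below uses a made-up orthology table. The column names `protein_id`, `ssp_id`, and `cog_id` are an assumption about the layout of cogdata, and `orthology`, `selected_species`, `cogdata_toy`, and `cogids_toy` are hypothetical names used only here.

```r
# Made-up orthology table: one row per protein, with its species and
# orthologous group (column names are assumed, not taken from the package).
orthology <- data.frame(
  protein_id = c("P1", "P2", "P3", "P4", "P5"),
  ssp_id     = c("9606", "10090", "562", "4932", "9606"),
  cog_id     = c("OG1", "OG1", "OG1", "OG2", "OG2"),
  stringsAsFactors = FALSE
)

# Species of interest, e.g. the eukaryotes selected in the previous step
selected_species <- c("9606", "10090", "4932")

# Keep only the rows that belong to the selected species
cogdata_toy <- orthology[orthology$ssp_id %in% selected_species, ]

# Unique orthologous group identifiers remaining after the filter
cogids_toy <- unique(cogdata_toy$cog_id)
```

Here the row for species 562 is removed, and both orthologous groups remain because each still has at least one protein from a selected species.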

Finally, the phyloTree object is built from the full TimeTree phylogenetic tree of eukaryotes, which is pruned to keep only our species of interest. Species missing from the TimeTree tree are placed by matching genera and by taking the closest species inferred from the NCBI taxonomy tree built previously.
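
The genus-matching idea can be sketched in base R as follows. This is only an illustration under simplifying assumptions: tip labels are taken to have the form `Genus_species`, the species lists are invented, and `genus()`, `tree_tips`, `species`, and `placement` are hypothetical names. The closest-species strategy based on the NCBI tree is not shown.

```r
# Invented tip labels of a reference tree and species we want to place
tree_tips <- c("Homo_sapiens", "Mus_musculus",
               "Saccharomyces_cerevisiae", "Drosophila_melanogaster")
species   <- c("Homo_sapiens", "Mus_spretus",
               "Drosophila_melanogaster", "Danio_rerio")

# Hypothetical helper: the genus is everything before the first underscore
genus <- function(x) sub("_.*$", "", x)

# Species already present in the tree keep their own tip
present <- species %in% tree_tips

# For the others, look for a tip that shares the same genus
genus_hit <- tree_tips[match(genus(species), genus(tree_tips))]

# NA means no congeneric tip exists; such species would need another strategy
placement <- ifelse(present, species, genus_hit)
names(placement) <- species
placement
```

In this example `Mus_spretus` is mapped to the congeneric tip `Mus_musculus`, while `Danio_rerio` stays unplaced (`NA`) because no tip shares its genus.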

Installation

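If AnnotationHub is not yet installed on your system, install it with BiocManager::install:

# install BiocManager only if it is missing, then use it to install AnnotationHub
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("AnnotationHub")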

Getting started

To begin, let’s create a new AnnotationHub connection and use it to query AnnotationHub for all Geneplast resources.

library('AnnotationHub')
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr

# create an AnnotationHub connection
ah <- AnnotationHub()
#> snapshotDate(): 2020-10-26

# search for all Geneplast resources
meta <- query(ah, "geneplast")

length(meta)
#> [1] 3
head(meta)
#> AnnotationHub with 3 records
#> # snapshotDate(): 2020-10-26
#> # $dataprovider: STRING, OrthoDB, OMA
#> # $species: NA
#> # $rdataclass: Rda
#> # additional mcols(): taxonomyid, genome, description,
#> #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> #   rdatapath, sourceurl, sourcetype 
#> # retrieve records with, e.g., 'object[["AH83116"]]' 
#> 
#>             title                  
#>   AH83116 | STRING database v11.0  
#>   AH83117 | OMA Browser All.Jan2020
#>   AH83118 | OrthoDB v10.1

# types of Geneplast data available
table(meta$rdataclass)
#> 
#> Rda 
#>   3

# distribution of resources by specific databases
table(meta$dataprovider)
#> 
#>     OMA OrthoDB  STRING 
#>       1       1       1
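
Once a record has been identified, its objects can be brought into the session. The sketch below assumes that, for these Rda resources, `[[` returns the path of the downloaded file so that it can be passed to `load()`, and that loading creates the four objects described in the overview. It requires network access on first use, and the record identifier should be taken from the query output of your own snapshot.

```r
library(AnnotationHub)

# Connect and query again so that this block is self-contained
ah <- AnnotationHub()
meta <- query(ah, "geneplast")

# Assumption: the STRING-derived dataset is record AH83116, as listed above,
# and `[[` yields a file path that load() can read
load(meta[["AH83116"]])

# Check that the four expected objects are now available
sapply(c("cogids", "sspids", "cogdata", "phyloTree"), exists)
```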

Please refer to the geneplast vignette for more details.

Session Information

sessionInfo()
#> R version 4.0.2 Patched (2020-09-15 r79213)
#> Platform: x86_64-apple-darwin17.7.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#> 
#> Matrix products: default
#> BLAS:   /Users/ka36530_ca/R-stuff/bin/R-4-0/lib/libRblas.dylib
#> LAPACK: /Users/ka36530_ca/R-stuff/bin/R-4-0/lib/libRlapack.dylib
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] parallel  stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] AnnotationHub_2.21.5 BiocFileCache_1.13.1 dbplyr_1.4.4        
#> [4] BiocGenerics_0.35.4 
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.5                    later_1.1.0.1                
#>  [3] BiocManager_1.30.10           pillar_1.4.6                 
#>  [5] compiler_4.0.2                tools_4.0.2                  
#>  [7] digest_0.6.26                 bit_4.0.4                    
#>  [9] evaluate_0.14                 RSQLite_2.2.1                
#> [11] memoise_1.1.0                 lifecycle_0.2.0              
#> [13] tibble_3.0.4                  pkgconfig_2.0.3              
#> [15] rlang_0.4.8                   shiny_1.5.0                  
#> [17] DBI_1.1.0                     curl_4.3                     
#> [19] yaml_2.2.1                    xfun_0.18                    
#> [21] fastmap_1.0.1                 dplyr_1.0.2                  
#> [23] stringr_1.4.0                 httr_1.4.2                   
#> [25] knitr_1.30                    IRanges_2.23.10              
#> [27] generics_0.0.2                vctrs_0.3.4                  
#> [29] S4Vectors_0.27.14             rappdirs_0.3.1               
#> [31] stats4_4.0.2                  bit64_4.0.5                  
#> [33] tidyselect_1.1.0              Biobase_2.49.1               
#> [35] glue_1.4.2                    R6_2.4.1                     
#> [37] AnnotationDbi_1.51.3          rmarkdown_2.4                
#> [39] purrr_0.3.4                   blob_1.2.1                   
#> [41] magrittr_1.5                  promises_1.1.1               
#> [43] ellipsis_0.3.1                htmltools_0.5.0              
#> [45] assertthat_0.2.1              xtable_1.8-4                 
#> [47] mime_0.9                      interactiveDisplayBase_1.27.5
#> [49] httpuv_1.5.4                  stringi_1.5.3                
#> [51] BiocVersion_3.12.0            crayon_1.3.4