1 Overview

Geneplast is designed for large-scale evolutionary plasticity and rooting analysis based on the distribution of orthologous groups (OGs) in a given species tree. This supporting package provides datasets obtained and processed from different orthology databases for use in geneplast evolutionary analyses.

Currently, data from the following sources are available: STRING, OMA, and OrthoDB.

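Each dataset consists of four objects: sspids (the selected eukaryotic species), cogdata (the orthology information for the proteins of those species), cogids (the unique ortholog identifiers found in cogdata), and phyloTree (a phylogenetic tree of the selected species).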

1.1 Objects creation

The general procedure for creating the objects described above starts by selecting only eukaryotic species from the orthology database with the aid of the NCBI taxonomy classification.

We build a graph from taxonomy nodes and locate the root of eukaryotes. Then, we traverse this sub-graph from root to leaves corresponding to the taxonomy identifiers of the species in the database. By selecting the leaves of the resulting sub-graph, we obtain the sspids object.
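The traversal described above can be sketched in base R on a toy taxonomy table. This is only an illustration under stated assumptions: the parent/child table is heavily simplified (it is not the real NCBI lineage), and `get_descendants`, `db_species`, and `sspids_toy` are hypothetical names that do not come from the package.

```r
# Toy taxonomy table (child -> parent). The identifiers are real NCBI taxids,
# but the hierarchy is simplified for illustration only.
nodes <- data.frame(
  id     = c("131567", "2759", "2", "33208", "4751", "9606", "10090", "4932", "562"),
  parent = c("1", "131567", "131567", "2759", "2759", "33208", "33208", "4751", "2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect every descendant of `root` by
# breadth-first traversal from the root towards the leaves.
get_descendants <- function(nodes, root) {
  found <- character(0)
  frontier <- root
  while (length(frontier) > 0) {
    children <- nodes$id[nodes$parent %in% frontier]
    found <- c(found, children)
    frontier <- children
  }
  found
}

# Species present in a hypothetical orthology database
db_species <- c("9606", "10090", "4932", "562")

# Keep only the database species that sit below the eukaryote root (taxid 2759)
eukaryote_nodes <- get_descendants(nodes, "2759")
sspids_toy <- intersect(db_species, eukaryote_nodes)
print(sspids_toy)
```

In this toy table the bacterial entry (taxid 562) is dropped because it is not reachable from the eukaryote root.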

Once the species of interest are selected, the orthology information of the corresponding proteins is filtered to obtain the cogdata object. The cogids object consists of the unique ortholog identifiers found in cogdata.
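A minimal sketch of this filtering step on toy data follows. The table contents and the `_toy` object names are made up, and the column layout (`protein_id`, `ssp_id`, `cog_id`) is an assumption chosen for illustration.

```r
# Toy orthology table: one row per protein (hypothetical values)
orthology <- data.frame(
  protein_id = c("P1", "P2", "P3", "P4", "P5"),
  ssp_id     = c("9606", "10090", "562", "4932", "9606"),
  cog_id     = c("OG1", "OG1", "OG1", "OG2", "OG2"),
  stringsAsFactors = FALSE
)

# Species of interest (for example, the leaves selected from the taxonomy)
selected_species <- c("9606", "10090", "4932")

# Keep only the rows belonging to the selected species
cogdata_toy <- orthology[orthology$ssp_id %in% selected_species, ]

# Unique ortholog identifiers that remain after filtering
cogids_toy <- data.frame(cog_id = unique(cogdata_toy$cog_id),
                         stringsAsFactors = FALSE)

print(cogdata_toy)
print(cogids_toy)
```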

Finally, the phyloTree object is built from the full TimeTree phylogenetic tree of eukaryotes, which is pruned to keep only our species of interest. Species missing from TimeTree are filled in by matching genera and by taking the closest species inferred from the previously built NCBI taxonomy tree.
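The genus-matching idea can be illustrated with a small base R sketch. It is not the procedure used to build the datasets, whose details are not given here. It only shows how a missing species could borrow the position of a congeneric tip. The tip labels and the names `genus_of` and `proxy` are hypothetical.

```r
# Tips available in a hypothetical reference tree
tree_tips <- c("Homo_sapiens", "Mus_musculus", "Saccharomyces_cerevisiae")

# Species we would like to place in the tree
wanted <- c("Homo_sapiens", "Mus_spretus", "Saccharomyces_paradoxus", "Danio_rerio")

# Hypothetical helper: the genus is the part of the label before the first underscore
genus_of <- function(x) sub("_.*$", "", x)

# Species that are absent from the reference tree
missing_species <- setdiff(wanted, tree_tips)

# For each missing species, look for a tip of the same genus
proxy <- tree_tips[match(genus_of(missing_species), genus_of(tree_tips))]
names(proxy) <- missing_species
print(proxy)
```

Species left without a proxy (`NA`) have no congeneric tip and would need the second strategy mentioned above, the closest species according to the taxonomy tree.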

1.2 Loading a dataset

1 - Create a new AnnotationHub connection and query for all geneplast resources.

library('AnnotationHub')
# create an AnnotationHub connection
ah <- AnnotationHub()

# search for all geneplast resources
meta <- query(ah, "geneplast")

head(meta)

2 - Load the objects into the session using the ID of the chosen dataset.

# load the objects from STRING database v11.0
load(meta[["AH83116"]])

2 Case study: Transfer rooting information to a PPI network

This section reproduces a case study using annotated datasets from STRING, OMA, and OrthoDB.

The following steps show how to run geneplast rooting analysis and transfer its results to a graph model. For detailed step-by-step instructions, please check the geneplast vignette.

2.1 STRING

2.1.1 Rooting inference

1 - Create an object of class ‘OGR’ for a reference ‘spid’.

library(geneplast)
ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606")

2 - Run the groot function to infer the evolutionary roots. Note: this step may take a long time because of the large number of OGs in the input data (the nPermutations argument is set to 100 for demonstration purposes only).

ogr <- groot(ogr, nPermutations=100, verbose=TRUE)

2.1.2 Graph model for a PPI network

1 - Load a PPI network and required packages. The igraph object called ‘ppi.gs’ provides PPI information for apoptosis and genome-stability genes [@Castro2008].

library(RedeR)
library(igraph)
library(RColorBrewer)
data(ppi.gs)

2 - Map rooting information on the igraph object.

g <- ogr2igraph(ogr, cogdata, ppi.gs, idkey = "ENTREZ")

3 - Adjust colors for rooting information.

pal <- brewer.pal(9, "RdYlBu")
color_col <- colorRampPalette(pal)(37) #set a color for each root!
g <- att.setv(g=g, from="Root", to="nodeColor", cols=color_col, na.col = "grey80", breaks = seq(1,37))

4 - Adjust the aesthetics of some graph attributes.

g <- att.setv(g = g, from = "SYMBOL", to = "nodeAlias")
E(g)$edgeColor <- "grey80"
V(g)$nodeLineColor <- "grey80"

5 - Send the igraph object to RedeR interface.

rdp <- RedPort()
calld(rdp)
resetd(rdp)
addGraph(rdp, g)
addLegend.color(rdp, colvec=g$legNodeColor$scale, size=15, labvec=g$legNodeColor$legend, title="Roots represented in Fig1")

6 - Get apoptosis and genome-stability sub-networks.

g1  <- induced_subgraph(g=g, V(g)$name[V(g)$Apoptosis==1])
g2  <- induced_subgraph(g=g, V(g)$name[V(g)$GenomeStability==1])

7 - Group apoptosis and genome-stability genes into containers.

myTheme <- list(nestFontSize=25, zoom=80, isNest=TRUE, gscale=65, theme=2)
addGraph(rdp, g1, gcoord=c(25, 50), theme = c(myTheme, nestAlias="Apoptosis"))
addGraph(rdp, g2, gcoord=c(75, 50), theme = c(myTheme, nestAlias="Genome Stability"))
relax(rdp, p1=50, p2=50, p3=50, p4=50, p5= 50, ps = TRUE)
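The color assignment in step 3 relies on interpolating a short palette into one color per root. The sketch below shows that idea with base R only. The three-color ramp stands in for the RdYlBu palette, the direct indexing is an illustration rather than a description of how att.setv works internally, and `roots` holds made-up values.

```r
# Interpolate a short palette into 37 colors, one per possible root
n_roots <- 37
ramp <- colorRampPalette(c("red", "yellow", "blue"))
color_col <- ramp(n_roots)

# Hypothetical root values for four nodes; NA marks a node without a root
roots <- c(1, 12, 37, NA)

# Pick the color at the position given by each root; unrooted nodes become grey
node_color <- color_col[roots]
node_color[is.na(node_color)] <- "grey80"
print(node_color)
```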

2.2 OMA

library(readr)
library(dplyr)

# load the objects from the OMA dataset and prefix the OG identifiers
load(meta[["AH83117"]])
cogdata$cog_id <- paste0("OMA", cogdata$cog_id)
cogids$cog_id <- paste0("OMA", cogids$cog_id)

# read the human Entrez/OMA mapping table
human_entrez_2_oma_Aug2020 <- read_delim("processed_human.entrez_2_OMA.Aug2020.tsv", 
    delim = "\t", escape_double = FALSE, 
    col_names = FALSE, trim_ws = TRUE)
names(human_entrez_2_oma_Aug2020) <- c("protein_id", "gene_id")
# attach the mapped gene IDs to cogdata (joins on the shared column names)
cogdata <- cogdata %>% left_join(human_entrez_2_oma_Aug2020)
# run the rooting analysis
ogr <- groot.preprocess(cogdata=cogdata, phyloTree=phyloTree, spid="9606")
ogr <- groot(ogr, nPermutations=100, verbose=TRUE)
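The identifier prefixing and the mapping join used above can be tried on toy data first. This sketch swaps in base R `merge` with `all.x = TRUE` for the dplyr `left_join` call, and all table contents and `_toy` names are hypothetical.

```r
# Toy cogdata with numeric-looking group identifiers (made-up values)
cogdata_toy <- data.frame(
  protein_id = c("A", "B", "C"),
  ssp_id     = c("9606", "9606", "10090"),
  cog_id     = c("101", "102", "101"),
  stringsAsFactors = FALSE
)

# Prefix the group identifiers, as done for the OMA dataset
cogdata_toy$cog_id <- paste0("OMA", cogdata_toy$cog_id)

# Toy mapping table from protein IDs to gene IDs (protein "C" is unmapped)
map_toy <- data.frame(
  protein_id = c("A", "B"),
  gene_id    = c("7157", "672"),
  stringsAsFactors = FALSE
)

# Left join: keep every cogdata row, fill gene_id with NA where no mapping exists
joined_toy <- merge(cogdata_toy, map_toy, by = "protein_id", all.x = TRUE, sort = FALSE)
print(joined_toy)
```

Rows without a mapping keep an `NA` gene identifier rather than being dropped, which is the behavior a left join is meant to give.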

# map rooting information on the igraph object
g <- ogr2igraph(ogr, cogdata, ppi.gs, idkey = "ENTREZ")
# adjust colors for rooting information
pal <- brewer.pal(9, "RdYlBu")
color_col <- colorRampPalette(pal)(37) #set a color for each root!
g <- att.setv(g=g, from="Root", to="nodeColor", cols=color_col, na.col = "grey80", breaks = seq(1,37))
# adjust the aesthetics of some graph attributes
g <- att.setv(g = g, from = "SYMBOL", to = "nodeAlias")
E(g)$edgeColor <- "grey80"
V(g)$nodeLineColor <- "grey80"
# reuse the RedeR session opened in section 2.1.2; uncomment to start a new one
# rdp <- RedPort()
# calld(rdp)
resetd(rdp)
addGraph(rdp, g)
addLegend.color(rdp, colvec=g$legNodeColor$scale, size=15, labvec=g$legNodeColor$legend, title="Roots represented in Fig2")
# get apoptosis and genome-stability sub-networks
g1  <- induced_subgraph(g=g, V(g)$name[V(g)$Apoptosis==1])
g2  <- induced_subgraph(g=g, V(g)$name[V(g)$GenomeStability==1])
# group apoptosis and genome-stability genes into containers
myTheme <- list(nestFontSize=25, zoom=80, isNest=TRUE, gscale=65, theme=2)
addGraph(rdp, g1, gcoord=c(25, 50), theme = c(myTheme, nestAlias="Apoptosis"))
addGraph(rdp, g2, gcoord=c(75, 50), theme = c(myTheme, nestAlias="Genome Stability"))
relax(rdp, p1=50, p2=50, p3=50, p4=50, p5= 50, ps = TRUE)