
title: “XINA: a workflow for the integration of multiplexed proteomics kinetics data with network analysis”

1. Introduction

Quantitative proteomics experiments, using for instance isobaric tandem mass tagging approaches, are conducive to measuring changes in protein abundance over multiple time points in response to one or more conditions or stimulations. The aim of XINA is to determine which proteins exhibit similar patterns within and across experimental conditions, since proteins with co-abundance patterns may have common molecular functions. XINA imports multiple datasets, tags each dataset in silico, and combines the data for subsequent subgrouping into multiple clusters. The result is a single output depicting the variation across all conditions. XINA not only extracts co-abundance profiles within and across experiments, but also incorporates protein-protein interaction databases and integrative resources such as KEGG to infer interactors and molecular functions, respectively, and produces intuitive graphical outputs.

Main contribution: an easy-to-use software for non-expert users of clustering and network analyses.

Data inputs: any type of quantitative proteomics data, labeled or label-free.

2. XINA references

https://cics.bwh.harvard.edu/software

https://github.com/langholee/XINA/

3. XINA installation

XINA requires R>=3.5.0.

# Install from Github
install.packages('devtools')
library('devtools')
install_github('langholee/XINA')

# Install from Bioconductor
install.packages("BiocManager")
BiocManager::install("XINA")

To follow this tutorial, you need these libraries. If you don’t have the packages below, please install them.

library(XINA)

## Loading required package: Biobase

## Loading required package: BiocGenerics

## Loading required package: parallel

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, basename, cbind, colMeans, colSums, colnames,
##     dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
##     intersect, is.unsorted, lapply, lengths, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
##     rowMeans, rowSums, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which, which.max, which.min

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:BiocGenerics':
## 
##     normalize, path, union

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

library(ggplot2)
library(STRINGdb)

4. Example theoretical dataset

We generated an example dataset to show how XINA can be used for your research. To demonstrate XINA functions and allow users to perform similar exercises, we included a module that can generate multiplexed time-series datasets using theoretical data. This data consists of three treatment conditions, ‘Control’, ‘Stimulus1’ and ‘Stimulus2’. Each condition has time series data from 0 hour to 72 hours. As an example, we chose the mTOR pathway to be differentially regulated across the three conditions.

# Generate random multiplexed time-series data
random_data_info <- make_random_xina_data()

# The number of proteins
random_data_info$size

## [1] 500

# Time points
random_data_info$time_points

## [1] "0hr"  "2hr"  "6hr"  "12hr" "24hr" "48hr" "72hr"

# Three conditions
random_data_info$conditions

## [1] "Control"   "Stimulus1" "Stimulus2"

Read and check the randomly generated data

Control <- read.csv("Control.csv", check.names=FALSE, stringsAsFactors = FALSE)
Stimulus1 <- read.csv("Stimulus1.csv", check.names=FALSE, stringsAsFactors = FALSE)
Stimulus2 <- read.csv("Stimulus2.csv", check.names=FALSE, stringsAsFactors = FALSE)

head(Control)

##   Accession                                   Description 0hr    2hr
## 1         T              T brachyury transcription factor 0.5 0.5133
## 2    DAZAP1                      DAZ associated protein 1 0.5 0.7006
## 3     DDAH2     dimethylarginine dimethylaminohydrolase 2 0.5 0.5977
## 4    CSRNP2    cysteine and serine rich nuclear protein 2 0.5 0.2697
## 5    PPP3CA protein phosphatase 3 catalytic subunit alpha 0.5 0.3538
## 6      HIRA                  histone cell cycle regulator 0.5 0.5454
##      6hr   12hr   24hr   48hr   72hr
## 1 0.5830 0.9779 0.2118 0.3527 0.9873
## 2 0.6982 0.4159 0.9965 0.1818 0.0809
## 3 0.2466 0.6628 0.9082 0.1603 0.2777
## 4 0.2804 0.7607 0.8502 0.1842 0.7774
## 5 0.6492 0.7694 0.8098 0.7063 0.8215
## 6 0.4726 0.6049 0.6419 0.2310 0.5613

head(Stimulus1)

##   Accession                                   Description 0hr    2hr
## 1         T              T brachyury transcription factor 0.5 0.2825
## 2    DAZAP1                      DAZ associated protein 1 0.5 0.9894
## 3     DDAH2     dimethylarginine dimethylaminohydrolase 2 0.5 0.8697
## 4    CSRNP2    cysteine and serine rich nuclear protein 2 0.5 0.1033
## 5    PPP3CA protein phosphatase 3 catalytic subunit alpha 0.5 0.6983
## 6      HIRA                  histone cell cycle regulator 0.5 0.5276
##      6hr   12hr   24hr   48hr   72hr
## 1 0.6672 0.6729 0.3650 0.2077 0.2983
## 2 0.8755 0.3686 0.4673 0.8940 0.4860
## 3 0.3023 0.8681 0.2907 0.9339 0.7926
## 4 0.1938 0.4383 0.2470 0.2277 0.5710
## 5 0.4304 0.6608 0.2637 0.1203 0.6016
## 6 0.0768 0.3804 0.0274 0.6961 0.2325

head(Stimulus2)

##   Accession                                   Description 0hr    2hr
## 1         T              T brachyury transcription factor 0.5 0.3278
## 2    DAZAP1                      DAZ associated protein 1 0.5 0.6509
## 3     DDAH2     dimethylarginine dimethylaminohydrolase 2 0.5 0.3703
## 4    CSRNP2    cysteine and serine rich nuclear protein 2 0.5 0.7271
## 5    PPP3CA protein phosphatase 3 catalytic subunit alpha 0.5 0.7754
## 6      HIRA                  histone cell cycle regulator 0.5 0.4996
##      6hr   12hr   24hr   48hr   72hr
## 1 0.2789 0.7295 0.7992 0.4592 0.9595
## 2 0.0506 0.7153 0.4961 0.4704 0.2974
## 3 0.9034 0.5672 0.8755 0.6279 0.5318
## 4 0.9651 0.5260 0.0581 0.1424 0.8260
## 5 0.8313 0.1384 0.3668 0.5344 0.1410
## 6 0.7239 0.6541 0.1133 0.0357 0.8858

Since XINA needs to know which columns have the kinetics data matrix, the user should give a vector containing column names of the kinetics data matrix. These column names have to be the same in all input datasets (Control input, Stimulus1 input and Stimulus2 input).

head(Control[random_data_info$time_points])

##   0hr    2hr    6hr   12hr   24hr   48hr   72hr
## 1 0.5 0.5133 0.5830 0.9779 0.2118 0.3527 0.9873
## 2 0.5 0.7006 0.6982 0.4159 0.9965 0.1818 0.0809
## 3 0.5 0.5977 0.2466 0.6628 0.9082 0.1603 0.2777
## 4 0.5 0.2697 0.2804 0.7607 0.8502 0.1842 0.7774
## 5 0.5 0.3538 0.6492 0.7694 0.8098 0.7063 0.8215
## 6 0.5 0.5454 0.4726 0.6049 0.6419 0.2310 0.5613
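
Before clustering, it can be worth confirming that every input table really contains the same kinetics columns. The sketch below uses small made-up data frames; `toy_control`, `toy_stimulus` and the helper `missing_columns` are illustrative names and are not part of XINA.

```r
# Hypothetical helper (not part of XINA): list the required columns
# that are missing from each input data frame.
missing_columns <- function(datasets, required) {
  lapply(datasets, function(df) setdiff(required, colnames(df)))
}

# Toy inputs standing in for Control.csv and Stimulus1.csv
time_points <- c("0hr", "2hr", "6hr")
toy_control <- data.frame(Accession = c("A", "B"),
                          `0hr` = c(0.5, 0.5), `2hr` = c(0.51, 0.70),
                          `6hr` = c(0.58, 0.69), check.names = FALSE)
# This toy table deliberately lacks the "6hr" column
toy_stimulus <- data.frame(Accession = c("A", "B"),
                           `0hr` = c(0.5, 0.5), `2hr` = c(0.28, 0.98),
                           check.names = FALSE)

missing <- missing_columns(list(Control = toy_control,
                                Stimulus1 = toy_stimulus), time_points)
# Names of the datasets that lack at least one kinetics column
incomplete <- names(missing)[lengths(missing) > 0]
print(missing)
```

With real data, the same helper could be applied to the data frames read from the CSV files and to `random_data_info$time_points`.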

5. Package features

XINA is an R package that can examine, but is not limited to, time-series omics data from multiple experimental conditions. It has three modules: 1. model-based clustering analysis, 2. coregulation analysis, and 3. protein-protein interaction network analysis (we used STRING DB for this tutorial).

5.1 Clustering analysis using model-based clustering or k-means clustering algorithm

XINA implements model-based clustering to classify features (genes or proteins) according to their expression profiles. The model-based clustering selects the number of clusters that optimizes the Bayesian information criterion (BIC). Model-based clustering is performed by the ‘mclust’ R package [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5096736/], which was also used by our previously developed tool mIMT-visHTS [https://www.ncbi.nlm.nih.gov/pubmed/26232111]. By default, XINA performs sum-normalization of each gene/protein time-series profile [https://www.ncbi.nlm.nih.gov/pubmed/19861354] so that all datasets are on a common scale. Most importantly, XINA assigns an electronic tag to each dataset’s proteins (similar to TMT) so that the multiple datasets can be combined into a single ‘super dataset’ for subsequent clustering.
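
The normalization and tagging steps can be illustrated in base R. This is only a sketch: it assumes that sum-normalization divides each profile by the sum of its values, and the `@` separator used for the tag is a made-up convention, not necessarily what XINA does internally.

```r
# Sum-normalize each row so that the values of a profile add up to 1.
# Assumption: this mirrors XINA's default normalization in spirit only.
sum_normalize <- function(mat) mat / rowSums(mat)

control <- matrix(c(0.5, 0.5133, 0.5830,
                    0.5, 0.7006, 0.6982), nrow = 2, byrow = TRUE,
                  dimnames = list(c("T", "DAZAP1"), c("0hr", "2hr", "6hr")))
stimulus1 <- matrix(c(0.5, 0.2825, 0.6672,
                      0.5, 0.9894, 0.8755), nrow = 2, byrow = TRUE,
                    dimnames = list(c("T", "DAZAP1"), c("0hr", "2hr", "6hr")))

# Tag each protein with its condition (hypothetical "condition@gene" format)
tag_rows <- function(mat, tag) {
  rownames(mat) <- paste(tag, rownames(mat), sep = "@")
  mat
}

# Stack the tagged, normalized datasets into one matrix for clustering
super_dataset <- rbind(tag_rows(sum_normalize(control), "Control"),
                       tag_rows(sum_normalize(stimulus1), "Stimulus1"))
print(round(super_dataset, 3))
```

Because every protein appears once per condition in the stacked matrix, the same protein can end up in different clusters under different conditions.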

XINA uses the ‘mclust’ package for the model-based clustering. ‘mclust’ needs a fixed random seed to give reproducible clustering results.

set.seed(0)

‘nClusters’ is the number of desired clusters. ‘mclust’ chooses the best number of clusters according to the Bayesian information criterion (BIC). The BIC reported by ‘mclust’ is the negative of the conventional BIC, so in ‘mclust’ a higher BIC indicates a better clustering scheme, whereas under the conventional definition a lower BIC is better.
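
The two sign conventions can be compared with a few lines of base R. The log-likelihoods and parameter counts below are made up for illustration and do not come from a real ‘mclust’ fit.

```r
# Conventional BIC: -2*logLik + k*log(n)   (lower is better)
# mclust-style BIC:  2*logLik - k*log(n)   (higher is better)
bic_conventional <- function(loglik, k, n) -2 * loglik + k * log(n)
bic_mclust_style <- function(loglik, k, n) 2 * loglik - k * log(n)

# Made-up log-likelihoods and parameter counts for three candidate models
candidates <- data.frame(n_clusters = c(2, 3, 4),
                         loglik = c(-520, -480, -475),
                         k = c(9, 14, 19))
n_obs <- 200
candidates$bic <- bic_conventional(candidates$loglik, candidates$k, n_obs)
candidates$bic_mclust <- bic_mclust_style(candidates$loglik, candidates$k, n_obs)

# Both conventions select the same model
best_conventional <- candidates$n_clusters[which.min(candidates$bic)]
best_mclust <- candidates$n_clusters[which.max(candidates$bic_mclust)]
print(candidates)
```

The two columns differ only in sign, so minimizing one is the same as maximizing the other.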

# Data files
data_files <- paste(random_data_info$conditions, ".csv", sep='')
data_files

## [1] "Control.csv"   "Stimulus1.csv" "Stimulus2.csv"

# time points of the data matrix
data_column <- random_data_info$time_points
data_column

## [1] "0hr"  "2hr"  "6hr"  "12hr" "24hr" "48hr" "72hr"

Run the model-based clustering

# Run the model-based clustering
clustering_result <- xina_clustering(data_files, data_column=data_column, nClusters=30)

If you think the automatically selected covariance model (the one with the highest BIC) does not give a satisfactory clustering, you can adjust the clustering results by specifying a particular parameterisation of the within-group covariance matrix.

# Model-based clustering using VVI covariance model
clustering_result_vvi <- xina_clustering(data_files, data_column=data_column, nClusters=30, chosen_model='VVI')

In addition to model-based clustering, XINA supports k-means clustering.

clustering_result_km <- xina_clustering(data_files, data_column=data_column, nClusters=30, chosen_model='kmeans')

The clustering results are stored in your working directory. XINA clustering generates the mclust BIC plot showing the BIC of each covariance model for each number of clusters. ‘xina_clusters.csv’ contains the XINA clustering results as one long list. ‘xina_clusters_aligned.csv’ contains the clustering results arranged by gene name. If you want to reload previous clustering results, you can use ‘load_previous_results’.

clustering_result_reloaded <- load_previous_results(".")
head(clustering_result_reloaded$aligned)

##   Gene name                                 Description Control Stimulus1
## 1     CDC42                      cell division cycle 42       1         8
## 2     XAGE3                   X antigen family member 3       1        10
## 3     SCN4B sodium voltage-gated channel beta subunit 4       1        13
## 4     LCE1A                  late cornified envelope 1A       1        28
## 5     DDX3X               DEAD-box helicase 3, X-linked       1         4
## 6   ZFYVE26         zinc finger FYVE-type containing 26       1        22
##   Stimulus2
## 1        22
## 2         1
## 3        11
## 4        20
## 5        23
## 6         5

Load previously generated dataset for upcoming XINA analyses.

data(xina_example)

For visualizing clustering results, XINA draws line graphs of the clustering results using ‘plot_clusters’.

plot_clusters(example_clusters, xylab=FALSE)

XINA’s plot_clusters treats the X axis as an ordinal variable by default, but you can make it a continuous variable by passing numeric time points through ‘xval’. Setting xylab=FALSE removes the axis labels.

plot_clusters(example_clusters, xval=c(0,2,6,12,24,48,72), xylab=FALSE)

# You can change Y axis limits to be ranged from 0 to 0.35
plot_clusters(example_clusters, y_lim=c(0,0.35), xval=c(0,2,6,12,24,48,72), xylab=FALSE)

# If you need, you can modify the clustering plot by creating your own ggplot theme.
theme1 <- theme(title=element_text(size=8, face='bold'),
                axis.text.x = element_text(size=7),
                axis.text.y = element_blank(),
                axis.ticks.x = element_blank(),
                axis.ticks.y = element_blank(),
                axis.title.x = element_blank(),
                axis.title.y = element_blank())
plot_clusters(example_clusters, ggplot_theme=theme1)

XINA calculates the condition composition of each cluster; for example, the sample composition of cluster 22 is 93.10% Stimulus2. ‘plot_condition_compositions’ plots these compositions as pie-charts. Sample composition information is insightful because it shows which specific patterns are closely related to each stimulus.

condition_composition <- plot_condition_compositions(example_clusters)

tail(condition_composition)

##    Cluster Condition  N Percent_ratio
## 81      27 Stimulus2 13         21.67
## 82      28   Control  1          2.00
## 83      28 Stimulus2 49         98.00
## 84      29   Control 57         86.36
## 85      29 Stimulus2  9         13.64
## 86      30 Stimulus1 58        100.00
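
The composition table above can be approximated in base R by counting proteins per cluster and condition. The toy assignments below are made up, and the code is a sketch of the idea rather than XINA's implementation.

```r
# Toy long-format clustering result: one row per protein and condition
toy <- data.frame(
  Cluster   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  Condition = c("Control", "Control", "Control", "Stimulus1",
                "Stimulus2", "Stimulus2", "Stimulus2", "Stimulus2", "Control")
)

# Count proteins per cluster and condition
counts <- table(toy$Cluster, toy$Condition)
composition <- as.data.frame(counts, stringsAsFactors = FALSE)
names(composition) <- c("Cluster", "Condition", "N")

# Convert counts to percentages of each cluster's total
cluster_totals <- rowSums(counts)
composition$Percent_ratio <- round(
  100 * composition$N / as.vector(cluster_totals[composition$Cluster]), 2)

# Drop empty combinations and sort by cluster
composition <- composition[composition$N > 0, ]
composition <- composition[order(as.integer(composition$Cluster)), ]
print(composition)
```

In this toy example, cluster 2 is 80% Stimulus2, which would make it a condition-biased cluster under a 50% threshold.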

You can modify the condition composition plot with ggplot theme. For example, you can remove the legend.

theme2 <- theme(legend.position="none", title=element_text(size=7, face='bold'))
condition_composition <- plot_condition_compositions(example_clusters, ggplot_theme=theme2)

# Or you can adjust the legend size.
theme3 <- theme(legend.key.size = unit(0.3, "cm"),
                legend.text=element_text(size=5),
                title=element_text(size=7, face='bold'))
condition_composition <- plot_condition_compositions(example_clusters, ggplot_theme=theme3)

# You can utilize your own colors for condition composition charts.
# Make a new color code for conditions
condition_colors <- c("tomato","steelblue1","gold")
names(condition_colors) <- example_clusters$condition
example_clusters$color_for_condition <- condition_colors

# Draw condition composition pie-chart with the new color code
condition_composition <- plot_condition_compositions(example_clusters, ggplot_theme=theme3)

# XINA also can draw bull's-eye plots instead of pie-charts.
condition_composition <- plot_condition_compositions(example_clusters, ggplot_theme=theme3, bullseye = TRUE)

Based on the condition composition pie-charts, we can see that some clusters come mostly from specific stimuli or conditions. For example, 96.30% of the proteins in cluster #19 come from the Stimulus2 condition and around 95.00% of cluster #20 proteins are from Stimulus1. XINA provides a function named ‘mutate_colors’ to generate colors based on the condition composition. If a cluster has a biased condition composition, such as one condition exceeding 50%, ‘mutate_colors’ assigns that cluster the color of the dominant condition (taken from the condition colors passed to it, here ‘color_for_condition’). Otherwise, ‘mutate_colors’ assigns null_color, which is gray by default. By changing the ‘color_for_clusters’ parameter of the XINA clustering result, you can recolor the XINA clustering plot.

# New colors for the clustering plot based on condition composition
example_clusters$color_for_clusters <- mutate_colors(condition_composition, example_clusters$color_for_condition)
example_clusters$color_for_clusters

##  [1] "gray"    "gray"    "gray"    "gray"    "gray"    "gray"    "gray"   
##  [8] "gray"    "gray"    "gray"    "gray"    "gray"    "gray"    "gray"   
## [15] "gray"    "gray"    "gray"    "gray"    "gray"    "gray"    "gray"   
## [22] "gray"    "gray"    "gray"    "gray"    "gray"    "gray"    "#8da0cb"
## [29] "#66c2a5" "#fc8d62"

plot_clusters(example_clusters, xval=c(0,2,6,12,24,48,72), xylab=FALSE)

You can lower the percentage threshold, for example to 40%.

example_clusters$color_for_clusters <- mutate_colors(condition_composition, example_clusters$color_for_condition, threshold_percent=40)
plot_clusters(example_clusters, xval=c(0,2,6,12,24,48,72), xylab=FALSE)
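
The thresholding idea behind ‘mutate_colors’ can be sketched in base R as follows. `assign_cluster_colors` is a hypothetical re-implementation written for this illustration, and using a strict "greater than" comparison is an assumption; the actual function may behave differently at the boundary.

```r
# Hypothetical re-implementation of the idea behind 'mutate_colors'
assign_cluster_colors <- function(composition, condition_colors,
                                  n_clusters, threshold_percent = 50,
                                  null_color = "gray") {
  colors <- rep(null_color, n_clusters)
  for (cl in seq_len(n_clusters)) {
    rows <- composition[composition$Cluster == cl, ]
    if (nrow(rows) == 0) next
    # Condition that contributes the most proteins to this cluster
    top <- rows[which.max(rows$Percent_ratio), ]
    # Strict comparison is an assumption of this sketch
    if (top$Percent_ratio > threshold_percent) {
      colors[cl] <- condition_colors[[top$Condition]]
    }
  }
  colors
}

# Made-up composition table for two clusters
composition <- data.frame(
  Cluster = c(1, 1, 1, 2, 2),
  Condition = c("Control", "Stimulus1", "Stimulus2", "Control", "Stimulus2"),
  Percent_ratio = c(45, 35, 20, 2, 98),
  stringsAsFactors = FALSE)
condition_colors <- c(Control = "tomato", Stimulus1 = "steelblue1",
                      Stimulus2 = "gold")

default_colors <- assign_cluster_colors(composition, condition_colors, 2)
relaxed_colors <- assign_cluster_colors(composition, condition_colors, 2,
                                        threshold_percent = 40)
print(default_colors)
print(relaxed_colors)
```

Lowering the threshold from 50 to 40 lets cluster 1, which is 45% Control, pick up the Control color instead of the null color.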

5.2 Coregulation analysis

XINA assumes that proteins that comigrate between clusters in response to a given condition are more likely to be coregulated at the biological level than other proteins within the same clusters. This module needs at least two datasets to compare. XINA treats the features assigned to the same cluster in an experimental condition as a coregulated group. XINA traces the comigrated proteins across experimental conditions and finds significant trends by 1) the number of member features (proteins) and 2) an enrichment test using Fisher’s exact test. The comigrations are displayed in an alluvial plot. In XINA, a comigration is defined as a group of proteins that show the same expression pattern, as classified and evaluated by XINA clustering, in at least two dataset conditions. In other words, if proteins are assigned to the same cluster as one another in two or more datasets, XINA considers those proteins to be comigrated. XINA’s ‘alluvial_enriched’ is designed to find these comigrations and draw alluvial plots to visualize them.

classes <- as.vector(example_clusters$condition)
classes

## [1] "Control"   "Stimulus1" "Stimulus2"

all_cor <- alluvial_enriched(example_clusters, classes)

## Warning in alluvial_enriched(example_clusters, classes): length(selected_conditions) > 2, so XINA can't apply the enrichment filter
##             Can't apply the enrichment filter, so pval_threshold is ignored

head(all_cor)

##   Control Stimulus1 Stimulus2 Comigration_size RowNum PValue
## 1       0         0         1                5      1     NA
## 2       0         0         2                2      2     NA
## 3       0         0         3                3      3     NA
## 4       0         0         4                7      4     NA
## 5       0         0         5                3      5     NA
## 6       0         0         6                2      6     NA
##   Pvalue.adjusted TP FP FN TN Alluvia_color
## 1              NA NA NA NA NA       #BEBEBE
## 2              NA NA NA NA NA       #BEBEBE
## 3              NA NA NA NA NA       #BEBEBE
## 4              NA NA NA NA NA       #BEBEBE
## 5              NA NA NA NA NA       #BEBEBE
## 6              NA NA NA NA NA       #BEBEBE
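
Counting comigrations amounts to grouping proteins by their combination of cluster assignments. The base R sketch below uses a made-up aligned table with two conditions; it illustrates the counting and the size filter only, not the plotting or the enrichment test.

```r
# Toy aligned table: the cluster each protein was assigned to per condition
aligned <- data.frame(
  Gene = c("AKT1", "BRAF", "MTOR", "TSC1", "TNF", "IGF1"),
  Control   = c(29, 29, 29, 5, 5, 7),
  Stimulus1 = c(30, 30, 30, 12, 12, 3),
  stringsAsFactors = FALSE)

# Proteins sharing the same (Control, Stimulus1) cluster pair comigrate;
# count the members of each pair.
comigrations <- aggregate(Gene ~ Control + Stimulus1, data = aligned,
                          FUN = length)
names(comigrations)[3] <- "Comigration_size"
comigrations <- comigrations[order(-comigrations$Comigration_size), ]

# Keep comigrations with at least two member proteins
size_filtered <- comigrations[comigrations$Comigration_size >= 2, ]
print(size_filtered)
```

Here the largest comigration is the group of three proteins that move from cluster 29 in Control to cluster 30 in Stimulus1.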

By changing the order of the sample conditions, you can change the colors of the streams.

all_cor_Stimulus1_start <- alluvial_enriched(example_clusters, c(classes[2],classes[1],classes[3]))
head(all_cor_Stimulus1_start)

You can narrow down comigrations by using the size (the number of comigrated proteins) filter.

cor_bigger_than_10 <- alluvial_enriched(example_clusters, classes, comigration_size=10)
head(cor_bigger_than_10)

After the size filter, XINA provides one more filter based on the condition composition pie-charts: it restricts the comigrations to condition-biased clusters. This makes it possible to find coregulations related to condition-specific patterns. XINA considers an expression pattern to be condition-biased if it is found mainly in one specific experimental condition.

condition_biased_comigrations <- get_condition_biased_comigrations(clustering_result=example_clusters, 
                                                                   count_table=cor_bigger_than_10, 
                                                                   selected_conditions=classes, 
                                                                   condition_composition=condition_composition,
                                                                   threshold_percent=50, color_for_null='gray',
                                                                   color_for_highly_matched='yellow', cex = 1)

In the alluvial plot displaying coregulations between the three conditions, Control, Stimulus1 and Stimulus2, one large comigrated protein group is evident (tan color). XINA can extract the proteins from this comigrated group.

condition_biased_clusters <- condition_composition[condition_composition$Percent_ratio>=50,]
control_biased_clusters <- condition_biased_clusters[condition_biased_clusters$Condition=="Control",]$Cluster
Stimulus1_biased_clusters <- condition_biased_clusters[condition_biased_clusters$Condition=="Stimulus1",]$Cluster
Stimulus2_biased_clusters <- condition_biased_clusters[condition_biased_clusters$Condition=="Stimulus2",]$Cluster

# Get the proteins showing condition-specific expression patterns in three conditions
proteins_found_in_condition_biased_clusters <- subset(example_clusters$aligned, Control==control_biased_clusters[1] & Stimulus1==Stimulus1_biased_clusters[1] & Stimulus2==Stimulus2_biased_clusters[1])
nrow(proteins_found_in_condition_biased_clusters)

## [1] 48

head(proteins_found_in_condition_biased_clusters)

##     Gene name                                   Description Control
## 620      AKT1                 AKT serine/threonine kinase 1      29
## 621      BRAF B-Raf proto-oncogene, serine/threonine kinase      29
## 622     CAB39                    calcium binding protein 39      29
## 623      CBX2                                   chromobox 2      29
## 624     DDIT4             DNA damage inducible transcript 4      29
## 625     EIF4B   eukaryotic translation initiation factor 4B      29
##     Stimulus1 Stimulus2
## 620        30        28
## 621        30        28
## 622        30        28
## 623        30        28
## 624        30        28
## 625        30        28

If you compare only two conditions, you can apply Fisher’s exact test to measure how significant any given comigration is with respect to all comigrations in the comparison. The following 2x2 table is used to calculate the p-value from Fisher’s exact test, here to evaluate the significance of the proteins comigrating from cluster #1 in control to cluster #2 in a test condition.

                       | cluster 1 in control | other clusters in control
-----------------------|----------------------|--------------------------
cluster 2 in test      | 65 (TP)              | 175 (FP)
other clusters in test | 35 (FN)              | 1079 (TN)
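
The p-value for the example table can be reproduced with `fisher.test` from base R's stats package. Testing for over-representation with a one-sided alternative is an assumption of this sketch; XINA's own test settings may differ.

```r
# 2x2 contingency table from the text
# (rows: test condition, columns: control)
contingency <- matrix(c(65, 175,
                        35, 1079),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(
                        c("cluster 2 in test", "other clusters in test"),
                        c("cluster 1 in control", "other clusters in control")))

# One-sided test for over-representation; the one-sided alternative
# is an assumption of this sketch.
fisher_result <- fisher.test(contingency, alternative = "greater")
print(fisher_result$p.value)
print(fisher_result$estimate)
```

An odds ratio well above 1 together with a small p-value indicates that far more proteins comigrate between the two clusters than expected by chance.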

Control_Stimulus1_significant <- alluvial_enriched(example_clusters, c("Control","Stimulus1"), comigration_size=5, pval_threshold=0.05)

## Warning in data.frame(..., freq = freq, col, alpha, border, hide,
## stringsAsFactors = FALSE): row names were found from a short variable and
## have been discarded

head(Control_Stimulus1_significant)

##     Control Stimulus1 Comigration_size RowNum      PValue Pvalue.adjusted
## 317      29        30               57    317 5.43432e-79    1.717245e-76
##     TP FP FN  TN Alluvia_color
## 317 57  0  1 530       #008B00

5.3 Network analysis

XINA conducts protein-protein interaction (PPI) network analysis using the ‘igraph’ and ‘STRINGdb’ R packages. XINA constructs PPI networks for comigrated protein groups as well as for individual clusters of a specific experiment (dataset) condition. In the constructed networks, XINA finds influential players by calculating various network centrality scores, including betweenness, closeness and eigenvector scores. For the selected comigrated groups, XINA can run an enrichment test based on gene ontology and KEGG pathway terms to help interpret the comigrated groups.

XINA’s example dataset uses human gene names, so download the human PPI database from STRING DB.

string_db <- STRINGdb$new( version="10", species=9606, score_threshold=0, input_directory="" )
string_db

## ***********  STRING - http://string-db.org   ***********
## (Search Tool for the Retrieval of Interacting Genes/Proteins)  
## version: 10
## species: 9606
## ............please wait............
## proteins: 20457
## interactions: 4274001

STRING DB provides a PPI confidence score, so users can adjust the threshold to get more convincing or more comprehensive PPI networks.

# Inspect the combined confidence scores of the STRING PPI edges
head(E(string_db$graph)$combined_score)

# Medium confidence PPIs only
string_db_med_confidence <- subgraph.edges(string_db$graph, which(E(string_db$graph)$combined_score>=400), delete.vertices = TRUE)

# High confidence PPIs only
string_db_high_confidence <- subgraph.edges(string_db$graph, which(E(string_db$graph)$combined_score>=700), delete.vertices = TRUE)

When you run the XINA network analysis, you need the XINA clustering results and a STRINGdb object.

xina_result <- xina_analysis(example_clusters, string_db)

## Warning:  we couldn't map to STRING 3% of your identifiers

If you want to use PPI information from a source other than STRING DB, you can supply a network built from a data frame of interactions. For example, XINA provides the PPI information of the HPRD database.

# Construct HPRD PPI network
data(HPRD)
ppi_db_hprd <- simplify(graph_from_data_frame(d=hprd_ppi, directed=FALSE), remove.multiple = FALSE, remove.loops = TRUE)
head(hprd_ppi)

##   gene_symbol_1 gene_symbol_2        Experiment_type
## 1       ALDH1A1       ALDH1A1 in vivo;yeast 2-hybrid
## 2         ITGA7        CHRNA1                in vivo
## 3       PPP1R9A         ACTG1       in vitro;in vivo
## 4          SRGN          CD44                in vivo
## 5          GRB7         ERBB2       in vitro;in vivo
## 6          PAK1         ERBB2                in vivo

# Run XINA with HPRD protein-protein interaction database
xina_result_hprd <- xina_analysis(example_clusters, ppi_db_hprd, is_stringdb=FALSE)

You can easily draw the PPI networks of all the XINA clusters using the ‘xina_plots’ function. The PPI network plots will be stored in the working directory.

# XINA network plots labeled with gene names
xina_plots(xina_result, example_clusters)

If node labels are excessive and the nodes cannot be visualized, labels can be removed.

# XINA network plots without labels
xina_plots(xina_result, example_clusters, vertex_label_flag=FALSE)

The PPI network layout can be changed. XINA’s available layout options are c(‘sphere’,‘star’,‘gem’,‘tree’,‘circle’,‘random’,‘nicely’). If you need more information about igraph layouts, see http://igraph.org/r/doc/layout_.html

# Plot PPI networks with tree layout
xina_plots(xina_result, example_clusters, vertex_label_flag=FALSE, layout_specified='tree')

If you want to draw protein-protein interaction networks for a cluster, use ‘xina_plot_bycluster’.

xina_plot_bycluster(xina_result, example_clusters, cl=1)

You can also plot the network for only one condition.

xina_plot_bycluster(xina_result, example_clusters, cl=1, condition="Control")

Plot the protein-protein interaction networks for every cluster.

img_size <- c(3000,3000)
xina_plot_all(xina_result, example_clusters, img_size=img_size)

## png 
##   2

You can divide protein-protein interaction networks by experimental conditions.

xina_plot_all(xina_result, example_clusters, condition="Control", img_size=img_size)
xina_plot_all(xina_result, example_clusters, condition="Stimulus1", img_size=img_size)
xina_plot_all(xina_result, example_clusters, condition="Stimulus2", img_size=img_size)

XINA can calculate network centrality of the protein-protein interaction networks within experimental conditions.

xina_plot_all(xina_result, example_clusters, centrality_type="Eigenvector", 
              edge.color = 'gray', img_size=img_size)
xina_plot_all(xina_result, example_clusters, condition="Control", centrality_type="Eigenvector", 
              edge.color = 'gray', img_size=img_size)
xina_plot_all(xina_result, example_clusters, condition="Stimulus1", centrality_type="Eigenvector", 
              edge.color = 'gray', img_size=img_size)
xina_plot_all(xina_result, example_clusters, condition="Stimulus2", centrality_type="Eigenvector", 
              edge.color = 'gray', img_size=img_size)
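
To give an intuition for eigenvector centrality, the base R sketch below computes it by power iteration on a small made-up network. The edges are illustrative only and `eigen_centrality_sketch` is a hypothetical helper; XINA relies on ‘igraph’ for the real calculation.

```r
# Toy undirected PPI network as an adjacency matrix (made-up edges)
proteins <- c("MTOR", "RPTOR", "MLST8", "AKT1", "TSC2")
adj <- matrix(0, 5, 5, dimnames = list(proteins, proteins))
edges <- rbind(c("MTOR", "RPTOR"), c("MTOR", "MLST8"), c("MTOR", "AKT1"),
               c("RPTOR", "MLST8"), c("AKT1", "TSC2"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

# Power iteration: repeatedly multiply by the adjacency matrix and
# rescale so that the largest score equals 1.
eigen_centrality_sketch <- function(adj, iterations = 200) {
  score <- rep(1, nrow(adj))
  for (i in seq_len(iterations)) {
    score <- as.vector(adj %*% score)
    score <- score / max(score)
  }
  setNames(score, rownames(adj))
}

centrality <- eigen_centrality_sketch(adj)
print(round(sort(centrality, decreasing = TRUE), 3))
```

A protein scores highly when it is connected to other highly scored proteins, which is why the hub of this toy network ranks first.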

XINA employs the STRINGdb package to conduct enrichment tests using KEGG pathway and GO terms.

enriched_fdr_0.05 <- xina_enrichment(xina_result, example_clusters, string_db, pval_threshold=0.05)

XINA can take the comigrated groups found by ‘alluvial_enriched’, apply the network analysis tools to them, and evaluate their proteins using several network centrality scores, such as degree, eigenvector and hub. XINA ranks proteins by centrality score.

protein_list <- proteins_found_in_condition_biased_clusters$`Gene name`
protein_list

##  [1] "AKT1"     "BRAF"     "CAB39"    "CBX2"     "DDIT4"    "EIF4B"   
##  [7] "EIF4E"    "EIF4E2"   "EIF4EBP1" "IGF1"     "IKBKB"    "IRS1"    
## [13] "MAPK1"    "MAPK3"    "MLST8"    "MTOR"     "PDPK1"    "PIK3CA"  
## [19] "PIK3CD"   "PIK3R2"   "PIK3R3"   "PIK3R5"   "PRKAA1"   "PRKCA"   
## [25] "PRKCB"    "PRKCG"    "PTEN"     "RHEB"     "RICTOR"   "RPS6"    
## [31] "RPS6KA2"  "RPS6KA3"  "RPS6KA6"  "RPS6KB1"  "RPTOR"    "RRAGA"   
## [37] "RRAGB"    "RRAGC"    "RRAGD"    "RYBP"     "STK11"    "STRADA"  
## [43] "TNF"      "TSC1"     "TSC2"     "ULK1"     "ULK2"     "VEGFA"

plot_title_ppi <- "PPI network of the proteins found in the condition-biased clusters"

# Draw PPI networks and compute eigenvector centrality score.
comigrations_condition_biased_clusters <- xina_plot_single(xina_result, protein_list, centrality_type="Eigenvector",
                                                           vertex.label.cex=0.5, vertex.size=8, vertex.label.dist=1,
                                                           main=plot_title_ppi)

You can adjust the number of ranks with ‘num_breaks’.

# Draw with fewer breaks
comigrations_condition_biased_clusters <- xina_plot_single(xina_result, protein_list, centrality_type="Eigenvector",
                                                           vertex.label.cex=0.5, vertex.size=8, vertex.label.dist=1,
                                                           num_breaks=3, main=plot_title_ppi)
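
One way to think about the number of breaks is as the number of equal-width bins into which the centrality scores are split. The sketch below shows this with base R's `cut`; the scores are made up, `rank_by_breaks` is a hypothetical helper, and XINA's own binning may be implemented differently.

```r
# Made-up centrality scores for a handful of proteins
scores <- c(MTOR = 1.00, RPTOR = 0.82, MLST8 = 0.80, AKT1 = 0.55,
            TSC2 = 0.21, TNF = 0.05)

# Split the score range into equal-width bins; bin 1 holds the lowest scores
rank_by_breaks <- function(scores, num_breaks) {
  bins <- cut(scores, breaks = num_breaks, labels = FALSE,
              include.lowest = TRUE)
  setNames(bins, names(scores))
}

ranks_5 <- rank_by_breaks(scores, 5)
ranks_3 <- rank_by_breaks(scores, 3)
print(rbind(ranks_5, ranks_3))
```

Fewer breaks give coarser ranks, so more proteins share the same rank and the same color in the network plot.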

Without vertex labels, you may see the graph structure better

# Draw without labels
comigrations_condition_biased_clusters <- xina_plot_single(xina_result, protein_list, centrality_type="Eigenvector", 
                                                           vertex.size=10, vertex_label_flag=FALSE)

You can try different graph layouts

# Sphere layout
comigrations_condition_biased_clusters <- xina_plot_single(xina_result, protein_list, centrality_type="Eigenvector",
                                                           vertex.label.cex=0.5, vertex.size=8, vertex.label.dist=1,
                                                           main=plot_title_ppi, layout_specified = "sphere")
# Star layout
comigrations_condition_biased_clusters <- xina_plot_single(xina_result, protein_list, centrality_type="Eigenvector",
                                                           vertex.label.cex=0.5, vertex.size=8, vertex.label.dist=1,
                                                           main=plot_title_ppi, layout_specified = "star")

# Gem layout
comigrations_condition_biased_clusters <- xina_plot_single(xina_result, protein_list, centrality_type="Eigenvector",
                                                           vertex.label.cex=0.5, vertex.size=8, vertex.label.dist=1,
                                                           main=plot_title_ppi, layout_specified = "gem")

# Tree layout
comigrations_condition_biased_clusters <- xina_plot_single(xina_result, protein_list, centrality_type="Eigenvector",
                                                           vertex.label.cex=0.5, vertex.size=8, vertex.label.dist=1,
                                                           main=plot_title_ppi, layout_specified = "tree")

# Circle layout
comigrations_condition_biased_clusters <- xina_plot_single(xina_result, protein_list, centrality_type="Eigenvector",
                                                           vertex.label.cex=0.5, vertex.size=8, vertex.label.dist=1,
                                                           main=plot_title_ppi, layout_specified = "circle")

# Random layout
comigrations_condition_biased_clusters <- xina_plot_single(xina_result, protein_list, centrality_type="Eigenvector",
                                                           vertex.label.cex=0.5, vertex.size=8, vertex.label.dist=1,
                                                           main=plot_title_ppi, layout_specified = "random")

# Nicely layout
comigrations_condition_biased_clusters <- xina_plot_single(xina_result, protein_list, centrality_type="Eigenvector",
                                                           vertex.label.cex=0.5, vertex.size=8, vertex.label.dist=1,
                                                           main=plot_title_ppi, layout_specified = "nicely")

XINA provides functional enrichment tests using KEGG pathways and GO terms via STRINGdb.

# Enrichment test using KEGG pathway terms
kegg_enriched <- xina_enrichment(string_db, protein_list, enrichment_type = "KEGG", pval_threshold=0.1)
head(kegg_enriched$KEGG)

##   term_id proteins hits        pvalue    pvalue_fdr
## 1   04150       59   45 1.716960e-125 2.369405e-123
## 2   04151      336   29  2.671255e-41  1.843166e-39
## 3   04910      132   22  3.775458e-37  1.736710e-35
## 4   04152      119   20  1.033281e-33  3.564820e-32
## 5   04066      103   18  1.387493e-30  3.829479e-29
## 6   05205      218   19  2.882128e-26  6.628896e-25
##             term_description
## 1     mTOR signaling pathway
## 2 PI3K-Akt signaling pathway
## 3  Insulin signaling pathway
## 4     AMPK signaling pathway
## 5    HIF-1 signaling pathway
## 6    Proteoglycans in cancer

# plot enrichment test results
plot_enrichment_results(kegg_enriched$KEGG, num_terms=20)
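
Over-representation tests of this kind are commonly based on the hypergeometric distribution followed by a multiple-testing correction. The base R sketch below illustrates that general approach with made-up term sizes and an assumed background of 20000 proteins; it is not a reproduction of the STRINGdb calculation, and `enrichment_pvalue` is a hypothetical helper.

```r
# Over-representation p-value: probability of seeing at least `hits`
# annotated proteins when drawing `list_size` proteins from the background.
enrichment_pvalue <- function(hits, term_size, list_size, background_size) {
  phyper(hits - 1, term_size, background_size - term_size, list_size,
         lower.tail = FALSE)
}

# Made-up terms; the background size is an assumption of this sketch
terms <- data.frame(term = c("pathway_A", "pathway_B", "pathway_C"),
                    term_size = c(60, 300, 150),
                    hits = c(20, 6, 1))
list_size <- 48
background_size <- 20000

terms$pvalue <- enrichment_pvalue(terms$hits, terms$term_size,
                                  list_size, background_size)
# Benjamini-Hochberg correction for multiple testing
terms$pvalue_fdr <- p.adjust(terms$pvalue, method = "fdr")

# Keep terms that pass the same 0.1 threshold used above
enriched <- terms[terms$pvalue_fdr <= 0.1, ]
print(enriched)
```

A term with a single hit among 48 proteins is easily explained by chance, so it does not survive the threshold, whereas terms with many hits relative to their size do.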

GO enrichment results using STRING DB

# Enrichment test using GO terms
go_enriched <- xina_enrichment(string_db, protein_list, enrichment_type = "GO", pval_threshold=0.1)
head(go_enriched$Process)
head(go_enriched$Function)
head(go_enriched$Component)

You can draw the GO enrichment results

plot_enrichment_results(go_enriched$Process, num_terms=20)
plot_enrichment_results(go_enriched$Function, num_terms=20)
plot_enrichment_results(go_enriched$Component, num_terms=20)