Introduction
Installation
Get data: Get KEGG pathway, network and TCGA data
- getKEGGdata: Searching KEGG data for download
- getNETdata: Searching network data for download
Integration data: Integration between KEGG pathway and network data
- path_net: Network of interacting genes for each pathway according a network type (PHint,COloc,GENint,PATH,SHpd)
- list_path_net: List of interacting genes for each pathway (list of genes) according a network type (PHint,COloc,GENint,PATH,SHpd)
Pathway summary indexes: Score for each pathway
Pathway cross-talk indexes: Score for pairwise pathways
- euc_dist_crtlk: Euclidean distance for cross-talk measure
- ds_score_crtlk: Discriminating score for cross-talk measure
Selection of pathway cross-talk: Selection of pathway cross-talk
- svm_classification: SVM classification
IPPI: Driver genes for each pathway
plotting_cross_talk: Plot pathway cross-talk
References

Introduction

Motivation: New technologies have made possible to identify marker gene signatures. However, gene expression-based signatures present some limitations because they do not consider metabolic role of the genes and are affected by genetic heterogeneity across patient cohorts. Considering the activity of entire pathways rather than the expression levels of individual genes can be a way to exceed these limits. This tool StarBioTrek presents some methodologies to measure pathway activity and cross-talk among pathways integrating also the information of network and TCGA data. New measures are under development.

Installation

To install use the code below.

source("https://bioconductor.org/biocLite.R")
biocLite("StarBioTrek")

`Get data`: Get KEGG pathway, network and TCGA data

`getKEGGdata`: Searching KEGG data for download

You can easily search KEGG data and their genes using the getKEGGdata function. It can download 306 pathways from KEGG or the user can select a particular list of pathways manually curated based on their role in the cell as metabolism, genetic Information processing, environmental information processing, Cellular processes and Organismal Systems using the following parameters:

METABOLISM

Carb_met Carbohydrate_metabolism
Ener_met Energy_metabolism
Lip_met Lipid_metabolism
Amn_met Aminoacid_metabolism
Gly_bio_met Glycan_biosynthesis_and_metabolism
Cof_vit_met Metabolism_of_cofactors_and_vitamins

GENETIC INFORMATION PROCESSING

Transcript Transcription
Transl Translation
Fold_degr Folding_sorting_and_degradation
Repl_repair Replication_and_repair

ENVIRONMENTAL INFORMATION PROCESSING

sign_transd Signal_transduction
sign_mol_int Signaling_molecules_and_interaction

CELLULAR PROCESSES

Transp_cat Transport_and_catabolism
cell_grow_d Cell_growth_and_death
cell_comm Cellular_community

ORGANISMAL SYSTEMS

imm_syst Immune_system
end_syst Endocrine_system
circ_syst Circulatory_system
dig_syst Digestive_system
exc_syst Excretory_system
nerv_syst Nervous_system
sens_syst Sensory_system

The following code is an example to download the pathways involved in Transcription:

patha<-getKEGGdata(KEGG_path="Transcript")

For example the group Transcript contains different pathways:

## Warning in kable_markdown(x = structure(c("Cell cycle - Homo sapiens
## (human)", : The table should have a header (column names)

Cell cycle - Homo sapiens (human)

p53 signaling pathway - Homo sapiens (human)

The function getKEGGdata with KEGG_path=“KEGG_path” will download all KEGG pathaway.

`getNETdata`: Searching network data for download

You can easily search human network data from GeneMania using the getNETdata function. The network category can be filtered using the following parameters:

PHint Physical_interactions
COloc Co-localization
GENint Genetic_interactions
PATH Pathway
SHpd Shared_protein_domains

For default the organism is homo sapiens. The example show the shared protein domain network for Saccharomyces_cerevisiae. For more information see SpidermiR package.

organism="Saccharomyces_cerevisiae"
netwa<-getNETdata(network="SHpd",organism)

## [1] "Downloading: http://genemania.org/data/current/Saccharomyces_cerevisiae/Shared_protein_domains.INTERPRO.txt ... reference n. 1 of 2"
## [1] "Downloading: http://genemania.org/data/current/Saccharomyces_cerevisiae/Shared_protein_domains.PFAM.txt ... reference n. 2 of 2"
## [1] "Preprocessing of the network n. 1 of 2"
## [1] "Preprocessing of the network n. 2 of 2"

`Integration data`: Integration between KEGG pathway and network data

`path_net`: Network of interacting genes for each pathway according a network type (PHint,COloc,GENint,PATH,SHpd)

The function path_net creates a network of interacting genes for each pathway. Interacting genes are genes belonging to the same pathway and the interaction is given from network chosen by the user, according the paramenters of the function getNETdata. The output will be a network of genes belonging to the same pathway.

network_path<-path_net(pathway=path,data=netw)

## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"

`list_path_net`: List of interacting genes for each pathway (list of genes) according a network type (PHint,COloc,GENint,PATH,SHpd)

The function list_path_net creates a list of interacting genes for each pathway. Interacting genes are genes belonging to the same pathway and the interaction is given from network chosen by the user, according the paramenters of the function getNETdata. The output will be a list of genes belonging to the same pathway and those having an interaction in the network.

list_path<-list_path_net(lista_net=network_path,pathway=path)

## [1] "List of genes interacting in the same pathway: Cell cycle - Homo sapiens (human)"
## [1] "List of genes interacting in the same pathway: p53 signaling pathway - Homo sapiens (human)"

`Pathway summary indexes`: Score for each pathway

`GE_matrix`: Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix with the gene expression for only genes given containing in the pathways given in input by the user.

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function GE_matrix creates a of gene expression levels for each pathway given by the user:

list_path_gene<-GE_matrix(DataMatrix=tumo[,1:2],pathway=path)

## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"

`average`: Average of genes for each pathway starting from a matrix of gene expression

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function average creates an average matrix of gene expression for each pathway:

score_mean<-average(dataFilt=tumo[,1:2],pathway=path)

`st_dv`: Standard deviations of genes for each pathway starting from a matrix of gene expression

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function st_dv creates a standard deviation matrix of gene expression for each pathway:

score_st_dev<-st_dv(DataMatrix=tumo[,1:2],pathway=path)

## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"

`Pathway cross-talk indexes`: Score for pairwise pathways

`euc_dist_crtlk`: Euclidean distance for cross-talk measure

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function euc_dist_crtlk creates an euclidean distance matrix of gene expression for pairwise pathway.

score_euc_dista<-euc_dist_crtlk(dataFilt=tumo[,1:2],pathway=path)

`ds_score_crtlk`: Discriminating score for cross-talk measure

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function ds_score_crtlk creates an discriminating score matrix for pairwise pathway as measure of cross-talk. Discriminating score is given by |M1-M2|/S1+S2 where M1 and M2 are mean and S1 and S2 standard deviation of expression levels of genes in a pathway 1 and and in a pathway 2 .

cross_talk_st_dv<-ds_score_crtlk(dataFilt=tumo[,1:2],pathway=path)

## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"

`Selection of pathway cross-talk`: Selection of pathway cross-talk

`svm_classification`: SVM classification

Given the substantial difference in the activities of many pathways between two classes (e.g. normal and cancer), we examined the effectiveness to classify the classes based on their pairwise pathway profiles. This function is used to find the interacting pathways that are altered in a particular pathology in terms of Area Under Curve (AUC).AUC was estimated by cross-validation method (k-fold cross-validation, k=10).It randomly selected some fraction of TCGA data (e.g. nf= 60; 60% of original dataset) to form the training set and then assigned the rest of the points to the testing set (40% of original dataset). For each pairwise pathway the user can obtain using the methods mentioned above a score matrix ( e.g.dev_std_crtlk ) and can focus on the pairs of pathways able to differentiate a particular subtype with respect to the normal type.

nf <- 60
res_class<-svm_classification(TCGA_matrix=score_euc_dist[1:2,],nfs=nf,normal=colnames(norm[,1:12]),tumour=colnames(tumo[,1:12]))

`IPPI`: Driver genes for each pathway

The function IPPI, using pathways and networks data, calculates the driver genes for each pathway. Please see Cava et al. BMC Genomics 2017.

DRIVER_SP<-IPPI(patha=path,netwa=netw)

## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"
## [1] "List of genes interacting in the same pathway: Cell cycle - Homo sapiens (human)"
## [1] "List of genes interacting in the same pathway: p53 signaling pathway - Homo sapiens (human)"
## [1] "1 INTERACTION Cell cycle - Homo sapiens (human)"
## [1] "2 INTERACTION p53 signaling pathway - Homo sapiens (human)"

`plotting_cross_talk`: Plot pathway cross-talk

The function plotting_cross_talk creates the input for the visualization using qgraph package. This function uses the relationship among genes as correlation (green for positive correlations and red for negative correlations) and the width of the lines indicates the strength of the correlation.

The result from plotting_cross_talk is shown below:

Session Information

sessionInfo()

## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
## [1] png_0.1-7            StarBioTrek_1.4.0    miRNAtap_1.12.0     
## [4] AnnotationDbi_1.40.0 IRanges_2.12.0       S4Vectors_0.16.0    
## [7] Biobase_2.38.0       BiocGenerics_0.24.0  BiocStyle_2.6.0     
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.1             circlize_0.4.1             
##   [3] aroma.light_3.8.0           igraph_1.1.2               
##   [5] plyr_1.8.4                  selectr_0.3-1              
##   [7] ConsensusClusterPlus_1.42.0 lazyeval_0.2.1             
##   [9] splines_3.4.2               BiocParallel_1.12.0        
##  [11] GenomeInfoDb_1.14.0         ggplot2_2.2.1              
##  [13] digest_0.6.12               foreach_1.4.3              
##  [15] htmltools_0.3.6             gdata_2.18.0               
##  [17] magrittr_1.5                memoise_1.1.0              
##  [19] cluster_2.0.6               doParallel_1.0.11          
##  [21] ROCR_1.0-7                  limma_3.34.0               
##  [23] ComplexHeatmap_1.16.0       Biostrings_2.46.0          
##  [25] readr_1.1.1                 annotate_1.56.0            
##  [27] matrixStats_0.52.2          R.utils_2.5.0              
##  [29] prettyunits_1.0.2           colorspace_1.3-2           
##  [31] blob_1.1.0                  rvest_0.3.2                
##  [33] ggrepel_0.7.0               dplyr_0.7.4                
##  [35] RCurl_1.95-4.8              jsonlite_1.5               
##  [37] roxygen2_6.0.1              genefilter_1.60.0          
##  [39] bindr_0.1                   zoo_1.8-0                  
##  [41] survival_2.41-3             miRNAtap.db_0.99.10        
##  [43] iterators_1.0.8             glue_1.2.0                 
##  [45] survminer_0.4.0             gtable_0.2.0               
##  [47] zlibbioc_1.24.0             XVector_0.18.0             
##  [49] GetoptLong_0.1.6            DelayedArray_0.4.0         
##  [51] shape_1.4.3                 scales_0.5.0               
##  [53] DESeq_1.30.0                DBI_0.7                    
##  [55] edgeR_3.20.0                ggthemes_3.4.0             
##  [57] Rcpp_0.12.13                cmprsk_2.2-7               
##  [59] xtable_1.8-2                progress_1.1.2             
##  [61] foreign_0.8-69              bit_1.1-12                 
##  [63] matlab_1.0.2                km.ci_0.5-2                
##  [65] sqldf_0.4-11                htmlwidgets_0.9            
##  [67] httr_1.3.1                  gplots_3.0.1               
##  [69] RColorBrewer_1.1-2          pkgconfig_2.0.1            
##  [71] XML_3.98-1.9                R.methodsS3_1.7.1          
##  [73] locfit_1.5-9.1              rlang_0.1.2                
##  [75] reshape2_1.4.2              visNetwork_2.0.1           
##  [77] munsell_0.4.3               tools_3.4.2                
##  [79] downloader_0.4              gsubfn_0.6-6               
##  [81] RSQLite_2.0                 devtools_1.13.3            
##  [83] broom_0.4.2                 evaluate_0.10.1            
##  [85] stringr_1.2.0               yaml_2.1.14                
##  [87] org.Hs.eg.db_3.4.2          knitr_1.17                 
##  [89] bit64_0.9-7                 caTools_1.17.1             
##  [91] survMisc_0.5.4              purrr_0.2.4                
##  [93] KEGGREST_1.18.0             bindrcpp_0.2               
##  [95] TCGAbiolinks_2.6.0          EDASeq_2.12.0              
##  [97] nlme_3.1-131                R.oo_1.21.0                
##  [99] xml2_1.1.1                  biomaRt_2.34.0             
## [101] compiler_3.4.2              curl_3.0                   
## [103] e1071_1.6-8                 tibble_1.3.4               
## [105] geneplotter_1.56.0          stringi_1.1.5              
## [107] highr_0.6                   GenomicFeatures_1.30.0     
## [109] lattice_0.20-35             Matrix_1.2-11              
## [111] commonmark_1.4              psych_1.7.8                
## [113] KMsurv_0.1-5                networkD3_0.4              
## [115] GlobalOptions_0.0.12        data.table_1.10.4-3        
## [117] bitops_1.0-6                rtracklayer_1.38.0         
## [119] GenomicRanges_1.30.0        R6_2.2.2                   
## [121] latticeExtra_0.6-28         hwriter_1.3.2              
## [123] bookdown_0.5                RMySQL_0.10.13             
## [125] ShortRead_1.36.0            KernSmooth_2.23-15         
## [127] gridExtra_2.3               codetools_0.2-15           
## [129] gtools_3.5.0                assertthat_0.2.0           
## [131] chron_2.3-51                SummarizedExperiment_1.8.0 
## [133] proto_1.0.0                 rprojroot_1.2              
## [135] rjson_0.2.15                withr_2.0.0                
## [137] SpidermiR_1.8.0             GenomicAlignments_1.14.0   
## [139] Rsamtools_1.30.0            mnormt_1.5-5               
## [141] GenomeInfoDbData_0.99.1     hms_0.3                    
## [143] class_7.3-14                tidyr_0.7.2                
## [145] rmarkdown_1.6               ggpubr_0.1.5

Working with StarBioTrek package

2017-10-30

Contents

Introduction

Installation

`Get data`: Get KEGG pathway, network and TCGA data

`getKEGGdata`: Searching KEGG data for download

`getNETdata`: Searching network data for download

`Integration data`: Integration between KEGG pathway and network data

`path_net`: Network of interacting genes for each pathway according a network type (PHint,COloc,GENint,PATH,SHpd)

`list_path_net`: List of interacting genes for each pathway (list of genes) according a network type (PHint,COloc,GENint,PATH,SHpd)

`Pathway summary indexes`: Score for each pathway

`GE_matrix`: Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix with the gene expression for only genes given containing in the pathways given in input by the user.

`average`: Average of genes for each pathway starting from a matrix of gene expression

`st_dv`: Standard deviations of genes for each pathway starting from a matrix of gene expression

`Pathway cross-talk indexes`: Score for pairwise pathways

`euc_dist_crtlk`: Euclidean distance for cross-talk measure

`ds_score_crtlk`: Discriminating score for cross-talk measure

`Selection of pathway cross-talk`: Selection of pathway cross-talk

`svm_classification`: SVM classification

`IPPI`: Driver genes for each pathway

`plotting_cross_talk`: Plot pathway cross-talk

Session Information

References

Working with StarBioTrek package

2017-10-30

Contents

Introduction

Installation

Get data: Get KEGG pathway, network and TCGA data

getKEGGdata: Searching KEGG data for download

getNETdata: Searching network data for download

Integration data: Integration between KEGG pathway and network data

path_net: Network of interacting genes for each pathway according a network type (PHint,COloc,GENint,PATH,SHpd)

list_path_net: List of interacting genes for each pathway (list of genes) according a network type (PHint,COloc,GENint,PATH,SHpd)

Pathway summary indexes: Score for each pathway

GE_matrix: Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix with the gene expression for only genes given containing in the pathways given in input by the user.

average: Average of genes for each pathway starting from a matrix of gene expression

st_dv: Standard deviations of genes for each pathway starting from a matrix of gene expression

Pathway cross-talk indexes: Score for pairwise pathways

euc_dist_crtlk: Euclidean distance for cross-talk measure

ds_score_crtlk: Discriminating score for cross-talk measure

Selection of pathway cross-talk: Selection of pathway cross-talk

svm_classification: SVM classification

IPPI: Driver genes for each pathway

plotting_cross_talk: Plot pathway cross-talk

Session Information

References

`Get data`: Get KEGG pathway, network and TCGA data

`getKEGGdata`: Searching KEGG data for download

`getNETdata`: Searching network data for download

`Integration data`: Integration between KEGG pathway and network data

`path_net`: Network of interacting genes for each pathway according a network type (PHint,COloc,GENint,PATH,SHpd)

`list_path_net`: List of interacting genes for each pathway (list of genes) according a network type (PHint,COloc,GENint,PATH,SHpd)

`Pathway summary indexes`: Score for each pathway

`GE_matrix`: Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix with the gene expression for only genes given containing in the pathways given in input by the user.

`average`: Average of genes for each pathway starting from a matrix of gene expression

`st_dv`: Standard deviations of genes for each pathway starting from a matrix of gene expression

`Pathway cross-talk indexes`: Score for pairwise pathways

`euc_dist_crtlk`: Euclidean distance for cross-talk measure

`ds_score_crtlk`: Discriminating score for cross-talk measure

`Selection of pathway cross-talk`: Selection of pathway cross-talk

`svm_classification`: SVM classification

`IPPI`: Driver genes for each pathway

`plotting_cross_talk`: Plot pathway cross-talk