Contents

Introduction

Motivation: New technologies have made possible to identify marker gene signatures. However, gene expression-based signatures present some limitations because they do not consider metabolic role of the genes and are affected by genetic heterogeneity across patient cohorts. Considering the activity of entire pathways rather than the expression levels of individual genes can be a way to exceed these limits. This tool StarBioTrek presents some methodologies to measure pathway activity and cross-talk among pathways integrating also the information of network and TCGA data. New measures are under development.

Installation

To install use the code below.

source("https://bioconductor.org/biocLite.R")
biocLite("StarBioTrek")

Get data: Get KEGG pathway, network and TCGA data

getKEGGdata: Searching KEGG data for download

You can easily search KEGG data and their genes using the getKEGGdata function. It can download 306 pathways from KEGG or the user can select a particular list of pathways manually curated based on their role in the cell as metabolism, genetic Information processing, environmental information processing, Cellular processes and Organismal Systems using the following parameters:

METABOLISM

  • Carb_met Carbohydrate_metabolism
  • Ener_met Energy_metabolism
  • Lip_met Lipid_metabolism
  • Amn_met Aminoacid_metabolism
  • Gly_bio_met Glycan_biosynthesis_and_metabolism
  • Cof_vit_met Metabolism_of_cofactors_and_vitamins

GENETIC INFORMATION PROCESSING

  • Transcript Transcription
  • Transl Translation
  • Fold_degr Folding_sorting_and_degradation
  • Repl_repair Replication_and_repair

ENVIRONMENTAL INFORMATION PROCESSING

  • sign_transd Signal_transduction
  • sign_mol_int Signaling_molecules_and_interaction

CELLULAR PROCESSES

  • Transp_cat Transport_and_catabolism
  • cell_grow_d Cell_growth_and_death
  • cell_comm Cellular_community

ORGANISMAL SYSTEMS

  • imm_syst Immune_system
  • end_syst Endocrine_system
  • circ_syst Circulatory_system
  • dig_syst Digestive_system
  • exc_syst Excretory_system
  • nerv_syst Nervous_system
  • sens_syst Sensory_system

The following code is an example to download the pathways involved in Transcription:

patha<-getKEGGdata(KEGG_path="Transcript")

For example the group Transcript contains different pathways:

## Warning in kable_markdown(x = structure(c("Cell cycle - Homo sapiens
## (human)", : The table should have a header (column names)
Cell cycle - Homo sapiens (human)
p53 signaling pathway - Homo sapiens (human)

The function getKEGGdata with KEGG_path=“KEGG_path” will download all KEGG pathaway.

getNETdata: Searching network data for download

You can easily search human network data from GeneMania using the getNETdata function. The network category can be filtered using the following parameters:

  • PHint Physical_interactions
  • COloc Co-localization
  • GENint Genetic_interactions
  • PATH Pathway
  • SHpd Shared_protein_domains

For default the organism is homo sapiens. The example show the shared protein domain network for Saccharomyces_cerevisiae. For more information see SpidermiR package.

organism="Saccharomyces_cerevisiae"
netwa<-getNETdata(network="SHpd",organism)
## [1] "Downloading: http://genemania.org/data/current/Saccharomyces_cerevisiae/Shared_protein_domains.INTERPRO.txt ... reference n. 1 of 2"
## [1] "Downloading: http://genemania.org/data/current/Saccharomyces_cerevisiae/Shared_protein_domains.PFAM.txt ... reference n. 2 of 2"
## [1] "Preprocessing of the network n. 1 of 2"
## [1] "Preprocessing of the network n. 2 of 2"

Integration data: Integration between KEGG pathway and network data

path_net: Network of interacting genes for each pathway according a network type (PHint,COloc,GENint,PATH,SHpd)

The function path_net creates a network of interacting genes for each pathway. Interacting genes are genes belonging to the same pathway and the interaction is given from network chosen by the user, according the paramenters of the function getNETdata. The output will be a network of genes belonging to the same pathway.

network_path<-path_net(pathway=path,data=netw)
## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"

list_path_net: List of interacting genes for each pathway (list of genes) according a network type (PHint,COloc,GENint,PATH,SHpd)

The function list_path_net creates a list of interacting genes for each pathway. Interacting genes are genes belonging to the same pathway and the interaction is given from network chosen by the user, according the paramenters of the function getNETdata. The output will be a list of genes belonging to the same pathway and those having an interaction in the network.

list_path<-list_path_net(lista_net=network_path,pathway=path)
## [1] "List of genes interacting in the same pathway: Cell cycle - Homo sapiens (human)"
## [1] "List of genes interacting in the same pathway: p53 signaling pathway - Homo sapiens (human)"

Pathway summary indexes: Score for each pathway

GE_matrix: Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix with the gene expression for only genes given containing in the pathways given in input by the user.

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function GE_matrix creates a of gene expression levels for each pathway given by the user:

list_path_gene<-GE_matrix(DataMatrix=tumo[,1:2],pathway=path)
## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"

average: Average of genes for each pathway starting from a matrix of gene expression

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function average creates an average matrix of gene expression for each pathway:

score_mean<-average(dataFilt=tumo[,1:2],pathway=path)

st_dv: Standard deviations of genes for each pathway starting from a matrix of gene expression

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function st_dv creates a standard deviation matrix of gene expression for each pathway:

score_st_dev<-st_dv(DataMatrix=tumo[,1:2],pathway=path)
## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"

Pathway cross-talk indexes: Score for pairwise pathways

euc_dist_crtlk: Euclidean distance for cross-talk measure

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function euc_dist_crtlk creates an euclidean distance matrix of gene expression for pairwise pathway.

score_euc_dista<-euc_dist_crtlk(dataFilt=tumo[,1:2],pathway=path)

ds_score_crtlk: Discriminating score for cross-talk measure

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function ds_score_crtlk creates an discriminating score matrix for pairwise pathway as measure of cross-talk. Discriminating score is given by |M1-M2|/S1+S2 where M1 and M2 are mean and S1 and S2 standard deviation of expression levels of genes in a pathway 1 and and in a pathway 2 .

cross_talk_st_dv<-ds_score_crtlk(dataFilt=tumo[,1:2],pathway=path)
## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"

Selection of pathway cross-talk: Selection of pathway cross-talk

svm_classification: SVM classification

Given the substantial difference in the activities of many pathways between two classes (e.g. normal and cancer), we examined the effectiveness to classify the classes based on their pairwise pathway profiles. This function is used to find the interacting pathways that are altered in a particular pathology in terms of Area Under Curve (AUC).AUC was estimated by cross-validation method (k-fold cross-validation, k=10).It randomly selected some fraction of TCGA data (e.g. nf= 60; 60% of original dataset) to form the training set and then assigned the rest of the points to the testing set (40% of original dataset). For each pairwise pathway the user can obtain using the methods mentioned above a score matrix ( e.g.dev_std_crtlk ) and can focus on the pairs of pathways able to differentiate a particular subtype with respect to the normal type.

nf <- 60
res_class<-svm_classification(TCGA_matrix=score_euc_dist[1:2,],nfs=nf,normal=colnames(norm[,1:12]),tumour=colnames(tumo[,1:12]))

IPPI: Driver genes for each pathway

The function IPPI, using pathways and networks data, calculates the driver genes for each pathway. Please see Cava et al. BMC Genomics 2017.

DRIVER_SP<-IPPI(patha=path,netwa=netw)
## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"
## [1] "List of genes interacting in the same pathway: Cell cycle - Homo sapiens (human)"
## [1] "List of genes interacting in the same pathway: p53 signaling pathway - Homo sapiens (human)"
## [1] "1 INTERACTION Cell cycle - Homo sapiens (human)"
## [1] "2 INTERACTION p53 signaling pathway - Homo sapiens (human)"

plotting_cross_talk: Plot pathway cross-talk

The function plotting_cross_talk creates the input for the visualization using qgraph package. This function uses the relationship among genes as correlation (green for positive correlations and red for negative correlations) and the width of the lines indicates the strength of the correlation.

The result from plotting_cross_talk is shown below:


Session Information


sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
## [1] png_0.1-7            StarBioTrek_1.4.0    miRNAtap_1.12.0     
## [4] AnnotationDbi_1.40.0 IRanges_2.12.0       S4Vectors_0.16.0    
## [7] Biobase_2.38.0       BiocGenerics_0.24.0  BiocStyle_2.6.0     
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.1             circlize_0.4.1             
##   [3] aroma.light_3.8.0           igraph_1.1.2               
##   [5] plyr_1.8.4                  selectr_0.3-1              
##   [7] ConsensusClusterPlus_1.42.0 lazyeval_0.2.1             
##   [9] splines_3.4.2               BiocParallel_1.12.0        
##  [11] GenomeInfoDb_1.14.0         ggplot2_2.2.1              
##  [13] digest_0.6.12               foreach_1.4.3              
##  [15] htmltools_0.3.6             gdata_2.18.0               
##  [17] magrittr_1.5                memoise_1.1.0              
##  [19] cluster_2.0.6               doParallel_1.0.11          
##  [21] ROCR_1.0-7                  limma_3.34.0               
##  [23] ComplexHeatmap_1.16.0       Biostrings_2.46.0          
##  [25] readr_1.1.1                 annotate_1.56.0            
##  [27] matrixStats_0.52.2          R.utils_2.5.0              
##  [29] prettyunits_1.0.2           colorspace_1.3-2           
##  [31] blob_1.1.0                  rvest_0.3.2                
##  [33] ggrepel_0.7.0               dplyr_0.7.4                
##  [35] RCurl_1.95-4.8              jsonlite_1.5               
##  [37] roxygen2_6.0.1              genefilter_1.60.0          
##  [39] bindr_0.1                   zoo_1.8-0                  
##  [41] survival_2.41-3             miRNAtap.db_0.99.10        
##  [43] iterators_1.0.8             glue_1.2.0                 
##  [45] survminer_0.4.0             gtable_0.2.0               
##  [47] zlibbioc_1.24.0             XVector_0.18.0             
##  [49] GetoptLong_0.1.6            DelayedArray_0.4.0         
##  [51] shape_1.4.3                 scales_0.5.0               
##  [53] DESeq_1.30.0                DBI_0.7                    
##  [55] edgeR_3.20.0                ggthemes_3.4.0             
##  [57] Rcpp_0.12.13                cmprsk_2.2-7               
##  [59] xtable_1.8-2                progress_1.1.2             
##  [61] foreign_0.8-69              bit_1.1-12                 
##  [63] matlab_1.0.2                km.ci_0.5-2                
##  [65] sqldf_0.4-11                htmlwidgets_0.9            
##  [67] httr_1.3.1                  gplots_3.0.1               
##  [69] RColorBrewer_1.1-2          pkgconfig_2.0.1            
##  [71] XML_3.98-1.9                R.methodsS3_1.7.1          
##  [73] locfit_1.5-9.1              rlang_0.1.2                
##  [75] reshape2_1.4.2              visNetwork_2.0.1           
##  [77] munsell_0.4.3               tools_3.4.2                
##  [79] downloader_0.4              gsubfn_0.6-6               
##  [81] RSQLite_2.0                 devtools_1.13.3            
##  [83] broom_0.4.2                 evaluate_0.10.1            
##  [85] stringr_1.2.0               yaml_2.1.14                
##  [87] org.Hs.eg.db_3.4.2          knitr_1.17                 
##  [89] bit64_0.9-7                 caTools_1.17.1             
##  [91] survMisc_0.5.4              purrr_0.2.4                
##  [93] KEGGREST_1.18.0             bindrcpp_0.2               
##  [95] TCGAbiolinks_2.6.0          EDASeq_2.12.0              
##  [97] nlme_3.1-131                R.oo_1.21.0                
##  [99] xml2_1.1.1                  biomaRt_2.34.0             
## [101] compiler_3.4.2              curl_3.0                   
## [103] e1071_1.6-8                 tibble_1.3.4               
## [105] geneplotter_1.56.0          stringi_1.1.5              
## [107] highr_0.6                   GenomicFeatures_1.30.0     
## [109] lattice_0.20-35             Matrix_1.2-11              
## [111] commonmark_1.4              psych_1.7.8                
## [113] KMsurv_0.1-5                networkD3_0.4              
## [115] GlobalOptions_0.0.12        data.table_1.10.4-3        
## [117] bitops_1.0-6                rtracklayer_1.38.0         
## [119] GenomicRanges_1.30.0        R6_2.2.2                   
## [121] latticeExtra_0.6-28         hwriter_1.3.2              
## [123] bookdown_0.5                RMySQL_0.10.13             
## [125] ShortRead_1.36.0            KernSmooth_2.23-15         
## [127] gridExtra_2.3               codetools_0.2-15           
## [129] gtools_3.5.0                assertthat_0.2.0           
## [131] chron_2.3-51                SummarizedExperiment_1.8.0 
## [133] proto_1.0.0                 rprojroot_1.2              
## [135] rjson_0.2.15                withr_2.0.0                
## [137] SpidermiR_1.8.0             GenomicAlignments_1.14.0   
## [139] Rsamtools_1.30.0            mnormt_1.5-5               
## [141] GenomeInfoDbData_0.99.1     hms_0.3                    
## [143] class_7.3-14                tidyr_0.7.2                
## [145] rmarkdown_1.6               ggpubr_0.1.5

References