Get data
: Get KEGG pathway, network and TCGA dataIntegration data
: Integration between KEGG pathway and network dataPathway summary indexes
: Score for each pathwayGE_matrix
: Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix with the gene expression for only genes given containing in the pathways given in input by the user.average
: Average of genes for each pathway starting from a matrix of gene expressionst_dv
: Standard deviations of genes for each pathway starting from a matrix of gene expressionPathway cross-talk indexes
: Score for pairwise pathwaysSelection of pathway cross-talk
: Selection of pathway cross-talkplotting_cross_talk
: Plot pathway cross-talkMotivation: New technologies have made possible to identify marker gene signatures. However, gene expression-based signatures present some limitations because they do not consider metabolic role of the genes and are affected by genetic heterogeneity across patient cohorts. Considering the activity of entire pathways rather than the expression levels of individual genes can be a way to exceed these limits. This tool StarBioTrek
presents some methodologies to measure pathway activity and cross-talk among pathways integrating also the information of network and TCGA data. New measures are under development.
To install use the code below.
source("https://bioconductor.org/biocLite.R")
biocLite("StarBioTrek")
Get data
: Get KEGG pathway, network and TCGA datagetKEGGdata
: Searching KEGG data for downloadYou can easily search KEGG data and their genes using the getKEGGdata
function. It can download 306 pathways from KEGG or the user can select a particular list of pathways manually curated based on their role in the cell as metabolism, genetic Information processing, environmental information processing, Cellular processes and Organismal Systems using the following parameters:
METABOLISM
GENETIC INFORMATION PROCESSING
ENVIRONMENTAL INFORMATION PROCESSING
CELLULAR PROCESSES
ORGANISMAL SYSTEMS
The following code is an example to download the pathways involved in Transcription:
patha<-getKEGGdata(KEGG_path="Transcript")
For example the group Transcript contains different pathways:
Cell cycle - Homo sapiens (human) |
p53 signaling pathway - Homo sapiens (human) |
The function getKEGGdata
with KEGG_path=“KEGG_path” will download all KEGG pathaway.
getNETdata
: Searching network data for downloadYou can easily search human network data from GeneMania using the getNETdata
function. The network category can be filtered using the following parameters:
For default the organism is homo sapiens. The example show the shared protein domain network for Saccharomyces_cerevisiae. For more information see SpidermiR
package.
organism="Saccharomyces_cerevisiae"
netwa<-getNETdata(network="SHpd",organism)
## [1] "Downloading: http://genemania.org/data/current/Saccharomyces_cerevisiae/Shared_protein_domains.INTERPRO.txt ... reference n. 1 of 2"
## [1] "Downloading: http://genemania.org/data/current/Saccharomyces_cerevisiae/Shared_protein_domains.PFAM.txt ... reference n. 2 of 2"
## [1] "Preprocessing of the network n. 1 of 2"
## [1] "Preprocessing of the network n. 2 of 2"
Integration data
: Integration between KEGG pathway and network datapath_net
: Network of interacting genes for each pathway according a network type (PHint,COloc,GENint,PATH,SHpd)The function path_net
creates a network of interacting genes for each pathway. Interacting genes are genes belonging to the same pathway and the interaction is given from network chosen by the user, according the paramenters of the function getNETdata
. The output will be a network of genes belonging to the same pathway.
network_path<-path_net(pathway=path,data=netw)
## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"
list_path_net
: List of interacting genes for each pathway (list of genes) according a network type (PHint,COloc,GENint,PATH,SHpd)The function list_path_net
creates a list of interacting genes for each pathway. Interacting genes are genes belonging to the same pathway and the interaction is given from network chosen by the user, according the paramenters of the function getNETdata
. The output will be a list of genes belonging to the same pathway and those having an interaction in the network.
list_path<-list_path_net(lista_net=network_path,pathway=path)
## [1] "List of genes interacting in the same pathway: Cell cycle - Homo sapiens (human)"
## [1] "List of genes interacting in the same pathway: p53 signaling pathway - Homo sapiens (human)"
Pathway summary indexes
: Score for each pathwayGE_matrix
: Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix with the gene expression for only genes given containing in the pathways given in input by the user.Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function GE_matrix
creates a of gene expression levels for each pathway given by the user:
list_path_gene<-GE_matrix(DataMatrix=tumo[,1:2],pathway=path)
## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"
average
: Average of genes for each pathway starting from a matrix of gene expressionStarting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function average
creates an average matrix of gene expression for each pathway:
score_mean<-average(dataFilt=tumo[,1:2],pathway=path)
st_dv
: Standard deviations of genes for each pathway starting from a matrix of gene expressionStarting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function st_dv
creates a standard deviation matrix of gene expression for each pathway:
score_st_dev<-st_dv(DataMatrix=tumo[,1:2],pathway=path)
## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"
Pathway cross-talk indexes
: Score for pairwise pathwayseuc_dist_crtlk
: Euclidean distance for cross-talk measureStarting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function euc_dist_crtlk
creates an euclidean distance matrix of gene expression for pairwise pathway.
score_euc_dista<-euc_dist_crtlk(dataFilt=tumo[,1:2],pathway=path)
ds_score_crtlk
: Discriminating score for cross-talk measureStarting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function ds_score_crtlk
creates an discriminating score matrix for pairwise pathway as measure of cross-talk. Discriminating score is given by |M1-M2|/S1+S2 where M1 and M2 are mean and S1 and S2 standard deviation of expression levels of genes in a pathway 1 and and in a pathway 2 .
cross_talk_st_dv<-ds_score_crtlk(dataFilt=tumo[,1:2],pathway=path)
## [1] "Cell cycle - Homo sapiens (human)"
## [1] "p53 signaling pathway - Homo sapiens (human)"
Selection of pathway cross-talk
: Selection of pathway cross-talksvm_classification
: SVM classificationGiven the substantial difference in the activities of many pathways between two classes (e.g. normal and cancer), we examined the effectiveness to classify the classes based on their pairwise pathway profiles. This function is used to find the interacting pathways that are altered in a particular pathology in terms of Area Under Curve (AUC).AUC was estimated by cross-validation method (k-fold cross-validation, k=10).It randomly selected some fraction of TCGA data (e.g. nf= 60; 60% of original dataset) to form the training set and then assigned the rest of the points to the testing set (40% of original dataset). For each pairwise pathway the user can obtain using the methods mentioned above a score matrix ( e.g.dev_std_crtlk ) and can focus on the pairs of pathways able to differentiate a particular subtype with respect to the normal type.
nf <- 60
res_class<-svm_classification(TCGA_matrix=score_euc_dist[1:2,],nfs=nf,normal=colnames(norm[,1:12]),tumour=colnames(tumo[,1:12]))
## [1] "Cellcycle_p53signalingpathway"
## [1] "Cellcycle_Apoptosis"
plotting_cross_talk
: Plot pathway cross-talkThe function plotting_cross_talk
creates the input for the visualization using qgraph
package. This function uses the relationship among genes as correlation (green for positive correlations and red for negative correlations) and the width of the lines indicates the strength of the correlation.
The result from plotting_cross_talk
is shown below:
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid parallel stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] png_0.1-7 StarBioTrek_1.0.3 miRNAtap_1.8.0
## [4] AnnotationDbi_1.36.2 IRanges_2.8.1 S4Vectors_0.12.1
## [7] Biobase_2.34.0 BiocGenerics_0.20.0 BiocStyle_2.2.1
##
## loaded via a namespace (and not attached):
## [1] backports_1.0.5 circlize_0.3.9
## [3] fastmatch_1.1-0 aroma.light_3.4.0
## [5] plyr_1.8.4 igraph_1.0.1
## [7] ConsensusClusterPlus_1.38.0 lazyeval_0.2.0
## [9] splines_3.3.2 BiocParallel_1.8.1
## [11] pathview_1.14.0 GenomeInfoDb_1.10.3
## [13] ggplot2_2.2.1 digest_0.6.12
## [15] foreach_1.4.3 BiocInstaller_1.24.0
## [17] htmltools_0.3.5 GOSemSim_2.0.4
## [19] viridis_0.3.4 GO.db_3.4.0
## [21] gdata_2.17.0 magrittr_1.5
## [23] memoise_1.0.0 cluster_2.0.5
## [25] doParallel_1.0.10 ROCR_1.0-7
## [27] limma_3.30.11 ComplexHeatmap_1.12.0
## [29] Biostrings_2.42.1 readr_1.0.0
## [31] annotate_1.52.1 matrixStats_0.51.0
## [33] R.utils_2.5.0 colorspace_1.3-2
## [35] rvest_0.3.2 ggrepel_0.6.5
## [37] dplyr_0.5.0 jsonlite_1.2
## [39] RCurl_1.95-4.8 hexbin_1.27.1
## [41] graph_1.52.0 roxygen2_6.0.1
## [43] genefilter_1.56.0 supraHex_1.12.0
## [45] survival_2.40-1 miRNAtap.db_0.99.10
## [47] iterators_1.0.8 ape_4.0
## [49] survminer_0.2.4 gtable_0.2.0
## [51] zlibbioc_1.20.0 XVector_0.14.0
## [53] GetoptLong_0.1.5 kernlab_0.9-25
## [55] Rgraphviz_2.18.0 shape_1.4.2
## [57] prabclus_2.2-6 DEoptimR_1.0-8
## [59] scales_0.4.1 DOSE_3.0.10
## [61] DESeq_1.26.0 mvtnorm_1.0-5
## [63] edgeR_3.16.5 DBI_0.5-1
## [65] ggthemes_3.3.0 Rcpp_0.12.9
## [67] xtable_1.8-2 matlab_1.0.2
## [69] mclust_5.2.2 preprocessCore_1.36.0
## [71] sqldf_0.4-10 htmlwidgets_0.8
## [73] httr_1.2.1 fgsea_1.0.2
## [75] gplots_3.0.1 RColorBrewer_1.1-2
## [77] fpc_2.1-10 modeltools_0.2-21
## [79] XML_3.98-1.5 R.methodsS3_1.7.1
## [81] flexmix_2.3-13 nnet_7.3-12
## [83] locfit_1.5-9.1 reshape2_1.4.2
## [85] visNetwork_1.0.3 munsell_0.4.3
## [87] tools_3.3.2 downloader_0.4
## [89] gsubfn_0.6-6 RSQLite_1.1-2
## [91] devtools_1.12.0 evaluate_0.10
## [93] stringr_1.1.0 yaml_2.1.14
## [95] org.Hs.eg.db_3.4.0 knitr_1.15.1
## [97] robustbase_0.92-7 caTools_1.17.1
## [99] KEGGREST_1.14.0 dendextend_1.4.0
## [101] TCGAbiolinks_2.2.8 EDASeq_2.8.0
## [103] nlme_3.1-131 c3net_1.1.1
## [105] whisker_0.3-2 R.oo_1.21.0
## [107] KEGGgraph_1.32.0 DO.db_2.9
## [109] xml2_1.1.1 biomaRt_2.30.0
## [111] curl_2.3 e1071_1.6-8
## [113] affyio_1.44.0 minet_3.32.0
## [115] tibble_1.2 geneplotter_1.52.0
## [117] stringi_1.1.2 highr_0.6
## [119] GenomicFeatures_1.26.2 lattice_0.20-34
## [121] trimcluster_0.1-2 Matrix_1.2-8
## [123] commonmark_1.1 networkD3_0.2.13
## [125] GlobalOptions_0.0.10 parmigene_1.0.2
## [127] data.table_1.10.4 bitops_1.0-6
## [129] dnet_1.0.10 rtracklayer_1.34.1
## [131] GenomicRanges_1.26.2 qvalue_2.6.0
## [133] R6_2.2.0 latticeExtra_0.6-28
## [135] affy_1.52.0 hwriter_1.3.2
## [137] ShortRead_1.32.0 KernSmooth_2.23-15
## [139] gridExtra_2.2.1 codetools_0.2-15
## [141] gtools_3.5.0 MASS_7.3-45
## [143] assertthat_0.1 chron_2.3-49
## [145] SummarizedExperiment_1.4.0 proto_1.0.0
## [147] rprojroot_1.2 rjson_0.2.15
## [149] withr_1.0.2 SpidermiR_1.4.5
## [151] GenomicAlignments_1.10.0 Rsamtools_1.26.1
## [153] diptest_0.75-7 clusterProfiler_3.2.11
## [155] tidyr_0.6.1 class_7.3-14
## [157] rmarkdown_1.3