PROGENy pathway signatures

Michael Schubert

2018-04-30

This R package provides the model we inferred in the publication “Perturbation-response genes reveal signaling footprints in cancer gene expression”, together with a function to obtain pathway scores from a gene expression matrix. The publication is available on bioRxiv.

Computing pathway scores for the airway package data

This section outlines how to prepare expression data, in this case from the airway package, for pathway activity analysis using PROGENy.
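As a rough illustration of what such a preparation can look like, the sketch below turns a small count matrix into a genes x samples expression matrix with gene symbols as row names. Everything in it is an assumption made for illustration: the counts, the `ENSG_*` identifiers, the `id2symbol` lookup table and the log-CPM transform are not taken from this vignette. The session information at the end lists DESeq2 and biomaRt, which suggests a DESeq2-based normalization and a biomaRt lookup were used instead, but that is an inference.

```r
# Toy count matrix: rows are gene IDs, columns are samples.
# All values and identifiers are made up for illustration.
counts <- matrix(
    c(10, 200, 0, 35,
      12, 180, 3, 40,
      8, 220, 1, 30),
    nrow = 4,
    dimnames = list(
        c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
        c("ctrl_1", "ctrl_2", "dex_1")
    )
)

# Hypothetical ID-to-symbol lookup; ENSG_C has no symbol and will be dropped
id2symbol <- c(ENSG_A = "TP53", ENSG_B = "EGFR", ENSG_C = NA, ENSG_D = "VEGFA")

# Stand-in normalization: log2 counts per million with a pseudocount of 1
lib_size <- colSums(counts)
logcpm <- log2(t(t(counts) / lib_size) * 1e6 + 1)

# Keep only genes with a symbol and use the symbols as row names
symbols <- id2symbol[rownames(logcpm)]
keep <- !is.na(symbols)
expr <- logcpm[keep, , drop = FALSE]
rownames(expr) <- unname(symbols[keep])

print(dim(expr))
```

The resulting `expr` has one row per gene symbol and one column per sample, which is the shape assumed in the rest of this sketch-level material.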

Checking for differences between the groups

We might now be interested in how treatment with dexamethasone affects signaling pathways. To find out, we test whether the control differs from the perturbed condition using a linear model:

##      estimate std.error  statistic     p.value  pathway
## 1   3.5244843 6.2973041  0.5596814 0.595956160     EGFR
## 2   0.9685871 1.9389537  0.4995411 0.635184536  Hypoxia
## 3  -1.0136175 0.6475944 -1.5652042 0.168572304 JAK.STAT
## 4   2.2529415 1.4461016  1.5579414 0.170257432     MAPK
## 5   0.3729849 0.7904794  0.4718465 0.653706796     NFkB
## 6  -2.6799274 3.4709147 -0.7721098 0.469359469     PI3K
## 7   1.7282120 1.2942476  1.3353024 0.230202375     TGFb
## 8   0.8435236 0.6813886  1.2379480 0.261976267     TNFa
## 9   1.5958753 0.8993685  1.7744399 0.126345266    Trail
## 10  0.5360574 0.8334720  0.6431619 0.543900991     VEGF
## 11 -4.3376643 0.7560779 -5.7370601 0.001218591      p53

Indeed, the p53/DNA damage response pathway is less active after treatment than before (estimate -4.34, p = 0.0012). None of the other pathways shows a significant difference at the 0.05 level.
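A per-pathway test of this kind can be sketched in base R as follows. The scores are simulated, so the numbers do not reproduce the table above; the variable names (`scores`, `condition`, `fit_one`) are hypothetical, and the multiple-testing adjustment at the end is an addition that the analysis above does not perform. For reference, a Bonferroni correction over the 11 pathways would put the p53 p-value at about 0.013.

```r
set.seed(1)

# Simulated pathway scores: 8 samples (rows) x 3 pathways (columns).
# These values are random and are NOT the airway results.
condition <- factor(rep(c("control", "treated"), each = 4))
scores <- matrix(rnorm(8 * 3), nrow = 8,
                 dimnames = list(NULL, c("EGFR", "MAPK", "p53")))

# Shift one pathway in the treated group so there is an effect to detect
is_treated <- condition == "treated"
scores[is_treated, "p53"] <- scores[is_treated, "p53"] - 4

# Fit score ~ condition for one pathway and keep the treatment term
fit_one <- function(pw) {
    coefs <- summary(lm(scores[, pw] ~ condition))$coefficients
    data.frame(
        estimate  = coefs["conditiontreated", "Estimate"],
        std.error = coefs["conditiontreated", "Std. Error"],
        statistic = coefs["conditiontreated", "t value"],
        p.value   = coefs["conditiontreated", "Pr(>|t|)"],
        pathway   = pw
    )
}
result <- do.call(rbind, lapply(colnames(scores), fit_one))

# Optional: adjust for testing several pathways at once
result$p.adj <- p.adjust(result$p.value, method = "bonferroni")
print(result)
```

A negative estimate means the pathway score is lower in the treated group than in the control group, because `control` is the reference level of the factor.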

Reproducing drug associations on the GDSC panel

Below is an example of how to calculate pathway scores for cell lines in the Genomics of Drug Sensitivity in Cancer (GDSC) panel and to check for associations with drug response.

The code used for the analyses is available on GitHub.

Getting the data

This example shows how to use the GDSC gene expression data of multiple cell lines together with PROGENy to calculate pathway activity and then to check for associations with drug sensitivity.

First, we need the GDSC data for both gene expression and drug response. They are available on the GDSC1000 web site:

You can also download the files manually (adjust the file names when loading):

Running PROGENy to get pathway activity scores

Activity inference is done using a weighted sum of the model genes. We can run this without worrying about the order of genes in the expression matrix using:
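To make the weighted sum concrete, here is a minimal sketch with a made-up weight matrix and a made-up expression matrix. It is not the package's implementation: the gene names, weights and values are hypothetical, and the final scaling step is an assumption about how scores might be made comparable across pathways. Genes are matched by row name, which is why their order in the expression matrix does not matter.

```r
# Hypothetical weight matrix: model genes (rows) x pathways (columns)
weights <- matrix(
    c(2, 0,
      1, 0,
      0, 3),
    ncol = 2, byrow = TRUE,
    dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("MAPK", "p53"))
)

# Hypothetical expression matrix: genes (rows) x samples (columns).
# The gene order differs from the weights and GENE_X is not a model gene.
expr <- matrix(
    c(1, 2,
      5, 5,
      3, 0,
      4, 1),
    ncol = 2, byrow = TRUE,
    dimnames = list(c("GENE_C", "GENE_X", "GENE_A", "GENE_B"), c("s1", "s2"))
)

# Match genes by name, then take the weighted sum per sample and pathway
common <- intersect(rownames(expr), rownames(weights))
scores <- t(expr[common, , drop = FALSE]) %*% weights[common, , drop = FALSE]
print(scores)
#    MAPK p53
# s1   10   3
# s2    1   6

# Assumed optional step: center and scale each pathway across samples
scaled <- scale(scores)
```

For sample `s1`, the MAPK score is 3 * 2 + 4 * 1 = 10: only `GENE_A` and `GENE_B` carry MAPK weight, and `GENE_X` is ignored because it is not in the model.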

We now have the pathway activity scores for the pathways defined in PROGENy:

##                EGFR     Hypoxia    JAK.STAT       MAPK       NFkB
## 906826   0.03030286 -0.09136142 -0.36490995 -0.1758001 -0.5793367
## 687983  -0.99125434 -1.32673898 -0.93152060 -0.4946866 -1.3799417
## 910927  -0.10673190 -0.78816420 -1.06002081  0.1370551 -0.5497209
## 1240138 -0.05592591 -0.74266270 -0.07989446 -0.8259452  0.3418629
## 1240139 -0.15157011  0.11136425 -0.58596025 -0.2583581 -0.7256043
## 906792   0.71386069  0.39667896 -0.50001888  1.1967197 -0.4005830
##               PI3K       TGFb       TNFa      Trail        VEGF        p53
## 906826  -0.1999210 -0.6198524 -0.4724567 -0.5891909  0.18688452 -1.1725585
## 687983   0.3824370 -0.6696468 -1.0229424 -0.6113840 -0.06262960 -1.0818725
## 910927  -0.2155790  0.6214328 -0.1737935 -0.9185408  0.24335159  0.8249120
## 1240138  0.5883394  1.8891349  1.0191163  0.1214765 -0.15953605  2.1774919
## 1240139  1.0191110  0.9312615 -0.4347272 -0.2985134  0.36720972  0.8348820
## 906792  -2.1897400 -0.3093659 -0.1523604 -0.1621503  0.08751554 -0.9558531

Testing if MAPK activity is significantly associated with Trametinib

Trametinib is a MEK inhibitor, so we would expect cell lines with higher MAPK activity to be more sensitive to MEK inhibition.

We can test this the following way:

## 
## Call:
## lm(formula = trametinib ~ mapk)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.9965 -1.5286  0.3535  1.5446  6.8271 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.03670    0.07155  -14.49   <2e-16 ***
## mapk        -1.31733    0.07095  -18.57   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.125 on 880 degrees of freedom
##   (80 observations deleted due to missingness)
## Multiple R-squared:  0.2815, Adjusted R-squared:  0.2806 
## F-statistic: 344.7 on 1 and 880 DF,  p-value: < 2.2e-16

And indeed we find that MAPK activity is strongly associated with sensitivity to Trametinib: the estimate for mapk is negative (-1.32) and its Pr(>|t|) is much smaller than the conventional threshold of 0.05.
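The sketch below shows one way such a test can be set up when pathway scores and drug response come from separate tables that cover different cell lines. All data are simulated and the identifiers are made up; only the model formula `trametinib ~ mapk` is taken from the output above. It also illustrates where a line such as "observations deleted due to missingness" comes from: `lm` drops rows with missing response values by default.

```r
set.seed(42)

# Simulated pathway scores for 60 cell lines (hypothetical IDs)
ids <- as.character(1001:1060)
scores <- matrix(rnorm(60 * 2), nrow = 60,
                 dimnames = list(ids, c("MAPK", "p53")))

# Simulated drug response for a partly overlapping, shuffled set of cell lines
drug_ids <- sample(as.character(1011:1070))
response <- matrix(NA_real_, nrow = 60, ncol = 1,
                   dimnames = list(drug_ids, "Trametinib"))

# Cell lines present in both tables
shared <- intersect(ids, drug_ids)

# Simulated effect: response decreases with MAPK activity
response[shared, "Trametinib"] <- -1.3 * scores[shared, "MAPK"] +
    rnorm(length(shared))
response[shared[1:5], "Trametinib"] <- NA  # some measurements are missing

# Index both tables by the shared IDs so rows refer to the same cell lines
mapk <- scores[shared, "MAPK"]
trametinib <- response[shared, "Trametinib"]

fit <- lm(trametinib ~ mapk)
print(summary(fit)$coefficients)
print(nobs(fit))  # number of cell lines actually used in the fit
```

Indexing both tables by the same vector of IDs is what keeps the two variables aligned; relying on row order alone would silently pair the wrong cell lines.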

The intercept is significant as well, but in this case we are not really interested in whether the mean drug response is above or below 0.

Note, however, that we tested all cell lines at once and did not adjust for the effect different tissues may have.
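One common way to account for tissue is to add it as a covariate to the linear model. The sketch below does this on simulated data; the tissue labels, effect sizes and variable names are hypothetical, and this is not an analysis performed in this vignette.

```r
set.seed(7)

# Simulated data: 90 cell lines from three hypothetical tissues
n <- 90
tissue <- factor(rep(c("lung", "skin", "blood"), each = 30))

# Assumed confounding: tissue shifts both MAPK activity and drug response
tissue_shift <- c(lung = 0, skin = 1, blood = -1)[as.character(tissue)]
mapk <- rnorm(n) + tissue_shift
trametinib <- -1.0 * mapk - 0.5 * tissue_shift + rnorm(n)

# Model without and with tissue as a covariate
fit_unadj <- lm(trametinib ~ mapk)
fit_adj <- lm(trametinib ~ tissue + mapk)

# Compare the MAPK coefficient between the two models
print(summary(fit_unadj)$coefficients["mapk", ])
print(summary(fit_adj)$coefficients["mapk", ])
```

In the adjusted model, the `mapk` coefficient describes the association within tissues, so it is no longer driven by differences in average response between tissues.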

R version information

## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] BiocFileCache_1.4.0         dbplyr_1.2.1               
##  [3] bindrcpp_0.2.2              dplyr_0.7.4                
##  [5] progeny_1.2.0               biomaRt_2.36.0             
##  [7] DESeq2_1.20.0               airway_0.113.0             
##  [9] SummarizedExperiment_1.10.0 DelayedArray_0.6.0         
## [11] BiocParallel_1.14.0         matrixStats_0.53.1         
## [13] Biobase_2.40.0              GenomicRanges_1.32.0       
## [15] GenomeInfoDb_1.16.0         IRanges_2.14.0             
## [17] S4Vectors_0.18.0            BiocGenerics_0.26.0        
## [19] knitr_1.20                 
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-137           bitops_1.0-6           bit64_0.9-7           
##  [4] RColorBrewer_1.1-2     progress_1.1.2         httr_1.3.1            
##  [7] rprojroot_1.3-2        tools_3.5.0            backports_1.1.2       
## [10] R6_2.2.2               rpart_4.1-13           Hmisc_4.1-1           
## [13] DBI_0.8                lazyeval_0.2.1         colorspace_1.3-2      
## [16] nnet_7.3-12            gridExtra_2.3          prettyunits_1.0.2     
## [19] mnormt_1.5-5           bit_1.1-12             curl_3.2              
## [22] compiler_3.5.0         htmlTable_1.11.2       scales_0.5.0          
## [25] checkmate_1.8.5        psych_1.8.3.3          readr_1.1.1           
## [28] genefilter_1.62.0      rappdirs_0.3.1         stringr_1.3.0         
## [31] digest_0.6.15          foreign_0.8-70         rmarkdown_1.9         
## [34] XVector_0.20.0         base64enc_0.1-3        pkgconfig_2.0.1       
## [37] htmltools_0.3.6        readxl_1.1.0           htmlwidgets_1.2       
## [40] rlang_0.2.0            rstudioapi_0.7         RSQLite_2.1.0         
## [43] bindr_0.1.1            acepack_1.4.1          RCurl_1.95-4.10       
## [46] magrittr_1.5           GenomeInfoDbData_1.1.0 Formula_1.2-2         
## [49] Matrix_1.2-14          Rcpp_0.12.16           munsell_0.4.3         
## [52] stringi_1.1.7          yaml_2.1.18            zlibbioc_1.26.0       
## [55] plyr_1.8.4             grid_3.5.0             blob_1.1.1            
## [58] lattice_0.20-35        splines_3.5.0          annotate_1.58.0       
## [61] hms_0.4.2              locfit_1.5-9.1         pillar_1.2.2          
## [64] geneplotter_1.58.0     reshape2_1.4.3         XML_3.98-1.11         
## [67] glue_1.2.0             evaluate_0.10.1        latticeExtra_0.6-28   
## [70] data.table_1.10.4-3    cellranger_1.1.0       gtable_0.2.0          
## [73] purrr_0.2.4            tidyr_0.8.0            assertthat_0.2.0      
## [76] ggplot2_2.2.1          xtable_1.8-2           broom_0.4.4           
## [79] survival_2.42-3        tibble_1.4.2           AnnotationDbi_1.42.0  
## [82] memoise_1.1.0          cluster_2.0.7-1