PROGENy pathway signatures: Application to Bulk transcriptomics

0.1 Introduction
0.2 Getting Started
- 0.2.1 Import the raw count dataframe
0.3 Pathway activity with Progeny
0.4 Session Info Details

0.1 Introduction

This is the third part in a series of transcriptomics tutorials. We present here how to estimate pathway activity from transcriptomics data using PROGENy.

Conventional pathway analysis methods rely on the gene expression of the pathway members. However, this approach overlooks the effect of post-translational modifications and only captures very specific experimental conditions. To overcome these limitations, PROGENy (Pathway RespOnsive GENes) estimates the activity of relevant signaling pathways based on consensus gene signatures obtained from perturbation experiments (Schubert et al. 2018)

PROGENy initially contained 11 pathways and was developed for the application to human transcriptomics data. It has been recently shown that PROGENy is also applicable to mouse data and it has been expanded to 14 pathways (Holland et al., 2019). In addition, PROGENy can be applied to scRNA-seq data, as described in (Holland et al., 2020)

PROGENy is available as a Bioconductor package. For additional information about the PROGENy method, visit its website:

https://saezlab.github.io/progeny/

0.2 Getting Started

We first load the required libraries.

library(progeny)
library(tibble)
library(tidyr)
library(dplyr)
library(pheatmap)
library(readr)
library(ggplot2)

0.2.1 Import the raw count dataframe

The data can be downloaded from: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE119931 The file name is: GSE119931_PANC1.FOXA2KO.genes.counts.txt.gz

## We assign the normalised counts and the experimental design 
data("vignette_data")

Normalised_counts <- vignette_data$counts
Experimental_design <- vignette_data$design

## We read the results from the differential analysis. 
ttop_KOvsWT <- vignette_data$limma_ttop

We have to slightly modify the format of the input files to make it suitable for running Progeny.

Normalised_counts_matrix <- Normalised_counts %>% 
    dplyr::mutate_if(~ any(is.na(.x)),~ if_else(is.na(.x),0,.x)) %>% 
    tibble::column_to_rownames(var = "gene") %>% 
    as.matrix()

ttop_KOvsWT_matrix <- ttop_KOvsWT %>% 
    dplyr::select(ID, t) %>% 
    dplyr::filter(!is.na(t)) %>% 
    column_to_rownames(var = "ID") %>%
    as.matrix()

0.3 Pathway activity with Progeny

We first compute Progeny scores for every sample (with the replicates) using the normalised counts. It is worth noting that we are going to use the 100 most responsive genes per pathway. This number can be increased depending on the coverage of your experiments. For instance, the number of quantified genes for single-cell RNA-seq is smaller than for Bulk RNA-seq or microarray. In those cases, we suggest to increase the number of responsive genes to 200-500.

PathwayActivity_counts <- progeny(Normalised_counts_matrix, scale=TRUE, 
    organism="Human", top = 100)
Activity_counts <- as.vector(PathwayActivity_counts)

We present the results in a heatmap:

paletteLength <- 100
myColor <- 
    colorRampPalette(c("darkblue", "whitesmoke","indianred"))(paletteLength)

progenyBreaks <- c(seq(min(Activity_counts), 0, 
    length.out=ceiling(paletteLength/2) + 1),
    seq(max(Activity_counts)/paletteLength, 
    max(Activity_counts), 
    length.out=floor(paletteLength/2)))

progeny_hmap <- pheatmap(t(PathwayActivity_counts),fontsize=14, 
    fontsize_row = 10, fontsize_col = 10, 
    color=myColor, breaks = progenyBreaks, 
    main = "PROGENy (100)", angle_col = 45,
    treeheight_col = 0,  border_color = NA)

Now, we run an enrichment analysis using a competitive permutation approach to assess the significance of the pathway activity. We end up with Normalised Enrichment Scores (NES) for each pathway.

PathwayActivity_zscore <- progeny(ttop_KOvsWT_matrix, 
    scale=TRUE, organism="Human", top = 100, perm = 1000, z_scores = TRUE) %>%
    t()
colnames(PathwayActivity_zscore) <- "NES"

PathwayActivity_zscore_df <- as.data.frame(PathwayActivity_zscore) %>% 
    rownames_to_column(var = "Pathway") %>%
    dplyr::arrange(NES) %>%
    dplyr::mutate(Pathway = factor(Pathway))

ggplot(PathwayActivity_zscore_df,aes(x = reorder(Pathway, NES), y = NES)) + 
    geom_bar(aes(fill = NES), stat = "identity") +
    scale_fill_gradient2(low = "darkblue", high = "indianred", 
        mid = "whitesmoke", midpoint = 0) + 
    theme_minimal() +
    theme(axis.title = element_text(face = "bold", size = 12),
        axis.text.x = 
            element_text(angle = 45, hjust = 1, size =10, face= "bold"),
        axis.text.y = element_text(size =10, face= "bold"),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank()) +
    xlab("Pathways")

The MAPK pathway is the most active pathway upon the perturbation that we are studying (KO of the FOXA2 gene versus the wild-type). We can therefore visualise the MAPK most responsive genes (progeny weights) along with their t_values to interpret the results. In the scatterplot, we can see the genes that are contributing the most to the pathway enrichment.

prog_matrix <- getModel("Human", top=100) %>% 
    as.data.frame()  %>%
    tibble::rownames_to_column("GeneID")

ttop_KOvsWT_df <- ttop_KOvsWT_matrix %>% 
    as.data.frame() %>% 
    tibble::rownames_to_column("GeneID")

scat_plots <- progeny::progenyScatter(df = ttop_KOvsWT_df, 
    weight_matrix = prog_matrix, 
    statName = "t_values", verbose = FALSE)

plot(scat_plots[[1]]$`MAPK`)

Progeny results can be used as an optional input for CARNIVAL. CARNIVAL sets weights based on PROGENy scores in each pathway-related node in order to find more relevant solutions. We therefore run PROGENy again with slightly different parameters, setting z_scores = FALSE so that PROGENy returns pathway activity values between 1 and -1, rather than converting to Z-Scores.

PathwayActivity_CARNIVALinput <- progeny(ttop_KOvsWT_matrix, 
    scale=TRUE, organism="Human", top = 100, perm = 10000, z_scores = FALSE) %>%
    t () %>% 
    as.data.frame() %>% 
    tibble::rownames_to_column(var = "Pathway") 
colnames(PathwayActivity_CARNIVALinput)[2] <- "score"

PathwayActivity_CARNIVALinput can then be used as input for CARNIVAL. ## References

Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, Garnett MJ, Blüthgen N, Saez-Rodriguez J. “Perturbation-response genes reveal signaling footprints in cancer gene expression.” Nature Communications: 10.1038/s41467-017-02391-6

Holland CH, Szalai B, Saez-Rodriguez J. “Transfer of regulatory knowledge from human to mouse for functional genomics analysis.” Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 2019. DOI: 10.1016/j.bbagrm.2019.194431.

Holland CH, Tanevski J, Perales-Patón J, Gleixner J, Kumar MP, Mereu E, Joughin BA, Stegle O, Lauffenburger DA, Heyn H, Szalai B, Saez-Rodriguez, J. “Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data.” Genome Biology. 2020. DOI: 10.1186/s13059-020-1949-z.

0.4 Session Info Details

## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] knitr_1.36         tibble_3.1.5       pheatmap_1.0.12    readr_2.0.2       
##  [5] tidyr_1.1.4        ggplot2_3.3.5      SeuratObject_4.0.2 Seurat_4.0.5      
##  [9] dplyr_1.0.7        progeny_1.16.0     BiocStyle_2.22.0  
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.15            colorspace_2.0-2      deldir_1.0-6         
##   [4] ellipsis_0.3.2        ggridges_0.5.3        spatstat.data_2.1-0  
##   [7] farver_2.1.0          leiden_0.3.9          listenv_0.8.0        
##  [10] ggrepel_0.9.1         RSpectra_0.16-0       fansi_0.5.0          
##  [13] codetools_0.2-18      splines_4.1.1         polyclip_1.10-0      
##  [16] jsonlite_1.7.2        ica_1.0-2             cluster_2.1.2        
##  [19] png_0.1-7             uwot_0.1.10           shiny_1.7.1          
##  [22] sctransform_0.3.2     spatstat.sparse_2.0-0 BiocManager_1.30.16  
##  [25] compiler_4.1.1        httr_1.4.2            assertthat_0.2.1     
##  [28] Matrix_1.3-4          fastmap_1.1.0         lazyeval_0.2.2       
##  [31] limma_3.50.0          later_1.3.0           htmltools_0.5.2      
##  [34] tools_4.1.1           igraph_1.2.7          gtable_0.3.0         
##  [37] glue_1.4.2            RANN_2.6.1            reshape2_1.4.4       
##  [40] Rcpp_1.0.7            scattermore_0.7       jquerylib_0.1.4      
##  [43] vctrs_0.3.8           nlme_3.1-153          lmtest_0.9-38        
##  [46] xfun_0.27             stringr_1.4.0         globals_0.14.0       
##  [49] mime_0.12             miniUI_0.1.1.1        lifecycle_1.0.1      
##  [52] irlba_2.3.3           goftest_1.2-3         future_1.22.1        
##  [55] MASS_7.3-54           zoo_1.8-9             scales_1.1.1         
##  [58] spatstat.core_2.3-0   hms_1.1.1             promises_1.2.0.1     
##  [61] spatstat.utils_2.2-0  parallel_4.1.1        RColorBrewer_1.1-2   
##  [64] yaml_2.2.1            reticulate_1.22       pbapply_1.5-0        
##  [67] gridExtra_2.3         sass_0.4.0            rpart_4.1-15         
##  [70] stringi_1.7.5         highr_0.9             rlang_0.4.12         
##  [73] pkgconfig_2.0.3       matrixStats_0.61.0    evaluate_0.14        
##  [76] lattice_0.20-45       ROCR_1.0-11           purrr_0.3.4          
##  [79] tensor_1.5            labeling_0.4.2        patchwork_1.1.1      
##  [82] htmlwidgets_1.5.4     cowplot_1.1.1         tidyselect_1.1.1     
##  [85] parallelly_1.28.1     RcppAnnoy_0.0.19      plyr_1.8.6           
##  [88] magrittr_2.0.1        bookdown_0.24         R6_2.5.1             
##  [91] magick_2.7.3          generics_0.1.1        DBI_1.1.1            
##  [94] withr_2.4.2           mgcv_1.8-38           pillar_1.6.4         
##  [97] fitdistrplus_1.1-6    survival_3.2-13       abind_1.4-5          
## [100] future.apply_1.8.1    crayon_1.4.1          KernSmooth_2.23-20   
## [103] utf8_1.2.2            spatstat.geom_2.3-0   plotly_4.10.0        
## [106] tzdb_0.1.2            rmarkdown_2.11        grid_4.1.1           
## [109] data.table_1.14.2     digest_0.6.28         xtable_1.8-4         
## [112] httpuv_1.6.3          munsell_0.5.0         viridisLite_0.4.0    
## [115] bslib_0.3.1