1 Basics

1.1 Install decoupleR

R is an open-source statistical environment that can be extended with packages. decoupleR is an R package available from the Bioconductor repository. R can be installed on any operating system from CRAN, after which you can install decoupleR by running the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("decoupleR")

# Check that you have a valid Bioconductor installation
BiocManager::valid()

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("saezlab/decoupleR")
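
After installation, a quick check along these lines can confirm that the package is visible to your R session. This is only a sketch: it relies on base R's requireNamespace() and packageVersion(), and the variable name is_installed is chosen here for illustration.

```r
# Check whether decoupleR can be loaded, without attaching it.
is_installed <- requireNamespace("decoupleR", quietly = TRUE)

if (is_installed) {
    # Report the installed version.
    message("decoupleR version: ", as.character(utils::packageVersion("decoupleR")))
} else {
    message("decoupleR is not installed; see the installation commands above.")
}
```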

1.2 Required knowledge

decoupleR builds on many other packages, in particular those that implement the infrastructure needed for functional genomic analysis, such as viper and GSVA. The aim is to provide a centralized place from which to apply different statistics to the same data set, without the effort that testing each of them in isolation would require. This also opens the possibility of developing benchmarks that can grow easily.

1.3 Asking for help

As package developers, we try to explain clearly how to use our packages and in which order to use the functions. R and Bioconductor nevertheless have a steep learning curve, so it is critical to learn where to ask for help. We would like to highlight the Bioconductor support site as the main resource for getting help: remember to use the decoupleR tag and check the older posts. Alternatives such as creating GitHub issues and tweeting are also available. Please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.
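
When preparing a post, something like the following can help gather a small reproducible example and the session information. It is a minimal sketch using base R only; the object small_mat is a made-up stand-in for a small subset of your own data.

```r
# A small, made-up object standing in for a subset of your real data.
small_mat <- matrix(
    c(1.5, -0.2, 0.7, 2.3),
    nrow = 2,
    dimnames = list(c("GENE_A", "GENE_B"), c("cond_1", "cond_2"))
)

# dput() prints R code that recreates the object, so others can paste it.
dput(small_mat)

# Session information: R version, platform, and loaded package versions.
session <- utils::sessionInfo()
print(session)
```

Pasting both outputs into the post, together with the exact call that fails, usually gives others enough to reproduce the problem.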

1.4 Citing decoupleR

We hope that decoupleR will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!

citation("decoupleR")
#> 
#> saezlab (2021). _Package to decouple gene sets from statistics_. doi:
#> 10.18129/B9.bioc.decoupleR (URL:
#> https://doi.org/10.18129/B9.bioc.decoupleR),
#> https://github.com/saezlab/decoupleR - R package version 1.0.0, <URL:
#> http://www.bioconductor.org/packages/decoupleR>.
#> 
#> saezlab (2020). "Package to decouple gene sets from statistics."
#> _bioRxiv_. doi: 10.1101/TODO (URL: https://doi.org/10.1101/TODO), <URL:
#> https://www.biorxiv.org/content/10.1101/TODO>.
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

2 Quick start to using decoupleR

2.1 Libraries

decoupleR provides different statistics to calculate regulatory activity from an expression matrix and a network. It incorporates pre-existing methods to avoid reinventing the wheel, while implementing its own methods under a common evaluation standard. It therefore provides flexibility when evaluating a data set with different statistics.

Since inputs and outputs are always tibbles (i.e. special data frames), incorporating dplyr into your workflow can be useful for manipulating results, but it is not necessary.

library(decoupleR)
library(dplyr)

2.2 Input data

To use decoupleR, you first need a matrix in which the rows represent the target nodes and the columns represent the conditions in which they were evaluated. You also need to provide a network that contains at least two columns, corresponding to the source and target nodes. Note that certain methods require additional metadata columns, for instance the mode of regulation (MoR) or the likelihood of the interaction.

inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- file.path(inputs_dir, "input-expr_matrix.rds") %>%
    readRDS() %>%
    glimpse()
#>  num [1:18490, 1:4] 3.251 0.283 -2.253 0.782 -4.575 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:18490] "A1BG" "A1CF" "A2M" "A2ML1" ...
#>   ..$ : chr [1:4] "GSM2753335" "GSM2753336" "GSM2753337" "GSM2753338"

network <- file.path(inputs_dir, "input-dorothea_genesets.rds") %>%
    readRDS() %>%
    glimpse()
#> Rows: 151
#> Columns: 5
#> $ tf         <chr> "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO…
#> $ confidence <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
#> $ target     <chr> "BCL2L11", "BCL6", "CDKN1A", "CDKN1B", "G6PC", "GADD45A", "…
#> $ mor        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ likelihood <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
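
To experiment without the bundled test data, inputs of the same shape can be built by hand. The sketch below uses base R only; every name and value in it (mat_toy, network_toy, GENE_A, TF_1, and so on) is made up for illustration. A plain data frame is used to avoid extra dependencies, and converting it with tibble::as_tibble() may be preferable given that decoupleR works with tibbles.

```r
# Toy expression matrix: rows are target nodes, columns are conditions.
# All values are invented for illustration.
mat_toy <- matrix(
    c(1.2, -0.5, 0.3, 2.1, 0.0, -1.4),
    nrow = 3,
    dimnames = list(
        c("GENE_A", "GENE_B", "GENE_C"),
        c("cond_1", "cond_2")
    )
)

# Toy network: source and target columns plus optional metadata columns.
network_toy <- data.frame(
    tf = c("TF_1", "TF_1", "TF_2"),
    target = c("GENE_A", "GENE_B", "GENE_C"),
    mor = c(1, -1, 1),
    likelihood = c(1, 1, 1),
    stringsAsFactors = FALSE
)

# Fraction of network targets that are present as rows of the matrix.
coverage <- mean(network_toy$target %in% rownames(mat_toy))
print(coverage)
```

Checking the overlap between network targets and matrix row names before running any statistic is a cheap way to catch identifier mismatches early.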

2.3 How to decouple?

Once the data is loaded, you are one step away from achieving decoupling. This step corresponds to specifying which statistics you want to run. For more information about the defined statistics and their parameters, you can execute ?decouple().

decouple(
    mat = mat,
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("gsva", "mean", "pscira", "scira", "viper", "ora"),
    args = list(
        gsva = list(verbose = FALSE),
        mean = list(.mor = "mor", .likelihood = "likelihood"),
        pscira = list(.mor = "mor"),
        scira = list(.mor = "mor"),
        viper = list(
            .mor = "mor",
            .likelihood = "likelihood",
            verbose = FALSE
        ),
        ora = list()
    )
) %>%
    glimpse()
#> Rows: 140
#> Columns: 11
#> $ run_id      <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
#> $ statistic   <chr> "gsva", "gsva", "gsva", "gsva", "gsva", "gsva", "gsva", "g…
#> $ tf          <chr> "FOXO4", "FOXO4", "FOXO4", "FOXO4", "NFIC", "NFIC", "NFIC"…
#> $ condition   <chr> "GSM2753335", "GSM2753336", "GSM2753337", "GSM2753338", "G…
#> $ score       <dbl> -0.38043080, -0.29999174, 0.23887789, 0.09071077, -0.08452…
#> $ p_value     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ estimate    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ conf.low    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ conf.high   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ method      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ alternative <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

Done. You have applied different statistics to the same data set and can now analyze the results at your convenience, for example by performing a benchmark.
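
As one possible starting point for such an analysis, the long-format result can be spread so that each statistic becomes a column and the statistics can be compared directly. The sketch below uses base R and a made-up table, res_toy, that only mimics the statistic, tf, condition and score columns shown above; the scores are invented and are not real decoupleR output.

```r
# Made-up scores mimicking the long format returned above
# (only the columns needed here; values are not real results).
res_toy <- expand.grid(
    tf = c("FOXO4", "NFIC"),
    condition = c("GSM2753335", "GSM2753336"),
    statistic = c("gsva", "mean"),
    stringsAsFactors = FALSE
)
res_toy$score <- c(-0.4, -0.1, -0.3, 0.2, -1.1, -0.2, -0.9, 0.6)

# One row per tf/condition pair, one column per statistic.
wide <- with(
    res_toy,
    tapply(score, list(paste(tf, condition, sep = "|"), statistic), mean)
)

# Rank agreement between statistics across all tf/condition pairs.
agreement <- cor(wide, method = "spearman")
print(agreement)
```

Rank correlation is used here because the statistics are on different scales; other comparisons may suit a real benchmark better.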

2.4 How does it work?

2.4.1 Mapping statistics with arguments

Internally, decouple() works through purrr::map2_dfr() to perform statistics and argument mapping. This comes with important points:

  • statistics and args must be vectors of the same length, except that a vector of length 1 is recycled. Matching is performed by position, not by name, although naming the elements can help make your intentions clear.
  • With this mapping alone you would lose track of which statistic is being run on a given problem. To get around this, decouple() works with expressions that are evaluated later: it generates a toy call that represents what it is about to run. You can display it by setting show_toy_call = TRUE.
  • If an error occurs, copy the last call displayed with show_toy_call = TRUE and execute it on its own. Once you have fixed it, apply the correction to the original decouple() call.

Based on the previous points, you can take the generated toy calls and execute them independently, obtaining the same results as if you were executing decouple().
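
The idea of positional mapping and deferred calls can be sketched in base R as follows. This is not decoupleR's implementation: it uses Map() rather than purrr::map2_dfr(), and run_double() and run_square() are hypothetical stand-ins for statistics, defined here only for illustration.

```r
# Hypothetical stand-ins for two statistics (not decoupleR functions).
run_double <- function(x, offset = 0) {
    data.frame(statistic = "double", score = 2 * x + offset)
}
run_square <- function(x, offset = 0) {
    data.frame(statistic = "square", score = x^2 + offset)
}

statistics <- c("double", "square")
# The names below are deliberately swapped to show they are only labels:
# the first element goes to "double", the second to "square".
args <- list(
    square = list(offset = 0),
    double = list(offset = 100)
)

# Build one unevaluated call per (statistic, arguments) pair, by position.
toy_calls <- Map(
    function(stat, arg) {
        as.call(c(list(as.name(paste0("run_", stat))), list(x = 3), arg))
    },
    statistics,
    args
)

# Show the calls before running them, similar in spirit to show_toy_call.
for (toy_call in toy_calls) {
    message(paste(deparse(toy_call), collapse = " "))
}

# Evaluate each call and stack the results, adding a run identifier.
results <- lapply(toy_calls, function(toy_call) eval(toy_call))
combined <- do.call(rbind, unname(results))
combined$run_id <- as.character(seq_len(nrow(combined)))
print(combined)
```

Because each call is an ordinary R expression, any one of them can be copied, run, and debugged on its own.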

Display the internal gsva calls and save the results.

gsvas_res <- decouple(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("gsva"),
    args = list(
        gsva_default = list(verbose = FALSE),
        gsva_minsize = list(verbose = FALSE, ssgsea.norm = FALSE)
    ),
    show_toy_call = TRUE
)
#> run_gsva(mat = head(mat, 5000), network = network, .source = "tf", .target = "target",
#> * verbose = FALSE)
#> run_gsva(mat = head(mat, 5000), network = network, .source = "tf", .target = "target",
#> * verbose = FALSE, ssgsea.norm = FALSE)

Run the same calls as those displayed by setting show_toy_call = TRUE.

gsva_1 <- run_gsva(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    verbose = FALSE
)

gsva_2 <- run_gsva(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    verbose = FALSE,
    ssgsea.norm = FALSE
)

gsvas_res_2 <- bind_rows(gsva_1, gsva_2, .id = "run_id")

Now compare the results and see that there is no difference.

all.equal(gsvas_res, gsvas_res_2)
#> [1] TRUE

2.4.2 Mapping network columns

To carry out the column mapping, decoupleR relies on the selection semantics provided by the tidyselect package. Some of the selection styles it supports are:

  • Symbols
  • Strings
  • Position

Let’s see an example. The input network has the following columns:

network %>%
    colnames()
#> [1] "tf"         "confidence" "target"     "mor"        "likelihood"

We can map the columns in whichever way we prefer, or even combine several ways in the same call. This applies not only to the decouple() function, but to all functions of the decoupleR statistics family, identifiable by the run_ prefix.

this_column <- "target"
viper_res <- decouple(
    mat = mat,
    network = network,
    .source = tf,
    .target = !!this_column,
    statistics = c("viper"),
    args = list(
        viper = list(
            .mor = 4,
            .likelihood = "likelihood",
            verbose = FALSE
        )
    ),
    show_toy_call = TRUE
)
#> run_viper(mat = mat, network = network, .source = tf, .target = "target",
#> *   .mor = 4, .likelihood = "likelihood", verbose = FALSE)
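
To see how these selection styles resolve to columns, tidyselect can be called directly on a small data frame. The sketch below is an illustration under assumptions: it uses tidyselect::eval_select() and rlang::expr(), which is not necessarily how decoupleR invokes tidyselect internally, and network_toy is a made-up table that only reuses the column names of the network above.

```r
# A small data frame with the same column names as the network above
# (contents are made up).
network_toy <- data.frame(
    tf = c("TF_1", "TF_2"),
    confidence = c("A", "B"),
    target = c("GENE_A", "GENE_B"),
    mor = c(1, -1),
    likelihood = c(1, 1)
)

this_column <- "target"

# Each call returns a named integer vector: column name -> position.
by_symbol <- tidyselect::eval_select(rlang::expr(tf), network_toy)
by_string <- tidyselect::eval_select(rlang::expr("likelihood"), network_toy)
by_position <- tidyselect::eval_select(rlang::expr(4), network_toy)
by_unquote <- tidyselect::eval_select(rlang::expr(!!this_column), network_toy)

print(c(by_symbol, by_string, by_position, by_unquote))
```

Selecting by position is the most fragile of these styles, since it silently picks a different column if the network's column order changes.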

3 Reproducibility

3.1 Special thanks

The decoupleR package (saezlab, 2021) was made possible thanks to:

  • R (R Core Team, 2021)
  • BiocStyle (Oleś, 2021)
  • knitcitations
  • knitr (Xie, 2021)
  • rmarkdown (Allaire, Xie, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2021)
  • sessioninfo (Csárdi, core, Wickham, Chang, Flight, Müller, and Hester, 2018)
  • testthat (Wickham, 2011)
  • GSVA (Hänzelmann, Castelo, and Guinney, 2013)
  • viper (Alvarez, Shen, Giorgi, Lachmann, Ding, Ye, and Califano, 2016)

This package was developed using biocthis.

3.2 Vignette

3.2.1 Create

# Create the vignette
library(rmarkdown)
system.time(render("decoupleR.Rmd", "BiocStyle::html_document"))

# Extract the R code
library(knitr)
knit("decoupleR.Rmd", tangle = TRUE)

3.2.2 Wallclock time spent generating the vignette

#> Time difference of 14.591 secs

3.3 Session information

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.0 (2021-05-18)
#>  os       Ubuntu 20.04.2 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  C                           
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2021-05-19                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package              * version  date       lib source        
#>  annotate               1.70.0   2021-05-19 [2] Bioconductor  
#>  AnnotationDbi          1.54.0   2021-05-19 [2] Bioconductor  
#>  assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.1.0)
#>  backports              1.2.1    2020-12-09 [2] CRAN (R 4.1.0)
#>  beachmat               2.8.0    2021-05-19 [2] Bioconductor  
#>  Biobase                2.52.0   2021-05-19 [2] Bioconductor  
#>  BiocGenerics           0.38.0   2021-05-19 [2] Bioconductor  
#>  BiocManager            1.30.15  2021-05-11 [2] CRAN (R 4.1.0)
#>  BiocParallel           1.26.0   2021-05-19 [2] Bioconductor  
#>  BiocSingular           1.8.0    2021-05-19 [2] Bioconductor  
#>  BiocStyle            * 2.20.0   2021-05-19 [2] Bioconductor  
#>  Biostrings             2.60.0   2021-05-19 [2] Bioconductor  
#>  bit                    4.0.4    2020-08-04 [2] CRAN (R 4.1.0)
#>  bit64                  4.0.5    2020-08-30 [2] CRAN (R 4.1.0)
#>  bitops                 1.0-7    2021-04-24 [2] CRAN (R 4.1.0)
#>  blob                   1.2.1    2020-01-20 [2] CRAN (R 4.1.0)
#>  bookdown               0.22     2021-04-22 [2] CRAN (R 4.1.0)
#>  broom                  0.7.6    2021-04-05 [2] CRAN (R 4.1.0)
#>  bslib                  0.2.5.1  2021-05-18 [2] CRAN (R 4.1.0)
#>  cachem                 1.0.5    2021-05-15 [2] CRAN (R 4.1.0)
#>  class                  7.3-19   2021-05-03 [2] CRAN (R 4.1.0)
#>  cli                    2.5.0    2021-04-26 [2] CRAN (R 4.1.0)
#>  crayon                 1.4.1    2021-02-08 [2] CRAN (R 4.1.0)
#>  DBI                    1.1.1    2021-01-15 [2] CRAN (R 4.1.0)
#>  decoupleR            * 1.0.0    2021-05-19 [1] Bioconductor  
#>  DelayedArray           0.18.0   2021-05-19 [2] Bioconductor  
#>  DelayedMatrixStats     1.14.0   2021-05-19 [2] Bioconductor  
#>  digest                 0.6.27   2020-10-24 [2] CRAN (R 4.1.0)
#>  dplyr                * 1.0.6    2021-05-05 [2] CRAN (R 4.1.0)
#>  e1071                  1.7-6    2021-03-18 [2] CRAN (R 4.1.0)
#>  ellipsis               0.3.2    2021-04-29 [2] CRAN (R 4.1.0)
#>  evaluate               0.14     2019-05-28 [2] CRAN (R 4.1.0)
#>  fansi                  0.4.2    2021-01-15 [2] CRAN (R 4.1.0)
#>  fastmap                1.1.0    2021-01-25 [2] CRAN (R 4.1.0)
#>  generics               0.1.0    2020-10-31 [2] CRAN (R 4.1.0)
#>  GenomeInfoDb           1.28.0   2021-05-19 [2] Bioconductor  
#>  GenomeInfoDbData       1.2.6    2021-05-19 [2] Bioconductor  
#>  GenomicRanges          1.44.0   2021-05-19 [2] Bioconductor  
#>  glue                   1.4.2    2020-08-27 [2] CRAN (R 4.1.0)
#>  graph                  1.70.0   2021-05-19 [2] Bioconductor  
#>  GSEABase               1.54.0   2021-05-19 [2] Bioconductor  
#>  GSVA                   1.40.0   2021-05-19 [2] Bioconductor  
#>  HDF5Array              1.20.0   2021-05-19 [2] Bioconductor  
#>  htmltools              0.5.1.1  2021-01-22 [2] CRAN (R 4.1.0)
#>  httr                   1.4.2    2020-07-20 [2] CRAN (R 4.1.0)
#>  IRanges                2.26.0   2021-05-19 [2] Bioconductor  
#>  irlba                  2.3.3    2019-02-05 [2] CRAN (R 4.1.0)
#>  jquerylib              0.1.4    2021-04-26 [2] CRAN (R 4.1.0)
#>  jsonlite               1.7.2    2020-12-09 [2] CRAN (R 4.1.0)
#>  KEGGREST               1.32.0   2021-05-19 [2] Bioconductor  
#>  kernlab                0.9-29   2019-11-12 [2] CRAN (R 4.1.0)
#>  KernSmooth             2.23-20  2021-05-03 [2] CRAN (R 4.1.0)
#>  knitr                  1.33     2021-04-24 [2] CRAN (R 4.1.0)
#>  lattice                0.20-44  2021-05-02 [2] CRAN (R 4.1.0)
#>  lifecycle              1.0.0    2021-02-15 [2] CRAN (R 4.1.0)
#>  lubridate              1.7.10   2021-02-26 [2] CRAN (R 4.1.0)
#>  magrittr               2.0.1    2020-11-17 [2] CRAN (R 4.1.0)
#>  MASS                   7.3-54   2021-05-03 [2] CRAN (R 4.1.0)
#>  Matrix                 1.3-3    2021-05-04 [2] CRAN (R 4.1.0)
#>  MatrixGenerics         1.4.0    2021-05-19 [2] Bioconductor  
#>  matrixStats            0.58.0   2021-01-29 [2] CRAN (R 4.1.0)
#>  memoise                2.0.0    2021-01-26 [2] CRAN (R 4.1.0)
#>  mixtools               1.2.0    2020-02-07 [2] CRAN (R 4.1.0)
#>  pillar                 1.6.1    2021-05-16 [2] CRAN (R 4.1.0)
#>  pkgconfig              2.0.3    2019-09-22 [2] CRAN (R 4.1.0)
#>  plyr                   1.8.6    2020-03-03 [2] CRAN (R 4.1.0)
#>  png                    0.1-7    2013-12-03 [2] CRAN (R 4.1.0)
#>  proxy                  0.4-25   2021-03-05 [2] CRAN (R 4.1.0)
#>  purrr                  0.3.4    2020-04-17 [2] CRAN (R 4.1.0)
#>  R6                     2.5.0    2020-10-28 [2] CRAN (R 4.1.0)
#>  Rcpp                   1.0.6    2021-01-15 [2] CRAN (R 4.1.0)
#>  RCurl                  1.98-1.3 2021-03-16 [2] CRAN (R 4.1.0)
#>  RefManageR           * 1.3.0    2020-11-13 [2] CRAN (R 4.1.0)
#>  rhdf5                  2.36.0   2021-05-19 [2] Bioconductor  
#>  rhdf5filters           1.4.0    2021-05-19 [2] Bioconductor  
#>  Rhdf5lib               1.14.0   2021-05-19 [2] Bioconductor  
#>  rlang                  0.4.11   2021-04-30 [2] CRAN (R 4.1.0)
#>  rmarkdown              2.8      2021-05-07 [2] CRAN (R 4.1.0)
#>  RSQLite                2.2.7    2021-04-22 [2] CRAN (R 4.1.0)
#>  rstudioapi             0.13     2020-11-12 [2] CRAN (R 4.1.0)
#>  rsvd                   1.0.5    2021-04-16 [2] CRAN (R 4.1.0)
#>  S4Vectors              0.30.0   2021-05-19 [2] Bioconductor  
#>  sass                   0.4.0    2021-05-12 [2] CRAN (R 4.1.0)
#>  ScaledMatrix           1.0.0    2021-05-19 [2] Bioconductor  
#>  segmented              1.3-4    2021-04-22 [2] CRAN (R 4.1.0)
#>  sessioninfo            1.1.1    2018-11-05 [2] CRAN (R 4.1.0)
#>  SingleCellExperiment   1.14.0   2021-05-19 [2] Bioconductor  
#>  sparseMatrixStats      1.4.0    2021-05-19 [2] Bioconductor  
#>  speedglm               0.3-3    2021-01-08 [2] CRAN (R 4.1.0)
#>  stringi                1.6.2    2021-05-17 [2] CRAN (R 4.1.0)
#>  stringr                1.4.0    2019-02-10 [2] CRAN (R 4.1.0)
#>  SummarizedExperiment   1.22.0   2021-05-19 [2] Bioconductor  
#>  survival               3.2-11   2021-04-26 [2] CRAN (R 4.1.0)
#>  tibble                 3.1.2    2021-05-16 [2] CRAN (R 4.1.0)
#>  tidyr                  1.1.3    2021-03-03 [2] CRAN (R 4.1.0)
#>  tidyselect             1.1.1    2021-04-30 [2] CRAN (R 4.1.0)
#>  utf8                   1.2.1    2021-03-12 [2] CRAN (R 4.1.0)
#>  vctrs                  0.3.8    2021-04-29 [2] CRAN (R 4.1.0)
#>  viper                  1.26.0   2021-05-19 [2] Bioconductor  
#>  withr                  2.4.2    2021-04-18 [2] CRAN (R 4.1.0)
#>  xfun                   0.23     2021-05-15 [2] CRAN (R 4.1.0)
#>  XML                    3.99-0.6 2021-03-16 [2] CRAN (R 4.1.0)
#>  xml2                   1.3.2    2020-04-23 [2] CRAN (R 4.1.0)
#>  xtable                 1.8-4    2019-04-21 [2] CRAN (R 4.1.0)
#>  XVector                0.32.0   2021-05-19 [2] Bioconductor  
#>  yaml                   2.2.1    2020-02-01 [2] CRAN (R 4.1.0)
#>  zlibbioc               1.38.0   2021-05-19 [2] Bioconductor  
#> 
#> [1] /tmp/RtmpwTRadi/Rinst2f61c33e45600
#> [2] /home/biocbuild/bbs-3.13-bioc/R/library

4 Bibliography

This vignette was generated using BiocStyle (Oleś, 2021) with knitr (Xie, 2021) and rmarkdown (Allaire, Xie, McPherson, et al., 2021) running behind the scenes.

Citations made with RefManageR (McLean, 2017).

[1] J. Allaire, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 2.8. 2021. URL: https://github.com/rstudio/rmarkdown.

[2] M. J. Alvarez, Y. Shen, F. M. Giorgi, et al. “Functional characterization of somatic mutations in cancer using network-based inference of protein activity”. In: Nature genetics 48.8 (2016), pp. 838–47.

[3] G. Csárdi, R. core, H. Wickham, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. URL: https://CRAN.R-project.org/package=sessioninfo.

[4] S. Hänzelmann, R. Castelo, and J. Guinney. “GSVA: gene set variation analysis for microarray and RNA-Seq data”. In: BMC Bioinformatics 14 (2013), p. 7. DOI: 10.1186/1471-2105-14-7. URL: http://www.biomedcentral.com/1471-2105/14/7.

[5] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.

[6] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.20.0. 2021. URL: https://github.com/Bioconductor/BiocStyle.

[7] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2021. URL: https://www.R-project.org/.

[8] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

[9] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.33. 2021. URL: https://yihui.org/knitr/.

[10] saezlab. Package to decouple gene sets from statistics. https://github.com/saezlab/decoupleR - R package version 1.0.0. 2021. DOI: 10.18129/B9.bioc.decoupleR. URL: http://www.bioconductor.org/packages/decoupleR.