decoupleR 2.0.0
decoupleR
R is an open-source statistical environment which can be easily modified to
enhance its functionality via packages. decoupleR is an R package available
via the Bioconductor repository for packages. R can be installed on any
operating system from CRAN, after which you can install decoupleR by using
the following commands in your R session:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("decoupleR")
# Check that you have a valid Bioconductor installation
BiocManager::valid()
You can install the development version from GitHub with:
BiocManager::install("saezlab/decoupleR")
decoupleR builds on many other packages, in particular those that implement the infrastructure needed for functional analysis, such as viper and GSVA. The aim is to provide a centralized place from which to apply different statistics to the same data set, without the effort of testing each method in isolation. This also opens the possibility of developing benchmarks that can grow easily.
As package developers, we try to explain clearly how to use our packages and in
which order to use the functions. But R and Bioconductor have a steep learning
curve, so it is critical to learn where to ask for help. We would like to
highlight the Bioconductor support site as the main resource for getting help:
remember to use the decoupleR tag and check the older posts.
Other alternatives are available such as creating GitHub issues and tweeting.
However, please note that if you want to receive help you should adhere to the
posting guidelines.
It is particularly critical that you provide a small reproducible example and
your session information so package developers can track down the source of
the error.
decoupleR
We hope that decoupleR will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
citation("decoupleR")
#>
#> To cite decoupleR in publications, please use:
#>
#>
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Article{,
#> author = {Pau Badia-i-Mompel and Jesús Vélez and Jana Braunger and Celina Geiss and Daniel Dimitrov and Sophia Müller-Dott and Petr Taus and Aurelien Dugourd and Christian H. Holland and Ricardo O. Ramirez Flores and Julio Saez-Rodriguez},
#> title = {decoupleR: Inferring biological activities from omics data using a collection of methods},
#> journal = {X},
#> year = {2021},
#> }
decoupleR
decoupleR provides different statistics to calculate the regulatory activity
given a matrix of molecular readouts and a network.
It incorporates pre-existing methods to avoid reinventing the wheel, while
implementing its own methods under an evaluation standard.
Therefore, it provides flexibility when evaluating a data set with different
statistics.
Since inputs and outputs are always tibbles (i.e. special data frames), incorporating dplyr into your workflow can be useful for manipulating results, but it is not necessary.
library(decoupleR)
library(dplyr)
In order to use it, you first need a matrix where the rows represent the
target nodes and the columns the samples. In addition, it is necessary to
provide a prior knowledge network that contains at least two columns
corresponding to the source and target nodes. Note that certain methods
require additional metadata columns, for instance the mode of regulation
(MoR) or the likelihood of the interaction.
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- file.path(inputs_dir, "input-expr_matrix.rds") %>%
    readRDS() %>%
    glimpse()
#> num [1:18490, 1:4] 3.251 0.283 -2.253 0.782 -4.575 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:18490] "A1BG" "A1CF" "A2M" "A2ML1" ...
#> ..$ : chr [1:4] "GSM2753335" "GSM2753336" "GSM2753337" "GSM2753338"
network <- file.path(inputs_dir, "input-dorothea_genesets.rds") %>%
    readRDS() %>%
    glimpse()
#> Rows: 151
#> Columns: 5
#> $ tf <chr> "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO…
#> $ confidence <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
#> $ target <chr> "BCL2L11", "BCL6", "CDKN1A", "CDKN1B", "G6PC", "GADD45A", "…
#> $ mor <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ likelihood <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
# We recommend intersecting the network with the matrix and setting a
# minimum number of targets per source (minsize)
network <- intersect_regulons(mat, network, tf, target, minsize = 5)
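To make the expected input layout and the effect of the recommended filtering concrete, here is a minimal base R sketch on made-up data. It assumes that the filter keeps only edges whose target is a row of the matrix and then drops sources left with fewer than minsize targets. The helper filter_network and the objects toy_mat and toy_network are hypothetical names used only for illustration; this is not the implementation of intersect_regulons().

```r
# Toy inputs mirroring the layout described above (all values are made up):
# rows of the matrix are target nodes, columns are samples.
toy_mat <- matrix(
    c(1.2, -0.5, 0.3, 2.1, -1.0, 0.8, 0.0, 1.5),
    nrow = 4,
    dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2"))
)

# The network needs at least a source and a target column; `mor` is an
# optional metadata column.
toy_network <- data.frame(
    tf     = c("TF1", "TF1", "TF1", "TF2", "TF2"),
    target = c("G1", "G2", "G3", "G4", "G9"),
    mor    = c(1, -1, 1, 1, 1),
    stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not part of decoupleR): keep edges
# whose target is a row of `mat`, then drop sources that are left with
# fewer than `minsize` targets.
filter_network <- function(mat, network, source, target, minsize) {
    kept <- network[network[[target]] %in% rownames(mat), , drop = FALSE]
    n_targets <- table(kept[[source]])
    keep_sources <- names(n_targets)[n_targets >= minsize]
    kept[kept[[source]] %in% keep_sources, , drop = FALSE]
}

filtered <- filter_network(toy_mat, toy_network, "tf", "target", minsize = 2)
print(filtered)
```

In this toy case, G9 is absent from the matrix, so TF2 is left with a single target and is removed when minsize = 2.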
Once the data is loaded, you are one step away from achieving decoupling.
This step corresponds to specifying which statistics you want to run.
For more information about the defined statistics and their parameters,
you can execute ?decouple().
decouple(
    mat = mat,
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("gsva", "wmean", "mlm", "ora"),
    args = list(
        gsva = list(verbose = FALSE),
        wmean = list(times = 100),
        mlm = list(center = FALSE),
        ora = list(n_up = 300, n_bottom = 300, n_background = 20000)
    )
) %>%
    glimpse()
#> Rows: 140
#> Columns: 6
#> $ run_id <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ statistic <chr> "gsva", "gsva", "gsva", "gsva", "gsva", "gsva", "gsva", "gsv…
#> $ source <chr> "FOXO4", "FOXO4", "FOXO4", "FOXO4", "NFIC", "NFIC", "NFIC", …
#> $ condition <chr> "GSM2753335", "GSM2753336", "GSM2753337", "GSM2753338", "GSM…
#> $ score <dbl> -0.38043080, -0.29999174, 0.23887789, 0.09071077, -0.0845213…
#> $ p_value <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
Done: you have applied different statistics to the same data set. You can now analyze them at your convenience, for example by performing a benchmark.
Internally, decouple() works through purrr::map2_dfr() to perform
statistics and argument mapping. This comes with important points:
- statistics and args can be unsorted vectors. decouple() will match the
  given statistics with the given args, and if a statistic doesn’t have a
  matching args entry, it will run using the default parameters.
- Set show_toy_call = TRUE to display the underlying calls, and execute one
  locally. Try to fix it there and then correct it on the original call.
See the internal decouple() calls and save the results.
only_stats <- decouple(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("ora", "ulm", "gsva"),
    consensus_score = FALSE,
    show_toy_call = TRUE
)
#> run_ora(mat = head(mat, 5000), network = network, .source = "tf", .target = "target")
#> run_ulm(mat = head(mat, 5000), network = network, .source = "tf", .target = "target")
#> run_gsva(mat = head(mat, 5000), network = network, .source = "tf", .target = "target")
Run the same call, but now with args given in a different order than statistics.
add_args <- decouple(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("gsva", "ora", "ulm"),
    args = list(
        ora = list(n_up = 300, n_bottom = 300, n_background = 20000),
        ulm = list(center = FALSE)
    ),
    consensus_score = FALSE,
    show_toy_call = TRUE
)
#> run_ora(mat = head(mat, 5000), network = network, .source = "tf", .target = "target",
#> * n_up = 300, n_bottom = 300, n_background = 20000)
#> run_ulm(mat = head(mat, 5000), network = network, .source = "tf", .target = "target",
#> * center = FALSE)
#> run_gsva(mat = head(mat, 5000), network = network, .source = "tf", .target = "target")
Now compare the results and see that there is no difference.
all.equal(only_stats, add_args)
#> [1] TRUE
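The name-based matching of statistics and args described above can be sketched in base R. This is only an illustration of the idea under simple assumptions, not decoupleR's implementation (which works through purrr::map2_dfr()); match_args is a hypothetical helper.

```r
# Hypothetical helper (not part of decoupleR): pair each statistic with its
# argument list by name. Statistics without a matching entry get an empty
# list, which stands for "run with the default parameters".
match_args <- function(statistics, args = list()) {
    matched <- lapply(statistics, function(stat) {
        if (is.null(args[[stat]])) list() else args[[stat]]
    })
    names(matched) <- statistics
    matched
}

statistics <- c("gsva", "ora", "ulm")

# args are deliberately given in a different order, and gsva has no entry.
args <- list(
    ulm = list(center = FALSE),
    ora = list(n_up = 300)
)

matched <- match_args(statistics, args)
str(matched)
```

Because the pairing is done by name, the order in which statistics and args are written does not matter.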
To carry out the column mapping, decoupleR relies on the selection provided
by the tidyselect package. Some of the selections it provides are symbols
(such as tf), strings (such as "target") and positions (such as 4).
Let’s see an example. The input network has the following columns:
network %>%
    colnames()
#> [1] "tf" "confidence" "target" "mor" "likelihood"
We can map the columns in whichever way we like, and even combine several
ways. This applies not only to the decouple() function, but to all functions
of the decoupleR statistics family, identifiable by the run_ prefix.
this_column <- "target"
viper_res <- decouple(
    mat = mat,
    network = network,
    .source = tf,
    .target = !!this_column,
    statistics = c("viper"),
    args = list(
        viper = list(
            .mor = 4,
            .likelihood = "likelihood",
            verbose = FALSE
        )
    ),
    show_toy_call = TRUE
)
#> run_viper(mat = mat, network = network, .source = tf, .target = "target",
#> * .mor = 4, .likelihood = "likelihood", verbose = FALSE)
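As a rough illustration of why a string and a position can point at the same kind of thing, the sketch below resolves both to a column name in base R. It does not use tidyselect and does not handle bare symbols; resolve_column and toy_network are hypothetical names introduced only for this example.

```r
# Hypothetical helper (not part of decoupleR or tidyselect): turn a column
# given as a string or as a position into a column name.
resolve_column <- function(data, spec) {
    if (is.character(spec) && length(spec) == 1 &&
        spec %in% colnames(data)) {
        return(spec)
    }
    if (is.numeric(spec) && length(spec) == 1 &&
        spec >= 1 && spec <= ncol(data) && spec == round(spec)) {
        return(colnames(data)[spec])
    }
    stop("Cannot resolve column: ", spec)
}

# One-row network with the same column names as the example above.
toy_network <- data.frame(
    tf = "TF1", confidence = "A", target = "G1", mor = 1, likelihood = 1,
    stringsAsFactors = FALSE
)

by_name <- resolve_column(toy_network, "likelihood")
by_position <- resolve_column(toy_network, 4)
print(c(by_name, by_position))
```

With these column names, position 4 corresponds to the mor column, which matches the .mor = 4 mapping used in the call above.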
The decoupleR package (Badia-i-Mompel, Vélez, Braunger, Geiss, Dimitrov, Müller-Dott, Taus, Dugourd, Holland, Flores, and Saez-Rodriguez, 2021) was made possible thanks to:
This package was developed using biocthis.
# Create the vignette
library(rmarkdown)
system.time(render("decoupleR.Rmd", "BiocStyle::html_document"))
# Extract the R code
library(knitr)
knit("decoupleR.Rmd", tangle = TRUE)
#> Time difference of 14.234 secs
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.1 (2021-08-10)
#> os Ubuntu 20.04.3 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2021-10-26
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date lib source
#> annotate 1.72.0 2021-10-26 [2] Bioconductor
#> AnnotationDbi 1.56.0 2021-10-26 [2] Bioconductor
#> assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.1.1)
#> backports 1.2.1 2020-12-09 [2] CRAN (R 4.1.1)
#> beachmat 2.10.0 2021-10-26 [2] Bioconductor
#> Biobase 2.54.0 2021-10-26 [2] Bioconductor
#> BiocGenerics 0.40.0 2021-10-26 [2] Bioconductor
#> BiocManager 1.30.16 2021-06-15 [2] CRAN (R 4.1.1)
#> BiocParallel 1.28.0 2021-10-26 [2] Bioconductor
#> BiocSingular 1.10.0 2021-10-26 [2] Bioconductor
#> BiocStyle * 2.22.0 2021-10-26 [2] Bioconductor
#> Biostrings 2.62.0 2021-10-26 [2] Bioconductor
#> bit 4.0.4 2020-08-04 [2] CRAN (R 4.1.1)
#> bit64 4.0.5 2020-08-30 [2] CRAN (R 4.1.1)
#> bitops 1.0-7 2021-04-24 [2] CRAN (R 4.1.1)
#> blob 1.2.2 2021-07-23 [2] CRAN (R 4.1.1)
#> bookdown 0.24 2021-09-02 [2] CRAN (R 4.1.1)
#> broom 0.7.9 2021-07-27 [2] CRAN (R 4.1.1)
#> bslib 0.3.1 2021-10-06 [2] CRAN (R 4.1.1)
#> cachem 1.0.6 2021-08-19 [2] CRAN (R 4.1.1)
#> class 7.3-19 2021-05-03 [2] CRAN (R 4.1.1)
#> cli 3.0.1 2021-07-17 [2] CRAN (R 4.1.1)
#> crayon 1.4.1 2021-02-08 [2] CRAN (R 4.1.1)
#> DBI 1.1.1 2021-01-15 [2] CRAN (R 4.1.1)
#> decoupleR * 2.0.0 2021-10-26 [1] Bioconductor
#> DelayedArray 0.20.0 2021-10-26 [2] Bioconductor
#> DelayedMatrixStats 1.16.0 2021-10-26 [2] Bioconductor
#> digest 0.6.28 2021-09-23 [2] CRAN (R 4.1.1)
#> dplyr * 1.0.7 2021-06-18 [2] CRAN (R 4.1.1)
#> e1071 1.7-9 2021-09-16 [2] CRAN (R 4.1.1)
#> ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.1.1)
#> evaluate 0.14 2019-05-28 [2] CRAN (R 4.1.1)
#> fansi 0.5.0 2021-05-25 [2] CRAN (R 4.1.1)
#> fastmap 1.1.0 2021-01-25 [2] CRAN (R 4.1.1)
#> generics 0.1.1 2021-10-25 [2] CRAN (R 4.1.1)
#> GenomeInfoDb 1.30.0 2021-10-26 [2] Bioconductor
#> GenomeInfoDbData 1.2.7 2021-09-23 [2] Bioconductor
#> GenomicRanges 1.46.0 2021-10-26 [2] Bioconductor
#> glue 1.4.2 2020-08-27 [2] CRAN (R 4.1.1)
#> graph 1.72.0 2021-10-26 [2] Bioconductor
#> GSEABase 1.56.0 2021-10-26 [2] Bioconductor
#> GSVA 1.42.0 2021-10-26 [2] Bioconductor
#> HDF5Array 1.22.0 2021-10-26 [2] Bioconductor
#> htmltools 0.5.2 2021-08-25 [2] CRAN (R 4.1.1)
#> httr 1.4.2 2020-07-20 [2] CRAN (R 4.1.1)
#> IRanges 2.28.0 2021-10-26 [2] Bioconductor
#> irlba 2.3.3 2019-02-05 [2] CRAN (R 4.1.1)
#> jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.1.1)
#> jsonlite 1.7.2 2020-12-09 [2] CRAN (R 4.1.1)
#> KEGGREST 1.34.0 2021-10-26 [2] Bioconductor
#> kernlab 0.9-29 2019-11-12 [2] CRAN (R 4.1.1)
#> KernSmooth 2.23-20 2021-05-03 [2] CRAN (R 4.1.1)
#> knitr 1.36 2021-09-29 [2] CRAN (R 4.1.1)
#> lattice 0.20-45 2021-09-22 [2] CRAN (R 4.1.1)
#> lifecycle 1.0.1 2021-09-24 [2] CRAN (R 4.1.1)
#> lubridate 1.8.0 2021-10-07 [2] CRAN (R 4.1.1)
#> magrittr 2.0.1 2020-11-17 [2] CRAN (R 4.1.1)
#> MASS 7.3-54 2021-05-03 [2] CRAN (R 4.1.1)
#> Matrix 1.3-4 2021-06-01 [2] CRAN (R 4.1.1)
#> MatrixGenerics 1.6.0 2021-10-26 [2] Bioconductor
#> matrixStats 0.61.0 2021-09-17 [2] CRAN (R 4.1.1)
#> memoise 2.0.0 2021-01-26 [2] CRAN (R 4.1.1)
#> mixtools 1.2.0 2020-02-07 [2] CRAN (R 4.1.1)
#> pillar 1.6.4 2021-10-18 [2] CRAN (R 4.1.1)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.1.1)
#> plyr 1.8.6 2020-03-03 [2] CRAN (R 4.1.1)
#> png 0.1-7 2013-12-03 [2] CRAN (R 4.1.1)
#> proxy 0.4-26 2021-06-07 [2] CRAN (R 4.1.1)
#> purrr 0.3.4 2020-04-17 [2] CRAN (R 4.1.1)
#> R6 2.5.1 2021-08-19 [2] CRAN (R 4.1.1)
#> ranger 0.13.1 2021-07-14 [2] CRAN (R 4.1.1)
#> Rcpp 1.0.7 2021-07-07 [2] CRAN (R 4.1.1)
#> RCurl 1.98-1.5 2021-09-17 [2] CRAN (R 4.1.1)
#> RefManageR * 1.3.0 2020-11-13 [2] CRAN (R 4.1.1)
#> rhdf5 2.38.0 2021-10-26 [2] Bioconductor
#> rhdf5filters 1.6.0 2021-10-26 [2] Bioconductor
#> Rhdf5lib 1.16.0 2021-10-26 [2] Bioconductor
#> rlang 0.4.12 2021-10-18 [2] CRAN (R 4.1.1)
#> rmarkdown 2.11 2021-09-14 [2] CRAN (R 4.1.1)
#> RobustRankAggreg 1.1 2013-06-03 [2] CRAN (R 4.1.1)
#> rpart 4.1-15 2019-04-12 [2] CRAN (R 4.1.1)
#> RSQLite 2.2.8 2021-08-21 [2] CRAN (R 4.1.1)
#> rsvd 1.0.5 2021-04-16 [2] CRAN (R 4.1.1)
#> S4Vectors 0.32.0 2021-10-26 [2] Bioconductor
#> sass 0.4.0 2021-05-12 [2] CRAN (R 4.1.1)
#> ScaledMatrix 1.2.0 2021-10-26 [2] Bioconductor
#> segmented 1.3-4 2021-04-22 [2] CRAN (R 4.1.1)
#> sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 4.1.1)
#> SingleCellExperiment 1.16.0 2021-10-26 [2] Bioconductor
#> sparseMatrixStats 1.6.0 2021-10-26 [2] Bioconductor
#> speedglm 0.3-3 2021-01-08 [2] CRAN (R 4.1.1)
#> stringi 1.7.5 2021-10-04 [2] CRAN (R 4.1.1)
#> stringr 1.4.0 2019-02-10 [2] CRAN (R 4.1.1)
#> SummarizedExperiment 1.24.0 2021-10-26 [2] Bioconductor
#> survival 3.2-13 2021-08-24 [2] CRAN (R 4.1.1)
#> tibble 3.1.5 2021-09-30 [2] CRAN (R 4.1.1)
#> tidyr 1.1.4 2021-09-27 [2] CRAN (R 4.1.1)
#> tidyselect 1.1.1 2021-04-30 [2] CRAN (R 4.1.1)
#> utf8 1.2.2 2021-07-24 [2] CRAN (R 4.1.1)
#> vctrs 0.3.8 2021-04-29 [2] CRAN (R 4.1.1)
#> viper 1.28.0 2021-10-26 [2] Bioconductor
#> withr 2.4.2 2021-04-18 [2] CRAN (R 4.1.1)
#> xfun 0.27 2021-10-18 [2] CRAN (R 4.1.1)
#> XML 3.99-0.8 2021-09-17 [2] CRAN (R 4.1.1)
#> xml2 1.3.2 2020-04-23 [2] CRAN (R 4.1.1)
#> xtable 1.8-4 2019-04-21 [2] CRAN (R 4.1.1)
#> XVector 0.34.0 2021-10-26 [2] Bioconductor
#> yaml 2.2.1 2020-02-01 [2] CRAN (R 4.1.1)
#> zlibbioc 1.40.0 2021-10-26 [2] Bioconductor
#>
#> [1] /tmp/RtmplYABw5/Rinst3905bc364d2195
#> [2] /home/biocbuild/bbs-3.14-bioc/R/library
This vignette was generated using BiocStyle (Oleś, 2021) with knitr (Xie, 2021) and rmarkdown (Allaire, Xie, McPherson, et al., 2021) running behind the scenes.
Citations made with RefManageR (McLean, 2017).
[1] J. Allaire, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 2.11. 2021. URL: https://github.com/rstudio/rmarkdown.
[2] M. J. Alvarez, Y. Shen, F. M. Giorgi, et al. “Functional characterization of somatic mutations in cancer using network-based inference of protein activity”. In: Nature genetics 48.8 (2016), pp. 838–47.
[3] P. Badia-i-Mompel, J. Vélez, J. Braunger, et al. “decoupleR: Inferring biological activities from omics data using a collection of methods”. In: X (2021).
[4] G. Csárdi, R. core, H. Wickham, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. URL: https://CRAN.R-project.org/package=sessioninfo.
[5] S. Hänzelmann, R. Castelo, and J. Guinney. “GSVA: gene set variation analysis for microarray and RNA-Seq data”. In: BMC Bioinformatics 14 (2013), p. 7. DOI: 10.1186/1471-2105-14-7. URL: http://www.biomedcentral.com/1471-2105/14/7.
[6] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.
[7] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.22.0. 2021. URL: https://github.com/Bioconductor/BiocStyle.
[8] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2021. URL: https://www.R-project.org/.
[9] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
[10] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.36. 2021. URL: https://yihui.org/knitr/.