decoupleR 1.0.0
decoupleR
R is an open-source statistical environment which can be easily modified to
enhance its functionality via packages. decoupleR is an R package available via
the Bioconductor repository for packages. R can be installed on any operating
system from CRAN, after which you can install decoupleR by using the following
commands in your R session:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("decoupleR")
# Check that you have a valid Bioconductor installation
BiocManager::valid()
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("saezlab/decoupleR")
decoupleR builds on many other packages, in particular those that have implemented the infrastructure needed for functional genomic analysis, such as viper and GSVA. The aim is to provide a centralized place from which to apply different statistics to the same data set, without the work that testing each of them in isolation would require. This also opens the possibility of developing benchmarks that can grow easily.
As package developers, we try to explain clearly how to use our packages and in
which order to use the functions. But R and Bioconductor have a steep learning
curve so it is critical to learn where to ask for help. We would like to
highlight the Bioconductor support site as the main resource for getting help:
remember to use the decoupleR tag and check the older posts.
Other alternatives are available such as creating GitHub issues and tweeting.
However, please note that if you want to receive help you should adhere to the
posting guidelines.
It is particularly critical that you provide a small reproducible example and
your session information so package developers can track down the source of
the error.
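As a starting point for such a report, the session information can be captured with base R alone. The sketch below assumes nothing beyond the utils package that ships with R; the guard around decoupleR is only there so the snippet also runs where the package is not installed.

```r
# Collect the details that maintainers usually ask for in a bug report.
session <- utils::sessionInfo()
print(session)

# Report the installed decoupleR version, if the package is available.
if (requireNamespace("decoupleR", quietly = TRUE)) {
    print(utils::packageVersion("decoupleR"))
}
```

Pasting this output next to a small reproducible example gives the people answering the R version, platform and package versions in one place.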
decoupleR
We hope that decoupleR will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
citation("decoupleR")
#>
#> saezlab (2021). _Package to decouple gene sets from statistics_. doi:
#> 10.18129/B9.bioc.decoupleR (URL:
#> https://doi.org/10.18129/B9.bioc.decoupleR),
#> https://github.com/saezlab/decoupleR - R package version 1.0.0, <URL:
#> http://www.bioconductor.org/packages/decoupleR>.
#>
#> saezlab (2020). "Package to decouple gene sets from statistics."
#> _bioRxiv_. doi: 10.1101/TODO (URL: https://doi.org/10.1101/TODO), <URL:
#> https://www.biorxiv.org/content/10.1101/TODO>.
#>
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.
decoupleR
decoupleR provides different statistics to calculate regulatory activity from
an expression matrix and a network. It incorporates pre-existing methods to
avoid reinventing the wheel, while implementing its own methods under an
evaluation standard. This gives you flexibility when evaluating a data set with
different statistics.
Since inputs and outputs are always tibbles (i.e. special data frames), incorporating dplyr into your workflow can be useful for manipulating results, but it is not necessary.
library(decoupleR)
library(dplyr)
To use it, you first need a matrix where the rows represent the target nodes
and the columns the different conditions in which they were evaluated. You also
need to provide a network that contains at least two columns, corresponding to
the source and target nodes. Note that certain methods require additional
metadata columns, for instance the mode of regulation (MoR) or the likelihood
of the interaction.
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")
mat <- file.path(inputs_dir, "input-expr_matrix.rds") %>%
    readRDS() %>%
    glimpse()
#> num [1:18490, 1:4] 3.251 0.283 -2.253 0.782 -4.575 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:18490] "A1BG" "A1CF" "A2M" "A2ML1" ...
#> ..$ : chr [1:4] "GSM2753335" "GSM2753336" "GSM2753337" "GSM2753338"
network <- file.path(inputs_dir, "input-dorothea_genesets.rds") %>%
    readRDS() %>%
    glimpse()
#> Rows: 151
#> Columns: 5
#> $ tf <chr> "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO4", "FOXO…
#> $ confidence <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
#> $ target <chr> "BCL2L11", "BCL6", "CDKN1A", "CDKN1B", "G6PC", "GADD45A", "…
#> $ mor <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ likelihood <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
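To make the two input shapes concrete without the bundled test data, they can also be built by hand. This is only an illustrative sketch in base R: the gene and transcription factor names are taken from the output above, while the matrix values and the condition names are arbitrary placeholders, not measurements.

```r
# Expression-like matrix: rows are target genes, columns are conditions.
# The numbers are arbitrary placeholders chosen for illustration.
toy_mat <- matrix(
    c(1.2, -0.5, 0.3,
      0.8, -1.1, 0.0),
    nrow = 3,
    dimnames = list(
        c("BCL2L11", "BCL6", "CDKN1A"),
        c("condition_1", "condition_2")
    )
)

# Network in long format: one row per source-target interaction, plus the
# metadata columns that some statistics ask for.
toy_network <- data.frame(
    tf = "FOXO4",
    target = c("BCL2L11", "BCL6", "CDKN1A"),
    mor = 1,
    likelihood = 1,
    stringsAsFactors = FALSE
)

# Check which network targets are present as rows of the matrix.
targets_found <- toy_network$target %in% rownames(toy_mat)
print(targets_found)
```

The link between the two objects is the target column of the network and the row names of the matrix.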
Once the data is loaded, you are one step away from achieving decoupling.
This step corresponds to specifying which statistics you want to run.
For more information about the defined statistics and their parameters,
you can execute ?decouple().
decouple(
    mat = mat,
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("gsva", "mean", "pscira", "scira", "viper", "ora"),
    args = list(
        gsva = list(verbose = FALSE),
        mean = list(.mor = "mor", .likelihood = "likelihood"),
        pscira = list(.mor = "mor"),
        scira = list(.mor = "mor"),
        viper = list(
            .mor = "mor",
            .likelihood = "likelihood",
            verbose = FALSE
        ),
        ora = list()
    )
) %>%
    glimpse()
#> Rows: 140
#> Columns: 11
#> $ run_id <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
#> $ statistic <chr> "gsva", "gsva", "gsva", "gsva", "gsva", "gsva", "gsva", "g…
#> $ tf <chr> "FOXO4", "FOXO4", "FOXO4", "FOXO4", "NFIC", "NFIC", "NFIC"…
#> $ condition <chr> "GSM2753335", "GSM2753336", "GSM2753337", "GSM2753338", "G…
#> $ score <dbl> -0.38043080, -0.29999174, 0.23887789, 0.09071077, -0.08452…
#> $ p_value <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ estimate <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ conf.low <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ conf.high <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ method <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ alternative <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
Done: you have applied different statistics to the same data set. You can now analyze the results at your convenience, for example by performing a benchmark.
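Because the result is a long table, ordinary dplyr verbs are enough for a first look. The sketch below does not call decouple(); it rebuilds only the four gsva scores for FOXO4 that are visible in the glimpse() output above, so that it runs on its own, and then ranks the conditions by score.

```r
library(dplyr)

# The four gsva scores for FOXO4 printed in the glimpse() output above.
toy_res <- data.frame(
    statistic = "gsva",
    tf = "FOXO4",
    condition = c("GSM2753335", "GSM2753336", "GSM2753337", "GSM2753338"),
    score = c(-0.38043080, -0.29999174, 0.23887789, 0.09071077),
    stringsAsFactors = FALSE
)

# Keep one statistic and one source, then sort from highest to lowest score.
ranked <- toy_res %>%
    filter(statistic == "gsva", tf == "FOXO4") %>%
    arrange(desc(score))

print(ranked)
```

The same filter() and arrange() steps should apply unchanged to the full tibble returned by decouple(), since it carries the same statistic, tf, condition and score columns.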
Internally, decouple() works through purrr::map2_dfr() to perform statistics
and argument mapping. This comes with important points:
- statistics and args can be vectors of the same length. A vector of length 1
  will be recycled. The match is therefore performed by position, not by name.
  Using named vectors can still be a good idea to make your intentions clear.
- decouple() works with expressions that are evaluated later. For example, it
  generates a toy call that represents what it is trying to run. You can show
  it with the option show_toy_call = TRUE.
- To troubleshoot a statistic, set show_toy_call = TRUE and execute the
  generated call locally. Try to fix it there and then correct the original
  call.
Based on the previous points, you can take the generated toy calls and execute
them independently, obtaining the same results as if you were executing
decouple().
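The recycling and positional matching described above are properties of the purrr map2 family and can be observed without decoupleR. In this sketch, describe_run() is a hypothetical helper that only builds a label; it is not part of decoupleR and does not run any statistic.

```r
library(purrr)

# Hypothetical helper: describes a pairing instead of running a statistic.
describe_run <- function(statistic, arg_list) {
    paste0("run_", statistic, " with ", length(arg_list), " extra argument(s)")
}

# A length-1 statistics vector is recycled across both argument lists.
recycled <- map2_chr(
    "gsva",
    list(
        gsva_default = list(verbose = FALSE),
        gsva_custom = list(verbose = FALSE, ssgsea.norm = FALSE)
    ),
    describe_run
)
print(unname(recycled))

# Names are ignored when pairing: the list named "viper" is matched with
# "mean" because both are in first position.
by_position <- map2_chr(
    c("mean", "viper"),
    list(viper = list(verbose = FALSE), mean = list()),
    describe_run
)
print(unname(by_position))
```

The second call shows why names on args are documentation only: putting the argument lists in a different order than statistics silently pairs them with the wrong statistic.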
See internal gsva calls and save results.
gsvas_res <- decouple(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("gsva"),
    args = list(
        gsva_default = list(verbose = FALSE),
        gsva_minsize = list(verbose = FALSE, ssgsea.norm = FALSE)
    ),
    show_toy_call = TRUE
)
#> run_gsva(mat = head(mat, 5000), network = network, .source = "tf", .target = "target",
#> * verbose = FALSE)
#> run_gsva(mat = head(mat, 5000), network = network, .source = "tf", .target = "target",
#> * verbose = FALSE, ssgsea.norm = FALSE)
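The printed toy calls are ordinary R calls, which is what makes it possible to copy and run them separately. As a rough base R illustration of the general idea of building a call first and evaluating it later (this is not decoupleR's internal code, and round() is just a stand-in function):

```r
# Build a call object without running it.
toy_call <- call("round", x = 3.14159, digits = 2)

# Turn the call into text, similar in spirit to a printed toy call.
call_text <- deparse(toy_call)
print(call_text)

# Evaluate the stored call later, when the result is actually needed.
result <- eval(toy_call)
print(result)
```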
Run the same calls that were printed by setting show_toy_call = TRUE.
gsva_1 <- run_gsva(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    verbose = FALSE
)
gsva_2 <- run_gsva(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    verbose = FALSE,
    ssgsea.norm = FALSE
)
gsvas_res_2 <- bind_rows(gsva_1, gsva_2, .id = "run_id")
Now compare the results and see that there is no difference.
all.equal(gsvas_res, gsvas_res_2)
#> [1] TRUE
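To carry out the column mapping, decoupleR relies on the selection semantics
provided by the tidyselect package. Among the selections it provides are bare
column names (tf), strings ("likelihood") and column positions (4), all of
which appear in the example below.
Let’s see an example. The input network has the following columns: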
network %>%
colnames()
#> [1] "tf" "confidence" "target" "mor" "likelihood"
We can do the mapping in whichever way we like, or even combine several ways.
This applies not only to the decouple() function, but to all functions of the
decoupleR statistics family, identifiable by the run_ prefix.
this_column <- "target"
viper_res <- decouple(
    mat = mat,
    network = network,
    .source = tf,
    .target = !!this_column,
    statistics = c("viper"),
    args = list(
        viper = list(
            .mor = 4,
            .likelihood = "likelihood",
            verbose = FALSE
        )
    ),
    show_toy_call = TRUE
)
#> run_viper(mat = mat, network = network, .source = tf, .target = "target",
#> * .mor = 4, .likelihood = "likelihood", verbose = FALSE)
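The selection styles used in the call above can be tried in isolation with dplyr::select(), which is also built on tidyselect. The sketch assumes a small hand-made data frame with the same column names as the input network; it illustrates the selection semantics only and does not run any decoupleR function.

```r
library(dplyr)

# Hand-made stand-in with the same column names as the input network.
toy_network <- data.frame(
    tf = "FOXO4",
    confidence = "A",
    target = c("BCL2L11", "BCL6", "CDKN1A"),
    mor = 1,
    likelihood = 1,
    stringsAsFactors = FALSE
)

# Three ways of pointing at the same column.
by_symbol <- select(toy_network, tf)
by_string <- select(toy_network, "tf")
by_position <- select(toy_network, 1)

# A column name stored in a variable can be injected with !!.
this_column <- "target"
by_injection <- select(toy_network, !!this_column)

# Position 4 corresponds to the mor column, as in .mor = 4 above.
fourth_column <- select(toy_network, 4)
print(names(fourth_column))
```

Selecting by position is the most fragile of the three, since it changes meaning whenever the column order of the network changes.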
The decoupleR package (saezlab, 2021) was made possible thanks to:
This package was developed using biocthis.
# Create the vignette
library(rmarkdown)
system.time(render("decoupleR.Rmd", "BiocStyle::html_document"))
# Extract the R code
library(knitr)
knit("decoupleR.Rmd", tangle = TRUE)
#> Time difference of 14.591 secs
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.0 (2021-05-18)
#> os Ubuntu 20.04.2 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2021-05-19
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date lib source
#> annotate 1.70.0 2021-05-19 [2] Bioconductor
#> AnnotationDbi 1.54.0 2021-05-19 [2] Bioconductor
#> assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.1.0)
#> backports 1.2.1 2020-12-09 [2] CRAN (R 4.1.0)
#> beachmat 2.8.0 2021-05-19 [2] Bioconductor
#> Biobase 2.52.0 2021-05-19 [2] Bioconductor
#> BiocGenerics 0.38.0 2021-05-19 [2] Bioconductor
#> BiocManager 1.30.15 2021-05-11 [2] CRAN (R 4.1.0)
#> BiocParallel 1.26.0 2021-05-19 [2] Bioconductor
#> BiocSingular 1.8.0 2021-05-19 [2] Bioconductor
#> BiocStyle * 2.20.0 2021-05-19 [2] Bioconductor
#> Biostrings 2.60.0 2021-05-19 [2] Bioconductor
#> bit 4.0.4 2020-08-04 [2] CRAN (R 4.1.0)
#> bit64 4.0.5 2020-08-30 [2] CRAN (R 4.1.0)
#> bitops 1.0-7 2021-04-24 [2] CRAN (R 4.1.0)
#> blob 1.2.1 2020-01-20 [2] CRAN (R 4.1.0)
#> bookdown 0.22 2021-04-22 [2] CRAN (R 4.1.0)
#> broom 0.7.6 2021-04-05 [2] CRAN (R 4.1.0)
#> bslib 0.2.5.1 2021-05-18 [2] CRAN (R 4.1.0)
#> cachem 1.0.5 2021-05-15 [2] CRAN (R 4.1.0)
#> class 7.3-19 2021-05-03 [2] CRAN (R 4.1.0)
#> cli 2.5.0 2021-04-26 [2] CRAN (R 4.1.0)
#> crayon 1.4.1 2021-02-08 [2] CRAN (R 4.1.0)
#> DBI 1.1.1 2021-01-15 [2] CRAN (R 4.1.0)
#> decoupleR * 1.0.0 2021-05-19 [1] Bioconductor
#> DelayedArray 0.18.0 2021-05-19 [2] Bioconductor
#> DelayedMatrixStats 1.14.0 2021-05-19 [2] Bioconductor
#> digest 0.6.27 2020-10-24 [2] CRAN (R 4.1.0)
#> dplyr * 1.0.6 2021-05-05 [2] CRAN (R 4.1.0)
#> e1071 1.7-6 2021-03-18 [2] CRAN (R 4.1.0)
#> ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [2] CRAN (R 4.1.0)
#> fansi 0.4.2 2021-01-15 [2] CRAN (R 4.1.0)
#> fastmap 1.1.0 2021-01-25 [2] CRAN (R 4.1.0)
#> generics 0.1.0 2020-10-31 [2] CRAN (R 4.1.0)
#> GenomeInfoDb 1.28.0 2021-05-19 [2] Bioconductor
#> GenomeInfoDbData 1.2.6 2021-05-19 [2] Bioconductor
#> GenomicRanges 1.44.0 2021-05-19 [2] Bioconductor
#> glue 1.4.2 2020-08-27 [2] CRAN (R 4.1.0)
#> graph 1.70.0 2021-05-19 [2] Bioconductor
#> GSEABase 1.54.0 2021-05-19 [2] Bioconductor
#> GSVA 1.40.0 2021-05-19 [2] Bioconductor
#> HDF5Array 1.20.0 2021-05-19 [2] Bioconductor
#> htmltools 0.5.1.1 2021-01-22 [2] CRAN (R 4.1.0)
#> httr 1.4.2 2020-07-20 [2] CRAN (R 4.1.0)
#> IRanges 2.26.0 2021-05-19 [2] Bioconductor
#> irlba 2.3.3 2019-02-05 [2] CRAN (R 4.1.0)
#> jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.1.0)
#> jsonlite 1.7.2 2020-12-09 [2] CRAN (R 4.1.0)
#> KEGGREST 1.32.0 2021-05-19 [2] Bioconductor
#> kernlab 0.9-29 2019-11-12 [2] CRAN (R 4.1.0)
#> KernSmooth 2.23-20 2021-05-03 [2] CRAN (R 4.1.0)
#> knitr 1.33 2021-04-24 [2] CRAN (R 4.1.0)
#> lattice 0.20-44 2021-05-02 [2] CRAN (R 4.1.0)
#> lifecycle 1.0.0 2021-02-15 [2] CRAN (R 4.1.0)
#> lubridate 1.7.10 2021-02-26 [2] CRAN (R 4.1.0)
#> magrittr 2.0.1 2020-11-17 [2] CRAN (R 4.1.0)
#> MASS 7.3-54 2021-05-03 [2] CRAN (R 4.1.0)
#> Matrix 1.3-3 2021-05-04 [2] CRAN (R 4.1.0)
#> MatrixGenerics 1.4.0 2021-05-19 [2] Bioconductor
#> matrixStats 0.58.0 2021-01-29 [2] CRAN (R 4.1.0)
#> memoise 2.0.0 2021-01-26 [2] CRAN (R 4.1.0)
#> mixtools 1.2.0 2020-02-07 [2] CRAN (R 4.1.0)
#> pillar 1.6.1 2021-05-16 [2] CRAN (R 4.1.0)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.1.0)
#> plyr 1.8.6 2020-03-03 [2] CRAN (R 4.1.0)
#> png 0.1-7 2013-12-03 [2] CRAN (R 4.1.0)
#> proxy 0.4-25 2021-03-05 [2] CRAN (R 4.1.0)
#> purrr 0.3.4 2020-04-17 [2] CRAN (R 4.1.0)
#> R6 2.5.0 2020-10-28 [2] CRAN (R 4.1.0)
#> Rcpp 1.0.6 2021-01-15 [2] CRAN (R 4.1.0)
#> RCurl 1.98-1.3 2021-03-16 [2] CRAN (R 4.1.0)
#> RefManageR * 1.3.0 2020-11-13 [2] CRAN (R 4.1.0)
#> rhdf5 2.36.0 2021-05-19 [2] Bioconductor
#> rhdf5filters 1.4.0 2021-05-19 [2] Bioconductor
#> Rhdf5lib 1.14.0 2021-05-19 [2] Bioconductor
#> rlang 0.4.11 2021-04-30 [2] CRAN (R 4.1.0)
#> rmarkdown 2.8 2021-05-07 [2] CRAN (R 4.1.0)
#> RSQLite 2.2.7 2021-04-22 [2] CRAN (R 4.1.0)
#> rstudioapi 0.13 2020-11-12 [2] CRAN (R 4.1.0)
#> rsvd 1.0.5 2021-04-16 [2] CRAN (R 4.1.0)
#> S4Vectors 0.30.0 2021-05-19 [2] Bioconductor
#> sass 0.4.0 2021-05-12 [2] CRAN (R 4.1.0)
#> ScaledMatrix 1.0.0 2021-05-19 [2] Bioconductor
#> segmented 1.3-4 2021-04-22 [2] CRAN (R 4.1.0)
#> sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 4.1.0)
#> SingleCellExperiment 1.14.0 2021-05-19 [2] Bioconductor
#> sparseMatrixStats 1.4.0 2021-05-19 [2] Bioconductor
#> speedglm 0.3-3 2021-01-08 [2] CRAN (R 4.1.0)
#> stringi 1.6.2 2021-05-17 [2] CRAN (R 4.1.0)
#> stringr 1.4.0 2019-02-10 [2] CRAN (R 4.1.0)
#> SummarizedExperiment 1.22.0 2021-05-19 [2] Bioconductor
#> survival 3.2-11 2021-04-26 [2] CRAN (R 4.1.0)
#> tibble 3.1.2 2021-05-16 [2] CRAN (R 4.1.0)
#> tidyr 1.1.3 2021-03-03 [2] CRAN (R 4.1.0)
#> tidyselect 1.1.1 2021-04-30 [2] CRAN (R 4.1.0)
#> utf8 1.2.1 2021-03-12 [2] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [2] CRAN (R 4.1.0)
#> viper 1.26.0 2021-05-19 [2] Bioconductor
#> withr 2.4.2 2021-04-18 [2] CRAN (R 4.1.0)
#> xfun 0.23 2021-05-15 [2] CRAN (R 4.1.0)
#> XML 3.99-0.6 2021-03-16 [2] CRAN (R 4.1.0)
#> xml2 1.3.2 2020-04-23 [2] CRAN (R 4.1.0)
#> xtable 1.8-4 2019-04-21 [2] CRAN (R 4.1.0)
#> XVector 0.32.0 2021-05-19 [2] Bioconductor
#> yaml 2.2.1 2020-02-01 [2] CRAN (R 4.1.0)
#> zlibbioc 1.38.0 2021-05-19 [2] Bioconductor
#>
#> [1] /tmp/RtmpwTRadi/Rinst2f61c33e45600
#> [2] /home/biocbuild/bbs-3.13-bioc/R/library
This vignette was generated using BiocStyle (Oleś, 2021) with knitr (Xie, 2021) and rmarkdown (Allaire, Xie, McPherson, et al., 2021) running behind the scenes.
Citations made with RefManageR (McLean, 2017).
[1] J. Allaire, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 2.8. 2021. URL: https://github.com/rstudio/rmarkdown.
[2] M. J. Alvarez, Y. Shen, F. M. Giorgi, et al. “Functional characterization of somatic mutations in cancer using network-based inference of protein activity”. In: Nature genetics 48.8 (2016), pp. 838–47.
[3] G. Csárdi, R. core, H. Wickham, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. URL: https://CRAN.R-project.org/package=sessioninfo.
[4] S. Hänzelmann, R. Castelo, and J. Guinney. “GSVA: gene set variation analysis for microarray and RNA-Seq data”. In: BMC Bioinformatics 14 (2013), p. 7. DOI: 10.1186/1471-2105-14-7. URL: http://www.biomedcentral.com/1471-2105/14/7.
[5] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.
[6] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.20.0. 2021. URL: https://github.com/Bioconductor/BiocStyle.
[7] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2021. URL: https://www.R-project.org/.
[8] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
[9] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.33. 2021. URL: https://yihui.org/knitr/.
[10] saezlab. Package to decouple gene sets from statistics. https://github.com/saezlab/decoupleR - R package version 1.0.0. 2021. DOI: 10.18129/B9.bioc.decoupleR. URL: http://www.bioconductor.org/packages/decoupleR.