This package implements alternative inference methods for BASiCS. The original package uses adaptive Metropolis within Gibbs sampling, while BASiCStan uses Stan to enable the use of maximum a posteriori estimation, variational inference, and Hamiltonian Monte Carlo. These each have advantages for different use cases.
To use BASiCStan, we can first use BASiCS_MockSCE() to generate a toy
example dataset. We will also set a seed for reproducibility.
suppressPackageStartupMessages(library("BASiCStan"))
set.seed(42)
sce <- BASiCS_MockSCE()
The interface for running MCMC using BASiCS and for running alternative
inference methods using Stan is very similar. Note that the joint prior
between mean and over-dispersion parameters, corresponding to
Regression = TRUE, is the only mode supported by BASiCStan().
amwg_fit <- BASiCS_MCMC(
    sce,
    N = 200,
    Thin = 10,
    Burn = 100,
    Regression = TRUE
)
#> altExp 'spike-ins' is assumed to contain spike-in genes.
#> see help(altExp) for details.
#> Running with spikes BASiCS sampler (regression case) ...
#> -------------------------------------------------------------
#> MCMC running time
#> -------------------------------------------------------------
#> user: 0.347
#> system: 0.001
#> elapsed: 0.349
#> -------------------------------------------------------------
#> Output
#> -------------------------------------------------------------
#> -------------------------------------------------------------
#> BASiCS version 2.10.0 :
#> vertical integration (spikes case)
#> -------------------------------------------------------------
stan_fit <- BASiCStan(sce, Method = "sampling", iter = 10)
#> Warning: There were 5 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
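The call above uses Method = "sampling", which corresponds to Hamiltonian Monte Carlo. The other two inference engines mentioned in the introduction can be requested through the same argument. The sketch below assumes that Method accepts the values "vb" (variational inference) and "optimizing" (maximum a posteriori estimation), mirroring the names of the corresponding rstan functions; check ?BASiCStan for the values your installed version supports.

```r
suppressPackageStartupMessages(library("BASiCStan"))
set.seed(42)
sce <- BASiCS_MockSCE()

# Variational inference (assumed Method value: "vb")
vb_fit <- BASiCStan(sce, Method = "vb")

# Maximum a posteriori estimation (assumed Method value: "optimizing")
map_fit <- BASiCStan(sce, Method = "optimizing")

# Inspect what each call returned
class(vb_fit)
class(map_fit)
```

These approximate methods are typically much faster than sampling, at the cost of a cruder description of posterior uncertainty.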
The output of BASiCStan() is an object of class BASiCS_Chain, similar to
the output of BASiCS_MCMC(). Therefore, you can use it as you would an
object generated using Metropolis within Gibbs sampling. For example,
we can plot the relationship between mean and over-dispersion estimated
using the joint regression prior:
BASiCS_ShowFit(amwg_fit)
BASiCS_ShowFit(stan_fit)
Using Stan has advantages beyond the variety of inference engines
available. By returning a Stan object that we can later convert to a
BASiCS_Chain object, we can leverage an even broader set of
functionality. In particular, Stan makes it easy to generate MCMC
diagnostics using simple functions. For example, summary() outputs a
number of useful per-chain and across-chain diagnostics:
stan_obj <- BASiCStan(sce, ReturnBASiCS = FALSE, Method = "sampling", iter = 10)
#>
#> SAMPLING FOR MODEL 'basics_regression' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 0.003248 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 32.48 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: WARNING: No variance estimation is
#> Chain 1: performed for num_warmup < 20
#> Chain 1:
#> Chain 1: Iteration: 1 / 10 [ 10%] (Warmup)
#> Chain 1: Iteration: 2 / 10 [ 20%] (Warmup)
#> Chain 1: Iteration: 3 / 10 [ 30%] (Warmup)
#> Chain 1: Iteration: 4 / 10 [ 40%] (Warmup)
#> Chain 1: Iteration: 5 / 10 [ 50%] (Warmup)
#> Chain 1: Iteration: 6 / 10 [ 60%] (Sampling)
#> Chain 1: Iteration: 7 / 10 [ 70%] (Sampling)
#> Chain 1: Iteration: 8 / 10 [ 80%] (Sampling)
#> Chain 1: Iteration: 9 / 10 [ 90%] (Sampling)
#> Chain 1: Iteration: 10 / 10 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.052941 seconds (Warm-up)
#> Chain 1: 0.032564 seconds (Sampling)
#> Chain 1: 0.085505 seconds (Total)
#> Chain 1:
#> Warning: There were 5 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
head(summary(stan_obj)$summary)
#>               mean se_mean sd     2.5%      25%      50%      75%    97.5%
#> log_mu[1] 3.098420     NaN  0 3.098420 3.098420 3.098420 3.098420 3.098420
#> log_mu[2] 1.058488     NaN  0 1.058488 1.058488 1.058488 1.058488 1.058488
#> log_mu[3] 2.107672     NaN  0 2.107672 2.107672 2.107672 2.107672 2.107672
#> log_mu[4] 2.245089     NaN  0 2.245089 2.245089 2.245089 2.245089 2.245089
#> log_mu[5] 2.002693     NaN  0 2.002693 2.002693 2.002693 2.002693 2.002693
#> log_mu[6] 1.457520     NaN  0 1.457520 1.457520 1.457520 1.457520 1.457520
#>           n_eff Rhat
#> log_mu[1]   NaN  NaN
#> log_mu[2]   NaN  NaN
#> log_mu[3]   NaN  NaN
#> log_mu[4]   NaN  NaN
#> log_mu[5]   NaN  NaN
#> log_mu[6]   NaN  NaN
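The NaN values for se_mean, n_eff and Rhat in this table reflect how short the run is. With iter = 10, only 5 draws are kept after warmup, and the sd of 0 shows that those draws are identical, so the diagnostics cannot be computed. A minimal sketch of a longer run follows; iter = 500 is an arbitrary illustrative value, not a recommendation, and the name stan_long is introduced here only for the example.

```r
suppressPackageStartupMessages(library("BASiCStan"))
set.seed(42)
sce <- BASiCS_MockSCE()

# Longer run; iter = 500 is an illustrative choice only
stan_long <- BASiCStan(
    sce,
    ReturnBASiCS = FALSE,
    Method = "sampling",
    iter = 500
)

# Matrix of across-chain summaries, one row per parameter
diag_tab <- summary(stan_long)$summary

# Range of Rhat across parameters (values near 1 suggest convergence)
range(diag_tab[, "Rhat"], na.rm = TRUE)

# Parameters with the smallest effective sample sizes
head(sort(diag_tab[, "n_eff"]))

# Report divergences, tree depth and energy diagnostics from rstan
rstan::check_hmc_diagnostics(stan_long)
```

If divergent transitions persist in a longer run, the Stan warning page linked in the output above describes the usual remedies.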
We can then convert this object to a BASiCS_Chain and carry on with the
workflow as before:
Stan2BASiCS(stan_obj)
#> An object of class BASiCS_Chain
#> 5 MCMC samples.
#> Dataset contains 100 biological genes and 100 cells (2 batches).
#> Object stored using BASiCS version: 2.10.0
#> Parameters: mu delta s nu theta epsilon phi beta
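As a sketch of what carrying on the workflow can look like, the converted chain can be assigned to a variable and passed to the usual BASiCS functions. The variable names chain and mu_draws are introduced here for illustration; displayChainBASiCS() is the BASiCS accessor for the stored draws of a parameter.

```r
suppressPackageStartupMessages(library("BASiCStan"))
set.seed(42)
sce <- BASiCS_MockSCE()

# Fit with Stan and keep the raw Stan object
stan_obj <- BASiCStan(
    sce,
    ReturnBASiCS = FALSE,
    Method = "sampling",
    iter = 10
)

# Convert to a BASiCS_Chain and keep it for downstream use
chain <- Stan2BASiCS(stan_obj)

# Extract the draws of the mean expression parameters as a matrix
mu_draws <- displayChainBASiCS(chain, Param = "mu")
dim(mu_draws)

# Plot the mean/over-dispersion trend, as for any BASiCS_Chain
BASiCS_ShowFit(chain)
```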
sessionInfo()
#> R version 4.2.1 Patched (2022-07-09 r82577)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_GB/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] BASiCStan_1.0.0 rstan_2.21.7
#> [3] ggplot2_3.3.6 StanHeaders_2.21.0-7
#> [5] BASiCS_2.10.0 SingleCellExperiment_1.20.0
#> [7] SummarizedExperiment_1.28.0 Biobase_2.58.0
#> [9] GenomicRanges_1.50.0 GenomeInfoDb_1.34.0
#> [11] IRanges_2.32.0 S4Vectors_0.36.0
#> [13] BiocGenerics_0.44.0 MatrixGenerics_1.10.0
#> [15] matrixStats_0.62.0
#>
#> loaded via a namespace (and not attached):
#> [1] colorspace_2.0-3 ellipsis_0.3.2
#> [3] scuttle_1.8.0 bluster_1.8.0
#> [5] XVector_0.38.0 BiocNeighbors_1.16.0
#> [7] farver_2.1.1 hexbin_1.28.2
#> [9] fansi_1.0.3 codetools_0.2-18
#> [11] splines_4.2.1 sparseMatrixStats_1.10.0
#> [13] cachem_1.0.6 knitr_1.40
#> [15] jsonlite_1.8.3 cluster_2.1.4
#> [17] shiny_1.7.3 compiler_4.2.1
#> [19] dqrng_0.3.0 assertthat_0.2.1
#> [21] Matrix_1.5-1 fastmap_1.1.0
#> [23] limma_3.54.0 cli_3.4.1
#> [25] later_1.3.0 BiocSingular_1.14.0
#> [27] htmltools_0.5.3 prettyunits_1.1.1
#> [29] tools_4.2.1 rsvd_1.0.5
#> [31] igraph_1.3.5 coda_0.19-4
#> [33] gtable_0.3.1 glue_1.6.2
#> [35] GenomeInfoDbData_1.2.9 dplyr_1.0.10
#> [37] Rcpp_1.0.9 jquerylib_0.1.4
#> [39] vctrs_0.5.0 DelayedMatrixStats_1.20.0
#> [41] xfun_0.34 stringr_1.4.1
#> [43] ps_1.7.2 beachmat_2.14.0
#> [45] mime_0.12 miniUI_0.1.1.1
#> [47] lifecycle_1.0.3 irlba_2.3.5.1
#> [49] statmod_1.4.37 edgeR_3.40.0
#> [51] zlibbioc_1.44.0 MASS_7.3-58.1
#> [53] scales_1.2.1 promises_1.2.0.1
#> [55] parallel_4.2.1 inline_0.3.19
#> [57] yaml_2.3.6 gridExtra_2.3
#> [59] loo_2.5.1 sass_0.4.2
#> [61] ggExtra_0.10.0 stringi_1.7.8
#> [63] highr_0.9 ScaledMatrix_1.6.0
#> [65] scran_1.26.0 pkgbuild_1.3.1
#> [67] BiocParallel_1.32.0 rlang_1.0.6
#> [69] pkgconfig_2.0.3 bitops_1.0-7
#> [71] evaluate_0.17 lattice_0.20-45
#> [73] glmGamPoi_1.10.0 rstantools_2.2.0
#> [75] labeling_0.4.2 cowplot_1.1.1
#> [77] tidyselect_1.2.0 processx_3.8.0
#> [79] magrittr_2.0.3 R6_2.5.1
#> [81] generics_0.1.3 metapod_1.6.0
#> [83] DelayedArray_0.24.0 DBI_1.1.3
#> [85] pillar_1.8.1 withr_2.5.0
#> [87] RCurl_1.98-1.9 tibble_3.1.8
#> [89] crayon_1.5.2 utf8_1.2.2
#> [91] rmarkdown_2.17 viridis_0.6.2
#> [93] locfit_1.5-9.6 grid_4.2.1
#> [95] callr_3.7.2 digest_0.6.30
#> [97] xtable_1.8-4 httpuv_1.6.6
#> [99] RcppParallel_5.1.5 munsell_0.5.0
#> [101] viridisLite_0.4.1 bslib_0.4.0