library(cellmig)
library(ggplot2)
library(ggforce)
library(rstan)
ggplot2::theme_set(new = theme_bw(base_size = 10))High-throughput time-lapse microscopy allows us to track thousands of cells across many wells and plates. However, these experiments generate complex data with multiple sources of variability:
Biological variability: Cells on different experimental plates (biological replicates) behave differently even under the same conditions.
Technical variability: Differences between wells on the same plate.
Batch effects: Systematic differences between plates (e.g., imaging days, reagent batches).
Standard statistical tests often ignore this hierarchical structure (cells \(\rightarrow\) wells \(\rightarrow\) plates), which can lead to false positives or missed discoveries.
The Bioconductor package cellmig
addresses this by using Bayesian hierarchical models.
It separates biological signals from technical noise and provides
uncertainty-aware estimates (credible intervals) for treatment effects.
This vignette guides you through installing cellmig,
formatting your data, fitting a model, and interpreting the results.
To install this package, start R (version “4.5”) and enter:
| Column | Description | Example |
|---|---|---|
well |
Unique well ID | “w1”, “w2”, … |
plate |
Unique plate ID (Biological Replicate) | “p1”, “p2”, … |
compound |
Name of the compound/drug | “DMSO”, “DrugA”, … |
dose |
Concentration or level | “0”, “10”, “high”, … |
v |
Observed migration velocity (numeric) | 0.54 (µm/min) |
offset |
Batch correction flag (0 or 1) | 0 or 1 |
offset ColumnTo correct for plate-to-plate batch effects, you must identify a control treatment that appears on every plate (e.g., DMSO).
Set offset = 1 for cells in this control
group.
Set offset = 0 for all other treatments.
Note: The model uses this group to normalize velocities across plates
We will use the simulated dataset included in the package. It contains:
3 Plates (Biological replicates)
378 Wells (Technical replicates nested in plates)
6 Compounds at 7 Doses (42 Treatment Groups)
FALSE 'data.frame': 7560 obs. of 6 variables:
FALSE $ well : chr "1" "1" "1" "1" ...
FALSE $ plate : chr "1" "1" "1" "1" ...
FALSE $ compound: chr "C1" "C1" "C1" "C1" ...
FALSE $ dose : chr "D1" "D1" "D1" "D1" ...
FALSE $ v : num 21.905 0.535 3.348 5.351 1.194 ...
FALSE $ offset : num 1 1 1 1 1 1 1 1 1 1 ...
FALSE well plate compound dose v offset
FALSE 1 1 1 C1 D1 21.9054754 1
FALSE 2 1 1 C1 D1 0.5351645 1
FALSE 3 1 1 C1 D1 3.3484316 1
FALSE 4 1 1 C1 D1 5.3511913 1
FALSE 5 1 1 C1 D1 1.1942105 1
FALSE 6 1 1 C1 D1 26.7272783 1
Before modeling, it is good practice to visualize the raw data to check for obvious batch effects, outliers, multimodal distributions, or other imaging- related effects.
Each dot represents a single cell. Facets show different compounds. Colors indicate plates. If plate colors cluster separately within facets, you likely have batch effects.
ggplot(data = d)+
facet_wrap(facets = ~paste0("compound=", compound),
scales = "free_y", ncol = 2)+
geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well),
size = 0.5)+
theme_bw()+
theme(legend.position = "top",
strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
ylab(label = "migration velocity")+
xlab(label = '')+
scale_color_grey()+
guides(color = guide_legend(override.aes = list(size = 3)))+
guides(shape = guide_legend(override.aes = list(size = 3)))+
scale_y_log10()+
annotation_logticks(base = 10, sides = "l")Aggregating to the well level often makes batch effects clearer. Here, we see the mean velocity per well.
dm <- aggregate(v~well+plate+compound+dose, data = d, FUN = mean)
ggplot(data = dm)+
facet_wrap(facets = ~paste0("compound=", compound),
scales = "free_y", ncol = 2)+
geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well),
size = 1.5, alpha = 0.7)+
theme_bw()+
theme(legend.position = "top",
strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
ylab(label = "migration velocity")+
xlab(label = '')+
scale_color_grey()+
guides(color = guide_legend(override.aes = list(size = 3)))+
guides(shape = guide_legend(override.aes = list(size = 3)))+
scale_y_log10()+
annotation_logticks(base = 10, sides = "l")Now we fit the hierarchical Bayesian model. The
cellmig() function handles the complex Stan modeling behind
the scenes.
You can adjust the MCMC (Markov Chain Monte Carlo) settings via the
control list.
mcmc_warmup: Steps to tune the sampler.
mcmc_steps: Actual sampling steps used for
inference.
mcmc_chains: Number of independent chains (recommend
\(\ge\) 2 for convergence
checks).
Note: For this vignette, we use fewer steps for speed. For real
analysis, increase mcmc_steps to 2000+.
The model output o contains posterior distributions for
all parameters. We focus on the overall treatment
effects (\(\delta_t\)), which
represent the change in velocity relative to the control (offset).
The delta_t data frame contains the mean, median, and
95% Highest Density Intervals (HDI) for each treatment.
| group_id | mean | se_mean | sd | X2.5. | X25. | X50. | X75. | X97.5. | n_eff | Rhat | pmax | group | compound | dose | plate_id | plate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -0.84 | 0 | 0.07 | -0.98 | -0.89 | -0.84 | -0.79 | -0.71 | 1241.63 | 1 | 1.00 | C2|D1 | C2 | D1 | 1 | 1 |
| 2 | -0.44 | 0 | 0.07 | -0.57 | -0.48 | -0.44 | -0.39 | -0.31 | 1598.38 | 1 | 1.00 | C2|D2 | C2 | D2 | 1 | 1 |
| 3 | -0.20 | 0 | 0.07 | -0.33 | -0.24 | -0.20 | -0.15 | -0.07 | 1685.94 | 1 | 0.99 | C2|D3 | C2 | D3 | 1 | 1 |
| 4 | 0.06 | 0 | 0.07 | -0.07 | 0.02 | 0.06 | 0.11 | 0.20 | 1371.14 | 1 | 0.67 | C2|D4 | C2 | D4 | 1 | 1 |
| 5 | 0.12 | 0 | 0.07 | -0.01 | 0.07 | 0.12 | 0.17 | 0.25 | 1455.04 | 1 | 0.92 | C2|D5 | C2 | D5 | 1 | 1 |
| 6 | 0.44 | 0 | 0.07 | 0.30 | 0.40 | 0.44 | 0.49 | 0.58 | 1351.50 | 1 | 1.00 | C2|D6 | C2 | D6 | 1 | 1 |
| 7 | 0.82 | 0 | 0.07 | 0.69 | 0.77 | 0.82 | 0.86 | 0.94 | 1331.42 | 1 | 1.00 | C2|D7 | C2 | D7 | 1 | 1 |
| 8 | -0.48 | 0 | 0.07 | -0.60 | -0.52 | -0.48 | -0.43 | -0.34 | 1442.17 | 1 | 1.00 | C3|D1 | C3 | D1 | 1 | 1 |
| 9 | -0.22 | 0 | 0.07 | -0.34 | -0.27 | -0.22 | -0.17 | -0.08 | 1381.08 | 1 | 1.00 | C3|D2 | C3 | D2 | 1 | 1 |
| 10 | -0.01 | 0 | 0.07 | -0.14 | -0.06 | -0.02 | 0.03 | 0.12 | 1539.08 | 1 | 0.18 | C3|D3 | C3 | D3 | 1 | 1 |
| 11 | 0.01 | 0 | 0.07 | -0.13 | -0.04 | 0.01 | 0.06 | 0.15 | 1773.13 | 1 | 0.10 | C3|D4 | C3 | D4 | 1 | 1 |
| 12 | 0.21 | 0 | 0.07 | 0.08 | 0.16 | 0.20 | 0.25 | 0.35 | 1155.68 | 1 | 1.00 | C3|D5 | C3 | D5 | 1 | 1 |
| 13 | 0.14 | 0 | 0.07 | 0.01 | 0.09 | 0.14 | 0.18 | 0.27 | 1485.52 | 1 | 0.97 | C3|D6 | C3 | D6 | 1 | 1 |
| 14 | 0.23 | 0 | 0.07 | 0.10 | 0.19 | 0.23 | 0.27 | 0.38 | 1116.98 | 1 | 1.00 | C3|D7 | C3 | D7 | 1 | 1 |
| 15 | -0.14 | 0 | 0.07 | -0.27 | -0.18 | -0.14 | -0.09 | 0.01 | 1850.31 | 1 | 0.93 | C4|D1 | C4 | D1 | 1 | 1 |
| 16 | 0.01 | 0 | 0.07 | -0.12 | -0.04 | 0.01 | 0.05 | 0.15 | 1282.55 | 1 | 0.08 | C4|D2 | C4 | D2 | 1 | 1 |
| 17 | -0.18 | 0 | 0.07 | -0.32 | -0.22 | -0.18 | -0.14 | -0.05 | 1788.89 | 1 | 0.99 | C4|D3 | C4 | D3 | 1 | 1 |
| 18 | 0.01 | 0 | 0.07 | -0.12 | -0.03 | 0.01 | 0.06 | 0.15 | 1925.42 | 1 | 0.13 | C4|D4 | C4 | D4 | 1 | 1 |
| 19 | 0.11 | 0 | 0.07 | -0.02 | 0.06 | 0.11 | 0.15 | 0.24 | 1523.07 | 1 | 0.90 | C4|D5 | C4 | D5 | 1 | 1 |
| 20 | 0.50 | 0 | 0.07 | 0.37 | 0.45 | 0.50 | 0.54 | 0.63 | 1722.87 | 1 | 1.00 | C4|D6 | C4 | D6 | 1 | 1 |
| 21 | 0.52 | 0 | 0.07 | 0.38 | 0.47 | 0.52 | 0.57 | 0.67 | 1581.37 | 1 | 1.00 | C4|D7 | C4 | D7 | 1 | 1 |
| 22 | 0.38 | 0 | 0.07 | 0.24 | 0.34 | 0.38 | 0.44 | 0.52 | 1364.32 | 1 | 1.00 | C5|D1 | C5 | D1 | 1 | 1 |
| 23 | 0.19 | 0 | 0.07 | 0.06 | 0.14 | 0.19 | 0.24 | 0.32 | 1315.31 | 1 | 0.99 | C5|D2 | C5 | D2 | 1 | 1 |
| 24 | 0.13 | 0 | 0.07 | 0.00 | 0.08 | 0.13 | 0.18 | 0.26 | 1394.59 | 1 | 0.95 | C5|D3 | C5 | D3 | 1 | 1 |
| 25 | -0.01 | 0 | 0.07 | -0.14 | -0.05 | -0.01 | 0.04 | 0.12 | 1196.87 | 1 | 0.07 | C5|D4 | C5 | D4 | 1 | 1 |
| 26 | -0.04 | 0 | 0.07 | -0.17 | -0.08 | -0.04 | 0.00 | 0.09 | 1390.06 | 1 | 0.48 | C5|D5 | C5 | D5 | 1 | 1 |
| 27 | -0.32 | 0 | 0.07 | -0.45 | -0.37 | -0.32 | -0.27 | -0.18 | 1472.63 | 1 | 1.00 | C5|D6 | C5 | D6 | 1 | 1 |
| 28 | -0.37 | 0 | 0.07 | -0.50 | -0.42 | -0.37 | -0.32 | -0.23 | 1687.21 | 1 | 1.00 | C5|D7 | C5 | D7 | 1 | 1 |
| 29 | 0.56 | 0 | 0.07 | 0.43 | 0.52 | 0.56 | 0.61 | 0.69 | 1327.79 | 1 | 1.00 | C6|D1 | C6 | D1 | 1 | 1 |
| 30 | 0.28 | 0 | 0.06 | 0.16 | 0.24 | 0.28 | 0.33 | 0.41 | 1091.96 | 1 | 1.00 | C6|D2 | C6 | D2 | 1 | 1 |
| 31 | 0.00 | 0 | 0.06 | -0.12 | -0.04 | 0.00 | 0.05 | 0.13 | 1162.02 | 1 | 0.04 | C6|D3 | C6 | D3 | 1 | 1 |
| 32 | 0.00 | 0 | 0.07 | -0.13 | -0.05 | 0.00 | 0.04 | 0.13 | 1491.55 | 1 | 0.02 | C6|D4 | C6 | D4 | 1 | 1 |
| 33 | -0.15 | 0 | 0.07 | -0.30 | -0.20 | -0.15 | -0.11 | -0.02 | 1258.92 | 1 | 0.97 | C6|D5 | C6 | D5 | 1 | 1 |
| 34 | -0.30 | 0 | 0.07 | -0.44 | -0.35 | -0.30 | -0.26 | -0.17 | 1956.54 | 1 | 1.00 | C6|D6 | C6 | D6 | 1 | 1 |
| 35 | -0.68 | 0 | 0.07 | -0.82 | -0.73 | -0.68 | -0.63 | -0.54 | 1309.73 | 1 | 1.00 | C6|D7 | C6 | D7 | 1 | 1 |
ggplot(data = o$posteriors$delta_t) +
geom_line(aes(x = dose, y = mean, col = compound, group = compound)) +
geom_point(aes(x = dose, y = mean, col = compound)) +
geom_errorbar(aes(x = dose, y = mean, ymin = X2.5., ymax = X97.5.,
col = compound), width = 0.1) +
ylab(label = expression("Log-Fold Change ("*delta*")")) +
xlab("Dose") +
theme(legend.position = "top")Log-scale values can be hard to interpret biologically. We often prefer Fold-Change, where 1 = no effect, >1 = increase, <1 = decrease. We calculate this by exponentiating the values (\(\exp(\delta)\)).
ggplot(data = o$posteriors$delta_t) +
geom_line(aes(x = dose, y = exp(mean), col = compound, group = compound)) +
geom_point(aes(x = dose, y = exp(mean), col = compound)) +
geom_errorbar(aes(x = dose, y = exp(mean), ymin = exp(X2.5.),
ymax = exp(X97.5.), col = compound), width = 0.1) +
ylab(label = expression("Fold Change ("*delta*"')")) +
xlab("Dose") +
theme(legend.position = "top")Often, you want to compare specific treatments against each other (e.g., Drug A vs. Drug B), not just against the control.
The get_pairs() function computes the difference between
all treatment groups.
To inspect the distribution of differences between a specific target
group and all others, use get_violins().
FALSE group_id group compound dose
FALSE 1 1 C2|D1 C2 D1
FALSE 2 2 C2|D2 C2 D2
FALSE 3 3 C2|D3 C2 D3
FALSE 4 4 C2|D4 C2 D4
FALSE 5 5 C2|D5 C2 D5
FALSE 6 6 C2|D6 C2 D6
# Compare all groups against Compound 2, Dose 1
u_violin <- get_violins(x = o,
from_groups = groups$group,
to_group = "C2|D1",
exponentiate = FALSE)
u_violin$plotReliable uncertainty quantification requires verifying that the model
fits the data well and that probability metrics (\(\pi\)) are calibrated to your experimental
design. cellmig provides tools for both.
PPCs compare observed data to data simulated from the fitted model. If the model is well-specified, the simulated data (pink) should overlap the observed data (black).
It is crucial to verify that the model fits your data well.
cellmig provides Posterior Predictive Checks (PPC).
Interpretation:
Good Fit: Observed points fall within the pink violin densities; well-means cluster near the diagonal.
Bad Fit: Systematic deviations suggest the model may not be valid and the outputs (parameter values) should not be trusted!
Do the simulated velocities (pink) match the observed velocities (black)? If they overlap well, the model captures the data distribution.
To detect influential observations or model misspecification, cellmig supports Pareto k diagnostics via the rstan or loo packages: https://mc-stan.org/loo/reference/pareto-k-diagnostic.html).
High k values (>0.7) indicate highly influential observations that the model struggles to predict, e.g. outlying cell velocities generated by tracking or segmentation artifacts or experimental errors.
FALSE
FALSE Computed from 1400 by 7560 log-likelihood matrix.
FALSE
FALSE Estimate SE
FALSE elpd_loo -22300.4 106.7
FALSE p_loo 69.1 3.6
FALSE looic 44600.9 213.4
FALSE ------
FALSE MCSE of elpd_loo is 0.2.
FALSE MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 2.7]).
FALSE
FALSE All Pareto k estimates are good (k < 0.68).
FALSE See help('pareto-k-diagnostic') for details.
FALSE integer(0)
Bayesian uncertainty estimates (HDI, \(\pi\)) are only valid if the Markov Chain
Monte Carlo (MCMC) sampler has successfully explored the posterior
distribution. cellmig provides tools to verify
convergence.
The potential scale reduction factor (\(\hat{R}\)) should be close to 1.0 (typically \(< 1.01\)) for all parameters. Additionally, the Effective Sample Size (ESS) should be sufficiently large to ensure stable HDI estimates.
Divergent transitions indicate that the sampler encountered regions of high curvature it could not explore. Valid uncertainty quantification requires 0 divergent transitions.
FALSE
FALSE Divergences:
FALSE
FALSE Tree depth:
FALSE
FALSE Energy:
Is any warnings are produced by this check, please consult the guidelines provided in the following vignette:
You can inspect how much variability comes from different sources (plates, wells, treatments). This helps in planning future experiments.
# Plate-specific baseline effects
g_alpha_p <- ggplot(data = o$posteriors$alpha_p) +
geom_errorbarh(aes(y = plate, x = mean, xmin = X2.5., xmax = X97.5.),
height = 0.2) +
geom_point(aes(y = plate, x = mean)) +
xlab("Plate Effect (log-scale)")
# Variance parameters (Biological vs Technical)
g_sigma <- ggplot() +
geom_errorbarh(data = o$posteriors$sigma_bio,
aes(y = "Biological (Plate)",
x = mean, xmin = X2.5., xmax = X97.5.), height = 0.2) +
geom_errorbarh(data = o$posteriors$sigma_tech,
aes(y = "Technical (Well)",
x = mean, xmin = X2.5., xmax = X97.5.), height = 0.2) +
geom_errorbarh(data = o$posteriors$sigma_delta,
aes(y = "Treatment Variation",
x = mean, xmin = X2.5., xmax = X97.5.), height = 0.2) +
geom_point(data = o$posteriors$sigma_bio,
aes(y = "Biological (Plate)", x = mean)) +
geom_point(data = o$posteriors$sigma_tech,
aes(y = "Technical (Well)", x = mean)) +
geom_point(data = o$posteriors$sigma_delta,
aes(y = "Treatment Variation", x = mean)) +
xlab("Standard Deviation")
g_alpha_p | g_sigmaFor screens with multiple compounds and overlapping doses,
cellmig can cluster compounds based on their dose-response
similarity, accounting for uncertainty.
get_dose_response_profile(x = o, exponentiate = TRUE) +
patchwork::plot_layout(widths = c(.7, 1, 2))FALSE R version 4.6.0 (2026-04-24)
FALSE Platform: x86_64-pc-linux-gnu
FALSE Running under: Ubuntu 24.04.4 LTS
FALSE
FALSE Matrix products: default
FALSE BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
FALSE LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
FALSE
FALSE locale:
FALSE [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
FALSE [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
FALSE [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
FALSE [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
FALSE [9] LC_ADDRESS=C LC_TELEPHONE=C
FALSE [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
FALSE
FALSE time zone: Etc/UTC
FALSE tzcode source: system (glibc)
FALSE
FALSE attached base packages:
FALSE [1] stats graphics grDevices utils datasets methods base
FALSE
FALSE other attached packages:
FALSE [1] rstan_2.32.7 StanHeaders_2.32.10 ggforce_0.5.0
FALSE [4] ggplot2_4.0.3 cellmig_1.3.4 BiocStyle_2.41.0
FALSE
FALSE loaded via a namespace (and not attached):
FALSE [1] ggiraph_0.9.6 tidyselect_1.2.1 dplyr_1.2.1
FALSE [4] farver_2.1.2 loo_2.9.0 S7_0.2.2
FALSE [7] fastmap_1.2.0 lazyeval_0.2.3 tensorA_0.36.2.1
FALSE [10] tweenr_2.0.3 fontquiver_0.2.1 digest_0.6.39
FALSE [13] lifecycle_1.0.5 tidytree_0.4.7 posterior_1.7.0
FALSE [16] magrittr_2.0.5 compiler_4.6.0 rlang_1.2.0
FALSE [19] sass_0.4.10 tools_4.6.0 yaml_2.3.12
FALSE [22] knitr_1.51 labeling_0.4.3 htmlwidgets_1.6.4
FALSE [25] pkgbuild_1.4.8 curl_7.1.0 plyr_1.8.9
FALSE [28] RColorBrewer_1.1-3 abind_1.4-8 aplot_0.2.9
FALSE [31] withr_3.0.2 purrr_1.2.2 sys_3.4.3
FALSE [34] grid_4.6.0 polyclip_1.10-7 stats4_4.6.0
FALSE [37] gdtools_0.5.0 inline_0.3.21 scales_1.4.0
FALSE [40] MASS_7.3-65 cli_3.6.6 rmarkdown_2.31
FALSE [43] treeio_1.37.0 generics_0.1.4 RcppParallel_5.1.11-2
FALSE [46] ggtree_4.3.0 reshape2_1.4.5 ape_5.8-1
FALSE [49] cachem_1.1.0 stringr_1.6.0 parallel_4.6.0
FALSE [52] ggplotify_0.1.3 BiocManager_1.30.27 matrixStats_1.5.0
FALSE [55] vctrs_0.7.3 yulab.utils_0.2.4 V8_8.2.0
FALSE [58] jsonlite_2.0.0 fontBitstreamVera_0.1.1 gridGraphics_0.5-1
FALSE [61] patchwork_1.3.2 systemfonts_1.3.2 maketools_1.3.2
FALSE [64] tidyr_1.3.2 jquerylib_0.1.4 glue_1.8.1
FALSE [67] codetools_0.2-20 distributional_0.7.0 stringi_1.8.7
FALSE [70] gtable_0.3.6 QuickJSR_1.10.0 tibble_3.3.1
FALSE [73] pillar_1.11.1 rappdirs_0.3.4 htmltools_0.5.9
FALSE [76] R6_2.6.1 evaluate_1.0.5 lattice_0.22-9
FALSE [79] backports_1.5.1 ggfun_0.2.0 fontLiberation_0.1.0
FALSE [82] bslib_0.11.0 rstantools_2.6.0 Rcpp_1.1.1-1.1
FALSE [85] checkmate_2.3.4 gridExtra_2.3 nlme_3.1-169
FALSE [88] xfun_0.57 fs_2.1.0 buildtools_1.0.0
FALSE [91] pkgconfig_2.0.3