Help for package eam

Type:

Package

Title:

Evidence Accumulation Models

Version:

1.0.1

LinkingTo:

Rcpp

Imports:

Rcpp, dplyr, tidyr, arrow, rlang, distributional, stats, parallel, codetools, grDevices, graphics, ggplot2, gridExtra, data.table, purrr, scales

Suggests:

testthat (≥ 3.0.0), pbapply, abc

Description:

Simulation-based evidence accumulation models for analyzing responses and reaction times in single- and multi-response tasks. The package includes simulation engines for five representative models: the Diffusion Decision Model (DDM), Leaky Competing Accumulator (LCA), Linear Ballistic Accumulator (LBA), Racing Diffusion Model (RDM), and Levy Flight Model (LFM), and extends these frameworks to multi-response settings. The package supports user-defined functions for item-level parameterization and the incorporation of covariates, enabling flexible customization and the development of new model variants based on existing architectures. Inference is performed using simulation-based methods, including Approximate Bayesian Computation (ABC) and Amortized Bayesian Inference (ABI), which allow parameter estimation without requiring tractable likelihood functions. In addition to core inference tools, the package provides modules for parameter recovery, posterior predictive checks, and model comparison, facilitating the study of a wide range of cognitive processes in tasks involving perceptual decision making, memory retrieval, and value-based decision making. Key methods implemented in the package are described in Ratcliff (1978) <doi:10.1037/0033-295X.85.2.59>, Usher and McClelland (2001) <doi:10.1037/0033-295X.108.3.550>, Brown and Heathcote (2008) <doi:10.1016/j.cogpsych.2007.12.002>, Tillman, Van Zandt and Logan (2020) <doi:10.3758/s13423-020-01719-6>, Wieschen, Voss and Radev (2020) <doi:10.20982/tqmp.16.2.p120>, Csilléry, François and Blum (2012) <doi:10.1111/j.2041-210X.2011.00179.x>, Beaumont (2019) <doi:10.1146/annurev-statistics-030718-105212>, and Sainsbury-Dale, Zammit-Mangion and Huser (2024) <doi:10.1080/00031305.2023.2249522>.

License:

MIT + file LICENSE

Encoding:

UTF-8

Config/testthat/edition:

RoxygenNote:

7.3.3

Depends:

R (≥ 4.1.0)

URL:

https://github.com/y-guang/eam

NeedsCompilation:

yes

Packaged:

2026-01-12 22:46:13 UTC; spike

Author:

Guangyu Zhu [aut], Guang Yang [aut, cre]

Maintainer:

Guang Yang <guang.spike.yang@gmail.com>

Repository:

CRAN

Date/Publication:

2026-01-17 20:10:02 UTC

Add two summarise_by specs together

Description

S3 method for the + operator to combine two 'eam_summarise_by_spec' objects into a single spec that will apply both operations.

Usage

## S3 method for class 'eam_summarise_by_spec'
e1 + e2

Arguments

e1

First eam_summarise_by_spec or eam_summarise_by_tbl object

e2

Second eam_summarise_by_spec or eam_summarise_by_tbl object

Value

A combined eam_summarise_by_spec object
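
How a + method can merge two specs is easiest to see in a stripped-down form. The sketch below uses a hypothetical class demo_spec and hypothetical helpers (new_demo_spec, apply_demo_spec) that are not part of eam. It only illustrates the general S3 technique of concatenating two lists of operations; the package's internal representation may differ.

```r
# Hypothetical constructor: a spec is a named list of summary functions
new_demo_spec <- function(...) {
  structure(list(ops = list(...)), class = "demo_spec")
}

# S3 method for `+`: concatenate the operations stored in both specs
"+.demo_spec" <- function(e1, e2) {
  structure(list(ops = c(e1$ops, e2$ops)), class = "demo_spec")
}

# Apply every stored operation to the same input
apply_demo_spec <- function(spec, x) {
  lapply(spec$ops, function(op) op(x))
}

combined <- new_demo_spec(rt_mean = mean) + new_demo_spec(rt_sd = sd)
result <- apply_demo_spec(combined, c(0.4, 0.6, 0.8))
```

Because the combined object keeps the same class, further specs can be chained with + in the same way.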

Join two eam_summarise_by_tbl objects

Description

S3 method for the + operator to join two summary tables created by summarise_by. Tables must have identical .wider_by attributes to be joined.

Usage

## S3 method for class 'eam_summarise_by_tbl'
e1 + e2

Arguments

e1

First eam_summarise_by_tbl object

e2

Second eam_summarise_by_tbl object

Value

A joined data frame with class "eam_summarise_by_tbl", preserving the .wider_by attribute from the input tables

Bootstrap resample ABC posterior samples

Description

Bootstrap resample ABC posterior samples

Usage

abc_posterior_bootstrap(abc_result, n_samples, replace = TRUE)

Arguments

abc_result

An abc object from abc

n_samples

Number of bootstrap samples to draw. The usage signature defines no default, so this argument must be supplied.

replace

Logical, whether to sample with replacement (default TRUE)

Value

Data frame of bootstrapped parameter values

Examples

# Load an example abc output; in practice, generate it by applying ABC to your data
# See abc::abc for details on fitting ABC models
rdm_minimal_example <- system.file("extdata", "rdm_minimal", package = "eam")
abc_model <- readRDS(file.path(rdm_minimal_example, "abc", "abc_neuralnet_model.rds"))

# Bootstrap resample posterior parameters
posterior_params <- abc_posterior_bootstrap(
  abc_model,
  n_samples = 100
)

# View the first few rows of the bootstrapped posterior parameters
head(posterior_params)
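
The resampling step itself can be illustrated without an abc object. The following is a minimal base-R sketch, assuming the posterior draws are already available as a data frame; the object posterior and its column values are made up for illustration and do not come from the package.

```r
# Hypothetical stand-in for posterior draws (one row per accepted simulation)
set.seed(1)
posterior <- data.frame(
  V_beta_1 = rnorm(50, mean = 1),
  V_beta_group = rnorm(50, mean = 0)
)

# Draw row indices with replacement, then index the rows
n_samples <- 100
idx <- sample(nrow(posterior), size = n_samples, replace = TRUE)
bootstrapped <- posterior[idx, , drop = FALSE]
rownames(bootstrapped) <- NULL
```

Sampling whole rows keeps the parameters of one draw together, so correlations between parameters survive the bootstrap.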

ABC model comparison wrapper

Description

Wrapper function for postpr to facilitate model comparison. This function simplifies the process of comparing multiple models using ABC by automatically stacking summary statistics and creating model indices.

Usage

abc_postpr(sumstats = list(), target, ...)

Arguments

sumstats

A named list of summary statistics matrices from different models. Each element should be a matrix or data frame with the same columns.

target

Target summary statistics from observed data (vector or matrix)

...

Additional arguments passed to postpr

Value

An object of class "postpr" from postpr

Examples

# Load pre-computed ABC input for model comparison
# This example compares the same model to itself for demonstration
rdm_minimal_example <- system.file("extdata", "rdm_minimal", package = "eam")
abc_input <- readRDS(file.path(rdm_minimal_example, "abc", "abc_input.rds"))

# Compare two models using their summary statistics
# In practice, create different abc_input objects for different models:
# abc_input_1 <- build_abc_input(..., simulation_summary = sim_summary_1, ...)
# abc_input_2 <- build_abc_input(..., simulation_summary = sim_summary_2, ...)
postpr_result <- abc_postpr(
  sumstats = list(model1 = abc_input$sumstat, model2 = abc_input$sumstat),
  target = abc_input$target,
  tol = 0.5,
  method = "rejection"
)

# View model comparison results
summary(postpr_result)
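
The stacking step mentioned in the Description can be pictured as follows. This is a sketch in base R with made-up summary statistics, not the package's implementation: it binds the per-model matrices row-wise and builds one model label per row, which is the kind of input that abc::postpr takes as sumstat and index.

```r
set.seed(1)
stat_names <- c("rt_mean", "accuracy")

# Hypothetical summary statistics: 3 simulations for model1, 4 for model2
sumstats <- list(
  model1 = matrix(rnorm(6), ncol = 2, dimnames = list(NULL, stat_names)),
  model2 = matrix(rnorm(8), ncol = 2, dimnames = list(NULL, stat_names))
)

# Stack all rows into one matrix
stacked <- do.call(rbind, unname(sumstats))

# One model label per stacked row, in the same order as the rows
model_index <- rep(names(sumstats), times = vapply(sumstats, nrow, integer(1)))
```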

ABC with resampling

Description

Performs ABC inference with resampling to assess stability and uncertainty. Each iteration draws a random sample from the simulation pool and runs ABC, producing multiple posterior estimates for comparison.

Usage

abc_resample(
  target,
  param,
  sumstat,
  n_iterations,
  n_samples,
  replace = FALSE,
  ...
)

Arguments

target

Target summary statistics from observed data

param

Parameter values matrix or data frame

sumstat

Summary statistics matrix or data frame

n_iterations

Number of resample iterations

n_samples

Number of samples to draw in each iteration

replace

Logical, whether to sample with replacement (default FALSE)

...

Additional arguments passed to abc::abc

Value

A list of length n_iterations, where each element is an object of class abc returned by abc. Each list element contains the ABC posterior for one resampling iteration, allowing assessment of stability and uncertainty in parameter estimates.

Examples

# Load ABC input data from example simulation
abc_input <- readRDS(
  system.file("extdata", "rdm_minimal", "abc", "abc_input.rds", package = "eam")
)

# Perform ABC resampling
results <- abc_resample(
  target = abc_input$target,
  param = abc_input$param,
  sumstat = abc_input$sumstat,
  n_iterations = 2,
  n_samples = 2,
  tol = 0.5,
  method = "rejection"
)

# check the abc results
str(results)
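
To make the resampling idea concrete, here is a minimal base-R sketch that replaces abc::abc with a hand-written rejection step. The toy simulation pool, the helper resample_once, and the distance rule are all assumptions for illustration; they are not the package's code.

```r
set.seed(42)
n_pool <- 200

# Hypothetical simulation pool: one parameter and one summary statistic
param <- matrix(runif(n_pool), ncol = 1, dimnames = list(NULL, "V"))
sumstat <- 2 * param[, "V"] + rnorm(n_pool, sd = 0.1)
target <- 1.0

# One iteration: subsample the pool, then keep the closest fraction `tol`
resample_once <- function(n_samples, tol, replace = FALSE) {
  idx <- sample(n_pool, size = n_samples, replace = replace)
  distance <- abs(sumstat[idx] - target)
  accepted <- idx[distance <= quantile(distance, probs = tol)]
  param[accepted, , drop = FALSE]
}

# Repeat the procedure to obtain several posterior approximations
posteriors <- lapply(seq_len(3), function(i) resample_once(100, tol = 0.1))
posterior_medians <- vapply(posteriors, function(p) median(p[, "V"]), numeric(1))
```

If the medians differ strongly between iterations, the posterior is sensitive to which simulations happen to be in the pool.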

Simulate evidence accumulation in a drift-diffusion model

Description

Simulate evidence accumulation in a drift-diffusion model

Usage

accumulate_evidence_ddm(
  A,
  V,
  Z,
  ndt,
  max_t,
  dt,
  max_reached,
  noise_mechanism = "add",
  noise_func = NULL
)
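
The manual does not describe the arguments of this function, so the following is only a conceptual sketch of single-bound evidence accumulation, written in plain R with an Euler-Maruyama step. The function simulate_single_bound, its noise_sd argument, and the additive Gaussian noise are assumptions; the parameter meanings (A threshold, V drift rate, Z starting point, ndt non-decision time) follow the descriptions given under new_simulation_config.

```r
# Hypothetical helper, not the package's compiled backend
simulate_single_bound <- function(A, V, Z, ndt, max_t, dt, noise_sd = 1) {
  n_steps <- floor(max_t / dt)
  # Each step adds deterministic drift plus Gaussian noise scaled by sqrt(dt)
  increments <- V * dt + noise_sd * rnorm(n_steps, mean = 0, sd = sqrt(dt))
  path <- Z + cumsum(increments)
  # First step at which the evidence reaches the threshold, if any
  hit <- which(path >= A)[1]
  if (is.na(hit)) {
    return(NA_real_)  # timed out before max_t
  }
  hit * dt + ndt
}

set.seed(1)
rt <- simulate_single_bound(A = 1, V = 2, Z = 0, ndt = 0.3, max_t = 10, dt = 0.001)
```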

Simulate evidence accumulation in a two-bound drift-diffusion model

Description

Simulate evidence accumulation in a two-bound drift-diffusion model

Usage

accumulate_evidence_ddm_2b(
  A_upper,
  A_lower,
  V,
  Z,
  ndt,
  max_t,
  dt,
  max_reached,
  noise_mechanism = "add",
  noise_func = NULL
)
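
As a conceptual illustration only, a two-bound accumulation process can be sketched in plain R as below. The helper simulate_two_bound, the noise_sd argument, and the coding of choices as 1 (upper bound) and -1 (lower bound) are assumptions made for this sketch and are not taken from the package's backend.

```r
# Hypothetical helper: evidence drifts until it crosses either bound
simulate_two_bound <- function(A_upper, A_lower, V, Z, ndt, max_t, dt, noise_sd = 1) {
  n_steps <- floor(max_t / dt)
  increments <- V * dt + noise_sd * rnorm(n_steps, mean = 0, sd = sqrt(dt))
  path <- Z + cumsum(increments)
  # First step at which either boundary is reached
  hit <- which(path >= A_upper | path <= A_lower)[1]
  if (is.na(hit)) {
    return(list(rt = NA_real_, choice = NA_integer_))  # timed out
  }
  choice <- if (path[hit] >= A_upper) 1L else -1L
  list(rt = hit * dt + ndt, choice = choice)
}

set.seed(2)
trial <- simulate_two_bound(
  A_upper = 1, A_lower = -1, V = 0.5, Z = 0,
  ndt = 0.3, max_t = 10, dt = 0.001
)
```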

Simulate evidence accumulation in a leaky competing accumulator model with Global Inhibition (LCA-GI)

Description

Simulate evidence accumulation in a leaky competing accumulator model with Global Inhibition (LCA-GI)

Usage

accumulate_evidence_lca_gi(
  A,
  V,
  Z,
  ndt,
  beta,
  k,
  max_t,
  dt,
  max_reached,
  noise_func = NULL
)
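
The sketch below illustrates one common way to write a leaky competing accumulator update in plain R. It is not the package's backend: the helper simulate_lca_sketch, the exact update rule, the truncation at zero, and the choice to let the inhibition term depend on the summed activation of all accumulators are assumptions. The roles of beta (leak) and k (inhibition) follow the parameter list given under new_simulation_config.

```r
# Hypothetical helper: V and Z hold one value per accumulator
simulate_lca_sketch <- function(A, V, Z, ndt, beta, k, max_t, dt, noise_sd = 1) {
  x <- Z
  n_steps <- floor(max_t / dt)
  for (step in seq_len(n_steps)) {
    inhibition <- k * sum(x)               # assumed form of global inhibition
    drift <- V - beta * x - inhibition     # input minus leak minus inhibition
    x <- x + drift * dt + noise_sd * rnorm(length(x), mean = 0, sd = sqrt(dt))
    x <- pmax(x, 0)                        # activations cannot go below zero
    winners <- which(x >= A)
    if (length(winners) > 0) {
      winner <- winners[which.max(x[winners])]
      return(list(winner = winner, rt = step * dt + ndt))
    }
  }
  list(winner = NA_integer_, rt = NA_real_)  # no accumulator reached A
}

set.seed(3)
out <- simulate_lca_sketch(
  A = 1, V = c(3, 1), Z = c(0, 0), ndt = 0.3,
  beta = 0.1, k = 0.1, max_t = 5, dt = 0.01
)
```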

Internal function to apply a spec to data

Description

Internal function to apply a spec to data

Usage

apply_summarise_by_spec(spec_list, .data)

Arguments

spec_list

A list of spec operations (the internal spec list)

.data

A data frame

Value

A data frame with class "eam_summarise_by_tbl"

Build input for Approximate Bayesian Computation (ABC)

Description

Prepares simulation output, summary statistics, and target data for ABC analysis using the abc package. Extracts parameters and summary statistics from simulation results and formats them into matrices suitable for ABC parameter estimation.

Usage

build_abc_input(simulation_output, simulation_summary, target_summary, param)

Arguments

simulation_output

An eam_simulation_output object returned by run_simulation or load_simulation_output.

simulation_summary

A data frame containing summary statistics for each simulated condition. Should have a 'condition_idx' column and be created by summarise_by.

target_summary

A data frame containing target summary statistics to match against simulation results. Should have the same summary statistic columns as simulation_summary (excluding 'wider_by' columns).

param

Character vector of parameter names to extract from simulation_output. These parameters will be used as the parameter space for ABC estimation.

Details

This function provides a streamlined workflow for preparing ABC inputs, but it requires that all components be constructed using this package's functions. Specifically, simulation_output must be created by run_simulation or load_simulation_output, and both simulation_summary and target_summary must be generated using summarise_by. If your data originates from external sources or custom pipelines, you should manually construct the ABC input list instead, ensuring proper matrix formatting and column alignment as expected by abc::abc.

Value

A list with components suitable for abc::abc

Required format for summary statistics

Both simulation_summary and target_summary must be created using summarise_by. This ensures consistent column naming and data structure required for ABC analysis. See summarise_by for details on generating properly formatted summaries, and map_by_condition for typical workflow examples. If you want more flexibility in summary statistic calculation, you can manually construct the ABC input list. It is not necessary to use this function if you are familiar with the abc package.

Examples


# Load the example dataset
rdm_minimal_example <- system.file("extdata", "rdm_minimal", package = "eam")
sim_output <- load_simulation_output(file.path(rdm_minimal_example, "simulation"))
obs_df <- read.csv(file.path(rdm_minimal_example, "observation", "observation_data.csv"))

# Define summary statistics pipeline
summary_pipe <- summarise_by(
  .by = c("condition_idx"),
  rt_mean = mean(rt)
)

# Calculate summary statistics for simulation and observation
sim_summary <- map_by_condition(
  sim_output,
  .progress = FALSE,
  .parallel = FALSE,
  function(cond_df) {
    summary_pipe(cond_df)
  }
)
obs_summary <- summary_pipe(obs_df)

# Build ABC input
abc_input <- build_abc_input(
  simulation_output = sim_output,
  simulation_summary = sim_summary,
  target_summary = obs_summary,
  param = c("V_beta_1", "V_beta_group")
)

# Perform ABC parameter estimation using rejection method
abc_rejection_model <- abc::abc(
  target = abc_input$target,
  param = abc_input$param,
  sumstat = abc_input$sumstat,
  tol = 0.5,
  method = "rejection"
)

Calculate total number of rows needed for flattened data

Description

This function counts the total number of items across all conditions and trials to determine the size needed for pre-allocation.

Usage

calculate_total_rows(sim_results, first_trial_col)

Arguments

sim_results

The output from run_simulation(), a list of conditions

first_trial_col

Name of the first trial column to use for counting

Value

Integer, total number of rows needed

Backend detector for standard DDM

Description

Backend detector for standard DDM

Usage

detect_backend_ddm(model_lower, config)

Arguments

model_lower

Lowercase model name

config

A list containing simulation configuration parameters

Value

Backend name if this detector handles the config, NULL otherwise

Backend detector for 2-boundary DDM

Description

Backend detector for 2-boundary DDM

Usage

detect_backend_ddm_2b(model_lower, config)

Arguments

model_lower

Lowercase model name

config

A list containing simulation configuration parameters

Value

Backend name if this detector handles the config, NULL otherwise

Backend detector for LCA-GI

Description

Backend detector for LCA-GI

Usage

detect_backend_lca_gi(model_lower, config)

Arguments

model_lower

Lowercase model name

config

A list containing simulation configuration parameters

Value

Backend name if this detector handles the config, NULL otherwise

Evaluate a list of formulas sequentially with data

Description

This function evaluates a list of formulas sequentially, allowing later formulas to reference values defined by earlier ones.

Usage

evaluate_with_dt(formulas, data = list(), n)

Arguments

formulas

A list of formulas to evaluate

data

A list of named values to use as the initial environment

n

The number of values to generate for each formula

Value

A named list of evaluated values with length n
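
Sequential evaluation of two-sided formulas can be sketched in base R as follows. The helper evaluate_sequentially is hypothetical and much simpler than the package function: it handles only constants and plain R expressions on the right-hand side, recycles each result to length n, and does not deal with distributional objects.

```r
# Hypothetical helper illustrating the evaluation order
evaluate_sequentially <- function(formulas, data = list(), n) {
  env <- new.env(parent = globalenv())
  # Seed the environment with the initial named values
  for (name in names(data)) {
    assign(name, data[[name]], envir = env)
  }
  out <- list()
  for (f in formulas) {
    lhs <- as.character(f[[2]])                      # variable being defined
    value <- rep_len(eval(f[[3]], envir = env), n)   # evaluate the right-hand side
    assign(lhs, value, envir = env)                  # visible to later formulas
    out[[lhs]] <- value
  }
  out
}

values <- evaluate_sequentially(
  list(sigma ~ 0.5, V_condition ~ baseline + sigma, V ~ V_condition * 2),
  data = list(baseline = 1),
  n = 3
)
```

Reordering the formulas so that V comes before V_condition would fail, because V_condition would not yet exist when V is evaluated.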

Extract parameter values from abc result

Description

Extract parameter values from abc result

Usage

extract_abc_param_values(abc_result)

Arguments

abc_result

Single abc result object

Value

Matrix of parameter values

Extract posterior medians from abc_resample output

Description

Internal helper to compute parameter medians across abc_resample iterations.

Usage

extract_resample_medians(resample_results)

Arguments

resample_results

List of abc results from abc_resample

Value

Matrix in which each row is an iteration and each column holds the median of one parameter
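
The shape of this result can be illustrated with plain matrices standing in for the posterior samples of each iteration. In the sketch below the list resample_draws and its values are made up; real abc objects store their samples in components such as adj.values or unadj.values rather than as bare matrices.

```r
set.seed(7)
param_names <- c("V_beta_1", "V_beta_group")

# Hypothetical posterior draws for three resample iterations
resample_draws <- lapply(1:3, function(i) {
  matrix(rnorm(40), ncol = 2, dimnames = list(NULL, param_names))
})

# Column-wise medians per iteration, arranged as iterations x parameters
medians <- t(vapply(
  resample_draws,
  function(m) apply(m, 2, median),
  numeric(length(param_names))
))
```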

Fill pre-allocated data.table with simulation results

Description

This function fills the pre-allocated data.table vectors with data from simulation results, iterating through all conditions and trials.

Usage

fill_data_table(
  sim_results,
  dt_lists,
  trial_col_names,
  cond_param_names,
  first_trial_col
)

Arguments

sim_results

The output from run_simulation(), a list of conditions

dt_lists

Named list of pre-allocated vectors for each column

trial_col_names

Character vector of trial column names

cond_param_names

Character vector of condition parameter names

first_trial_col

Name of the first trial column to use for item counting

Value

Named list of filled vectors ready for data.table creation

Convert simulation results to a tidy data.table

Description

This function takes the nested list output from run_simulation() and converts it into a tidy data.table where each row represents one item response. The function pre-allocates the data.table to the exact size needed and then fills it efficiently. Column names are dynamically determined from the first trial result, excluding any .item_params verbose output.

Usage

flatten_simulation_results(sim_results)

Arguments

sim_results

The output from run_simulation(), a list of conditions

Value

A data.table with columns: condition_idx, trial_idx, rank_idx, all columns from trial results (excluding .item_params), and all variables from cond_params

Get all registered backend detectors

Description

Get all registered backend detectors

Usage

get_backend_detectors()

Value

A list of backend detector functions

Extract column names from simulation results

Description

This function extracts trial column names and condition parameter names from simulation results structure.

Usage

get_column_names(sim_results)

Arguments

sim_results

The output from run_simulation(), a list of conditions

Value

A list with trial_col_names and cond_param_names

Extract all left-hand side variable names from config formulas and prior_params

Description

Extract all left-hand side variable names from config formulas and prior_params

Usage

get_config_env_names(config)

Arguments

config

A list containing simulation configuration parameters

Value

A character vector of all LHS variable names from formulas and prior_params columns

Initialize simulation output directory structure

Description

Creates and validates the output directory structure for a simulation. This function ensures the directory is empty (or creates it), then creates the required subdirectories based on simulation_output_fs_proto.

Usage

init_simulation_output_dir(output_dir)

Arguments

output_dir

The base output directory path

Value

The output_dir path, returned invisibly to allow chaining

Rebuild eam_simulation_output from an existing output directory

Description

This function reconstructs an eam_simulation_output object from a previously saved simulation output directory.

Usage

load_simulation_output(output_dir)

Arguments

output_dir

The directory containing the simulation results and config

Value

An eam_simulation_output object

Examples

# Load simulation output from package data
sim_output_path <- system.file(
  "extdata", "rdm_minimal", "simulation",
  package = "eam"
)
sim_output <- load_simulation_output(sim_output_path)

# Access the configuration
sim_output$simulation_config

# Access the dataset (check arrow documentation for working with the dataset)
dataset <- sim_output$open_dataset()

Map a function by condition across simulation output chunks

Description

This function processes simulation output by gathering all chunks, iterating through them one by one, filtering and collecting data by chunk, then applying a user-defined function by condition within each chunk.

Usage

map_by_condition(
  simulation_output,
  .f,
  ...,
  .combine = dplyr::bind_rows,
  .parallel = NULL,
  .n_cores = NULL,
  .progress = FALSE
)

Arguments

simulation_output

An eam_simulation_output object containing the dataset and configuration

.f

A function to apply to each condition's data. The function should accept a data frame representing one condition's results

...

Additional arguments passed to the function .f

.combine

Function to combine results (default: dplyr::bind_rows)

.parallel

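Logical or NULL (default: NULL). Whether to use parallel processing; NULL presumably defers the decision to map_by_condition.parallel.heuristic.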

.n_cores

Integer. Number of CPU cores to use for parallel processing. If NULL, uses detectCores() - 1. Only used when .parallel = TRUE.

.progress

Logical, whether to show a progress bar (default: FALSE)

Details

This function handles out-of-core computation automatically using Apache Arrow, so you don't need to understand Arrow internals. It loads data chunk by chunk to avoid memory issues with large simulations.

If you prefer to manually work with the raw Arrow dataset, you can access it via simulation_output$open_dataset(), which returns an Arrow Dataset object. You can then use dplyr verbs to filter and query before calling dplyr::collect() to load data into memory.

Value

A list containing the results of applying .f to each condition, with names corresponding to condition indices

Examples

# Load simulation output
sim_output_path <- system.file(
  "extdata", "rdm_minimal", "simulation",
  package = "eam"
)
sim_output <- load_simulation_output(sim_output_path)

# Define a summary pipeline
summary_pipe <- summarise_by(
  .by = c("condition_idx"),
  rt_mean = mean(rt),
  rt_quantiles = quantile(rt, probs = c(0.1, 0.5, 0.9))
)

# Apply function to each condition
sim_sumstat <- map_by_condition(
  sim_output,
  .progress = FALSE,
  .parallel = FALSE,
  function(cond_df) {
    summary_pipe(cond_df)
  }
)
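
The chunk-then-condition pattern described above can be mimicked on an in-memory data frame. The following base-R sketch is only an analogy: the toy data, the chunk_idx column, and the helper summarise_condition are assumptions, and no Arrow dataset is involved.

```r
set.seed(3)

# Hypothetical simulation results: 2 chunks, 4 conditions, 3 rows each
sim <- data.frame(
  chunk_idx = rep(1:2, each = 6),
  condition_idx = rep(1:4, each = 3),
  rt = runif(12, min = 0.3, max = 1.5)
)

# Function applied to the rows of a single condition
summarise_condition <- function(cond_df) {
  data.frame(
    condition_idx = cond_df$condition_idx[1],
    rt_mean = mean(cond_df$rt)
  )
}

results <- list()
for (chunk in unique(sim$chunk_idx)) {
  # Only one chunk is held in memory at a time
  chunk_df <- sim[sim$chunk_idx == chunk, ]
  by_condition <- split(chunk_df, chunk_df$condition_idx)
  results <- c(results, lapply(by_condition, summarise_condition))
}

# Combine the per-condition results into one data frame
combined <- do.call(rbind, results)
```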

Heuristic to determine if parallel processing should be used

Description

Heuristic to determine if parallel processing should be used

Usage

map_by_condition.parallel.heuristic(chunk_indices)

Arguments

chunk_indices

Vector of chunk indices

Value

Logical value indicating whether to use parallel processing

Process a single chunk for map_by_condition

Description

Process a single chunk for map_by_condition

Usage

map_by_condition.process_chunk(open_dataset_fn, .f, ...)

Arguments

open_dataset_fn

Arrow dataset object or function that returns a dataset

.f

Function to apply to each condition's data

...

Additional arguments passed to .f

Value

Function that processes a chunk_idx

Create a new simulation configuration

Description

This function creates a new eam simulation configuration object that contains all parameters needed to run a simulation.

Usage

new_simulation_config(
  prior_params = list(),
  prior_formulas = list(),
  between_trial_formulas = list(),
  item_formulas = list(),
  n_conditions_per_chunk = NULL,
  n_conditions,
  n_trials_per_condition,
  n_items,
  max_reached = n_items,
  max_t,
  dt = 0.001,
  noise_mechanism = "add",
  noise_factory = NULL,
  model = "ddm",
  parallel = FALSE,
  n_cores = NULL,
  rand_seed = NULL
)

Arguments

prior_params

A list or data frame of initial constant values that are available to all formulas

prior_formulas

A list of formulas defining prior distributions for condition-level parameters

between_trial_formulas

A list of formulas defining between-trial parameters

item_formulas

A list of formulas defining item-level parameters

n_conditions_per_chunk

Number of conditions to process per chunk. Optional; it typically does not need to be set. It determines the storage size and in-memory size of each chunk, so if you run into memory issues, try reducing this number.

n_conditions

Total number of conditions to simulate

n_trials_per_condition

Number of trials per condition

n_items

Number of items per trial

max_reached

Maximum number of items that can be recalled (default: n_items)

max_t

Maximum simulation time

dt

Time step size (default: 0.001)

noise_mechanism

Noise mechanism ("add", "mult_evidence", or "mult_t", default: "add")

noise_factory

Function that creates noise functions.

model

Model name or backend names (e.g., "ddm", "rdm", "lca")

parallel

Whether to run in parallel (default: FALSE)

n_cores

Number of cores for parallel processing (default: NULL, auto-detect)

rand_seed

Random seed for parallel processing (default: NULL)

Details

This function only creates the configuration object and does not run the simulation. To actually execute the simulation, you must pass the returned configuration object to run_simulation.

Supported Models:

This package supports five evidence accumulation models. The appropriate backend is automatically selected based on the model parameter and the parameters defined in your formulas.

DDM (Drift Diffusion Model)

Models evidence accumulation towards a single upper threshold. Items either reach the threshold and are recalled, or time out.

Required parameters (must appear in prior_formulas, between_trial_formulas, or item_formulas):

A - Upper decision boundary/threshold
V - Drift rate (evidence accumulation rate)
Z - Starting point of evidence
ndt - Non-decision time

Set model = "ddm"

RDM (Racing Diffusion Model)

Models multiple racing evidence accumulators, each with upper and lower thresholds for binary decisions (correct/incorrect).

Required parameters:

A_upper - Upper decision boundary (correct response)
A_lower - Lower decision boundary (incorrect response)
V - Drift rate
Z - Starting point of evidence
ndt - Non-decision time

Set model = "rdm". Note: If you set model = "ddm" but define A_upper instead of A, the model will automatically switch to RDM.

LCA (Leaky Competing Accumulator)

Models competitive evidence accumulation with leakage and mutual inhibition between accumulators.

Required parameters:

A - Decision threshold
V - Input strength/drift rate
Z - Starting point of evidence
ndt - Non-decision time
beta - Self-excitation/leak parameter
k - Lateral inhibition strength

Set model = "lca"

LFM (Lévy Flight Model)

Uses the same parameters as DDM. See DDM above.

Set model = "lfm"

LBA (Linear Ballistic Accumulator)

Uses the same parameters as RDM. See RDM above.

Set model = "lba"

Note: All required parameters must be defined at least once across prior_params, prior_formulas, between_trial_formulas, and item_formulas.

Parameter Hierarchy and Formula Evaluation:

The simulation uses a hierarchical parameter system with sequential formula evaluation, allowing later formulas to reference earlier ones:

prior_params - Initial constant values available to all formulas
prior_formulas - Evaluated once per condition, can reference prior_params. Use for condition-level parameters that vary across conditions.
between_trial_formulas - Evaluated once per trial within each condition. Can reference both prior_params and variables from prior_formulas. Use for trial-level variability.
item_formulas - Evaluated once per item within each trial. Can reference all previous parameters. Use for item-specific parameters.

Using Distributions:

You can use the distributional package to define random parameters. For example:

A ~ distributional::dist_uniform(0.5, 2.0) - Uniform distribution
V_condition ~ distributional::dist_normal(1.0, 0.2) - Normal distribution
sigma ~ 0.5 - Constant value
V ~ distributional::dist_normal(V_condition, sigma) - Reference earlier parameters

Each formula is evaluated sequentially, so you can build complex parameter dependencies. For instance, you might define a base drift rate V in prior_formulas, then add trial-level noise in between_trial_formulas, and finally scale by item position in item_formulas.
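
The three levels of the hierarchy can be pictured with nested draws in base R. This sketch does not use the package or distributional; the variable names, the distributions, and the rule of dividing drift by item position are illustrative assumptions that mirror the example in the preceding paragraph.

```r
set.seed(123)
n_conditions <- 2
n_trials <- 3
n_items <- 4

draws <- do.call(rbind, lapply(seq_len(n_conditions), function(cond) {
  # Condition level: drawn once per condition
  V_condition <- runif(1, min = 0.5, max = 2.0)
  do.call(rbind, lapply(seq_len(n_trials), function(trial) {
    # Trial level: drawn once per trial, centred on the condition value
    V_trial <- rnorm(1, mean = V_condition, sd = 0.2)
    # Item level: computed once per item from the trial value
    data.frame(
      condition_idx = cond,
      trial_idx = trial,
      item_idx = seq_len(n_items),
      V = V_trial / seq_len(n_items)
    )
  }))
}))
```

Each row of draws corresponds to one item, and items within the same trial share the same trial-level draw.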

Value

An S3 object of class eam_simulation_config containing validated simulation parameters. This object should be passed to run_simulation to execute the simulation.

Examples

# Define formulas for the simulation
prior_formulas <- list(
  V ~ distributional::dist_uniform(0.1, 1.0),
  ndt ~ 0.3,
  noise_coef ~ 1
)

between_trial_formulas <- list()

item_formulas <- list(
  A_upper ~ 1,
  A_lower ~ -1,
  V ~ V
)

# Define noise factory
noise_factory <- function(context) {
  noise_coef <- context$noise_coef
  function(n, dt) {
    noise_coef * rnorm(n, mean = 0, sd = sqrt(dt))
  }
}

# Create configuration
config <- new_simulation_config(
  prior_formulas = prior_formulas,
  between_trial_formulas = between_trial_formulas,
  item_formulas = item_formulas,
  n_conditions = 10,
  n_trials_per_condition = 10,
  n_items = 5,
  max_reached = 5,
  max_t = 10,
  dt = 0.01,
  noise_mechanism = "add",
  noise_factory = noise_factory,
  model = "ddm",
  parallel = FALSE
)

# print the config
config

# Run simulation
sim_output <- run_simulation(config)
sim_output

Heuristic to calculate optimal chunk size for simulation configuration

Description

Heuristic to calculate optimal chunk size for simulation configuration

Usage

new_simulation_config.chunk_size.heuristic(
  n_conditions,
  n_trials_per_condition,
  n_items,
  parallel,
  n_cores
)

Arguments

n_conditions

Total number of conditions to simulate

n_trials_per_condition

Number of trials per condition

n_items

Number of items per trial

parallel

Whether to run in parallel

n_cores

Number of cores for parallel processing

Value

Optimal number of conditions per chunk

Create an eam_simulation_output object

Description

Create an eam_simulation_output object

Usage

new_simulation_output(simulation_config, output_dir)

Plot accuracy comparison between posterior and observed data

Description

Visualizes accuracy metrics comparing posterior simulation results with observed data. Creates side-by-side bar plots for easy comparison across conditions.

Usage

plot_accuracy(
  simulated_output,
  observed_df,
  x = "item_idx",
  facet_x = c(),
  facet_y = c()
)

Arguments

simulated_output

Posterior simulation output from run_simulation()

observed_df

Observed data frame

x

Variable for x-axis (default: "item_idx")

facet_x

Variables for faceting columns

facet_y

Variables for faceting rows

Value

A ggplot2 object

Examples

# Load posterior simulation output and observed data
base_dir <- system.file("extdata", "rdm_minimal", package = "eam")
post_output <- load_simulation_output(file.path(base_dir, "abc", "posterior", "neuralnet"))
obs_df <- read.csv(file.path(base_dir, "observation", "observation_data.csv"))

# Plot accuracy comparison between posterior and observed data
# The plot shows side-by-side bars comparing hit rates or accuracy
plot_accuracy(
  post_output,
  obs_df,
  facet_x = c("group")
)

Plot accuracy for DDM model (internal)

Description

Calculates hit rate (proportion of trials with choice == 1) across all possible trial combinations. For simulated data, expands grid based on simulation config parameters and left joins with actual simulation results. For observed data, assumes data is already in the correct format.

Usage

plot_accuracy_ddm(
  simulated_output,
  observed_df,
  x = "item_idx",
  facet_x = c(),
  facet_y = c()
)

Arguments

simulated_output

Simulation output object

observed_df

Observed data frame (already expanded with all trial combinations)

x

Variable for x-axis

facet_x

Variables for faceting columns

facet_y

Variables for faceting rows

Value

A ggplot2 object

Plot accuracy for DDM-2B model (internal)

Description

Plot accuracy for DDM-2B model (internal)

Usage

plot_accuracy_ddm_2b(
  simulated_output,
  observed_df,
  x = "item_idx",
  facet_x = c(),
  facet_y = c()
)

Arguments

simulated_output

Simulation output object

observed_df

Observed data frame

x

Variable for x-axis

facet_x

Variables for faceting columns

facet_y

Variables for faceting rows

Value

A ggplot2 object

Plot accuracy graph (internal)

Description

Plot accuracy graph (internal)

Usage

plot_accuracy_graph(
  accuracy_df,
  x = "item_idx",
  y = "accuracy",
  facet_x = c(),
  facet_y = c()
)

Arguments

accuracy_df

Data frame with accuracy values

x

Variable for x-axis

y

Variable for y-axis (default: "accuracy")

facet_x

Variables for faceting columns

facet_y

Variables for faceting rows

Value

A ggplot2 object

Plot CV parameter pair correlations

Description

Create a matrix of pairwise plots for cross-validation parameter estimates, including scatter plots with fitted trends, rank correlations, and marginal distributions.

Usage

plot_cv_pair_correlation(data, ...)

## S3 method for class 'cv4abc'
plot_cv_pair_correlation(data, ...)

Arguments

data

A cv4abc object containing true parameters and cross-validated estimates.

...

Additional arguments:

interactive: Logical; whether to pause between tolerance levels and wait for input

Value

Invisibly returns 'NULL'. Called for its side effect of producing plots.

Examples

# Load CV output from saved file
cv_file <- system.file(
  "extdata", "rdm_minimal", "abc", "cv", "neuralnet.rds",
  package = "eam"
)
abc_neuralnet_cv <- readRDS(cv_file)

# Plot parameter pair correlations
plot_cv_pair_correlation(abc_neuralnet_cv)

Plot CV parameter recovery

Description

Visualize parameter recovery from cross-validation results, showing estimated vs. true parameter values and residual distributions for each parameter.

Usage

plot_cv_recovery(data, ...)

## S3 method for class 'cv4abc'
plot_cv_recovery(data, ...)

Arguments

data

A cv4abc object containing true parameters and cross-validated estimates.

...

Additional arguments:

n_rows: Integer; number of rows in the plot grid (default: 3)
n_cols: Integer; number of columns in the plot grid, multiplied by 2 for paired plots (default: 1)
method: Character; smoothing method for geom_smooth (default: "lm")
formula: Formula; used in geom_smooth (default: y ~ x)
resid_tol: Numeric; quantile threshold for filtering residuals by absolute value. If specified, only observations with residuals below this quantile are plotted (default: NULL, no filtering)
interactive: Logical; whether to pause between pages and wait for user input (default: FALSE)

Value

Invisibly returns 'NULL'. Called for its side effect of producing plots.

Examples

# Load CV output from saved file
cv_file <- system.file(
  "extdata", "rdm_minimal", "abc", "cv", "neuralnet.rds",
  package = "eam"
)
abc_neuralnet_cv <- readRDS(cv_file)

# Plot parameter recovery
plot_cv_recovery(
  abc_neuralnet_cv,
  n_rows = 2,
  n_cols = 1,
  resid_tol = 0.99
)

Plot parameter posterior distributions

Description

Plot posterior distributions (and, optionally, prior distributions) from ABC results.

Usage

plot_posterior_parameters(data, ...)

## S3 method for class 'abc'
plot_posterior_parameters(data, abc_input = NULL, ...)

Arguments

data

An abc object containing posterior samples in adj.values or unadj.values.

...

Additional arguments:

n_rows: Integer; number of rows in the plot grid (default: 2)
n_cols: Integer; number of columns in the plot grid (default: 2)
interactive: Logical; whether to pause between pages and wait for input

abc_input

Optional abc_input object containing prior samples for comparison.

Value

Invisibly returns 'NULL'. Called for its side effect of producing plots.

Examples

# Load ABC output from saved file
abc_file <- system.file(
  "extdata", "rdm_minimal", "abc", "abc_rejection_model.rds",
  package = "eam"
)
abc_rejection_model <- readRDS(abc_file)

# Load ABC input for prior comparison
abc_input_file <- system.file(
  "extdata", "rdm_minimal", "abc", "abc_input.rds",
  package = "eam"
)
abc_input <- readRDS(abc_input_file)

# Plot posterior distributions with prior comparison
plot_posterior_parameters(abc_rejection_model, abc_input)

Plot resample forest plots

Description

Create forest plots showing parameter ranges across resample iterations. Each iteration is displayed as a horizontal line with quantile intervals.

Usage

plot_resample_forest(
  data,
  n_rows = 2,
  n_cols = 2,
  interactive = FALSE,
  ci_level = 0.95
)

Arguments

data

List of abc results from abc_resample

n_rows

Number of rows in plot grid (default 2)

n_cols

Number of columns in plot grid (default 2)

interactive

Whether to pause between pages (default FALSE)

ci_level

Level of the quantile interval (default 0.95, i.e. a 95% interval)
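As a rough illustration of what a quantile interval at a given ci_level means, the sketch below computes a central interval in base R. It assumes the interval is central (equal mass cut from each tail), which the documentation does not state explicitly. The samples are simulated for illustration.

```r
set.seed(1)
# Made-up posterior samples for one parameter in one resample iteration
samples <- rnorm(1000, mean = 0.5, sd = 0.1)

ci_level <- 0.95
alpha    <- 1 - ci_level

# Central quantile interval: cut alpha / 2 from each tail
interval <- quantile(samples, probs = c(alpha / 2, 1 - alpha / 2))
interval
```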

Value

No return value, called for side effects (plotting). Creates forest plots displayed in the graphics device.

Examples

# Load ABC input data from example simulation
abc_input <- readRDS(
  system.file("extdata", "rdm_minimal", "abc", "abc_input.rds", package = "eam")
)

# Perform ABC resampling
results <- abc_resample(
  target = abc_input$target,
  param = abc_input$param,
  sumstat = abc_input$sumstat,
  n_iterations = 100,
  n_samples = 100,
  tol = 0.5,
  method = "rejection"
)

# plot forest plots showing parameter ranges
plot_resample_forest(results, ci_level = 0.95)

Plot resample median distributions

Description

Plot density distributions of parameter medians across resample iterations.

Usage

plot_resample_medians(data, n_rows = 2, n_cols = 2, interactive = FALSE)

Arguments

data

List of abc results from abc_resample

n_rows

Number of rows in plot grid (default 2)

n_cols

Number of columns in plot grid (default 2)

interactive

Whether to pause between pages (default FALSE)

Value

No return value, called for side effects (plotting). Creates density plots displayed in the graphics device.

Examples

# Load ABC input data from example simulation
abc_input <- readRDS(
  system.file("extdata", "rdm_minimal", "abc", "abc_input.rds", package = "eam")
)

# Perform ABC resampling
results <- abc_resample(
  target = abc_input$target,
  param = abc_input$param,
  sumstat = abc_input$sumstat,
  n_iterations = 100,
  n_samples = 100,
  tol = 0.5,
  method = "rejection"
)

# plot the resample medians for each parameter
plot_resample_medians(results)

Plot reaction time distributions

Description

Visualize reaction time distributions from your model predictions. Overlay observed experimental data for reference.

Usage

plot_rt(simulated_output, observed_df, facet_x = c("item_idx"), facet_y = c())

Arguments

simulated_output

Output from run_simulation containing posterior predictions

observed_df

Your observed data as a data frame

facet_x

Variables to split plots horizontally. Default is "item_idx" to show separate plots for each item

facet_y

Variables to split plots vertically. Default is none (c())

Value

A plot showing predicted RT distributions (blue), with observed data (red) if provided

Examples

# Load example posterior simulation output
post_output_path <- system.file(
  "extdata", "rdm_minimal", "abc", "posterior", "neuralnet",
  package = "eam"
)
post_output <- load_simulation_output(post_output_path)

# Load example observed data
obs_file <- system.file(
  "extdata", "rdm_minimal", "observation", "observation_data.csv",
  package = "eam"
)
obs_df <- read.csv(obs_file)

# Plot RT distributions by item
plot_rt(post_output, obs_df, facet_x = c("item_idx"))

# Plot RT distributions by item and group
plot_rt(
  post_output,
  obs_df,
  facet_x = c("item_idx"),
  facet_y = c("group")
)

Pre-allocate data.table columns with appropriate data types

Description

This function creates pre-allocated vectors for all columns in the final data.table, determining data types from the first trial and condition.

Usage

preallocate_columns(sim_results, trial_col_names, cond_param_names, total_rows)

Arguments

sim_results

The output from run_simulation(), a list of conditions

trial_col_names

Character vector of trial column names

cond_param_names

Character vector of condition parameter names

total_rows

Integer, total number of rows to pre-allocate

Value

Named list of pre-allocated vectors for each column
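The idea of inferring column types from a first result and allocating full-length vectors up front can be sketched in base R. The list first_row and its fields are hypothetical; the package's actual column set and internal logic may differ.

```r
# Hypothetical first row of results, used only to infer column types
first_row  <- list(trial_idx = 1L, rt = 0.53, choice = "left", correct = TRUE)
total_rows <- 4L

# One vector per column, with the same storage mode as the first value
columns <- lapply(first_row, function(value) {
  vector(mode = typeof(value), length = total_rows)
})

str(columns)
```

Allocating once and filling by index avoids repeatedly growing vectors, which is the usual motivation for this pattern.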

Print method for eam simulation configuration

Description

Print method for eam simulation configuration

Usage

## S3 method for class 'eam_simulation_config'
print(x, ...)

Arguments

x

An eam_simulation_config object

...

Additional arguments (ignored)

Value

Invisibly returns the input object

Helper to resolve defined symbols in our formulas

Description

This function evaluates an expression in a given environment.

Usage

resolve_symbol(expr, env, n)

Arguments

expr

An expression to evaluate

env

An environment to evaluate the expression in

n

The number of values to generate if the expression is a distribution

Value

The evaluated value, returned as is; no assumption is made about its type
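A minimal sketch of the evaluate-then-sample idea follows. The function resolve_sketch is a hypothetical stand-in, not the package's code, and a plain sampler function plays the role of a distribution object here.

```r
# Hypothetical stand-in for resolve_symbol(); not the package's code.
# A plain sampler function plays the role of a distribution object.
resolve_sketch <- function(expr, env, n) {
  value <- eval(expr, envir = env)
  if (is.function(value)) {
    value <- value(n)  # draw n values from the "distribution"
  }
  value
}

env <- new.env()
env$V <- 0.8
env$noise <- function(n) rnorm(n, mean = 0, sd = 0.1)

resolve_sketch(quote(V * 2), env, n = 3)  # plain arithmetic on a symbol
resolve_sketch(quote(noise), env, n = 3)  # three random draws
```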

Route model alias to backend and enrich configuration

Description

This function uses a registry of backend detectors to determine which backend implementation should handle the given configuration. Each detector examines the config and returns a backend name if it can handle it, or NULL otherwise. This design pattern (Chain of Responsibility) makes it easy to add new backends without modifying this routing function.

Usage

route_model_to_backend(config)

Arguments

config

A list containing simulation configuration parameters

Value

The modified config list with added 'backend' parameter
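The detector registry described above can be sketched in a few lines of base R. The detector functions, the config$model field they inspect, and route_sketch are all hypothetical; the package's real detectors may examine other parts of the configuration.

```r
# Hypothetical detectors: each returns a backend name or NULL
detect_ddm_2b <- function(config) {
  if (identical(config$model, "ddm-2b")) "ddm-2b" else NULL
}
detect_ddm <- function(config) {
  if (identical(config$model, "ddm")) "ddm" else NULL
}

# The registry is an ordered list; the first non-NULL answer wins
detectors <- list(detect_ddm_2b, detect_ddm)

route_sketch <- function(config, detectors) {
  for (detect in detectors) {
    backend <- detect(config)
    if (!is.null(backend)) {
      config$backend <- backend
      return(config)
    }
  }
  stop("No backend can handle this configuration")
}

routed <- route_sketch(list(model = "ddm", dt = 0.01), detectors)
routed$backend
```

Adding a backend in this design means appending one detector to the list; the routing loop itself stays unchanged.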

Run a chunk of simulation conditions and save results to disk

Description

This function processes a chunk of simulation conditions, applies the flatten_simulation_results transformation, and saves the results to disk using Arrow's write_dataset with partitioning by chunk_idx.

Usage

run_chunk(config, output_dir, chunk_idx)

Arguments

config

An eam_simulation_config object containing all simulation parameters

output_dir

The base output directory

chunk_idx

The chunk index for partitioning (1-based)

Value

Invisible NULL (results are saved to disk)

Run a given condition with multiple trials

Description

This function runs multiple trials for a given condition.

Usage

run_condition(
  condition_setting,
  between_trial_formulas,
  item_formulas,
  n_trials,
  n_items,
  max_reached,
  max_t,
  dt,
  noise_mechanism,
  noise_factory,
  backend,
  trajectories = FALSE
)

Arguments

condition_setting

A list of named values representing the condition settings

between_trial_formulas

A list of formulas defining the between-trial parameters

item_formulas

A list of formulas defining the item parameters

n_trials

The number of trials to simulate

n_items

The number of items per trial

max_reached

The threshold for evidence accumulation

max_t

The maximum time to simulate

dt

The step size for each increment

noise_mechanism

The noise mechanism to use ("add" or "mult")

noise_factory

A function that takes condition_setting and returns a noise function with signature function(n, dt)

backend

The backend implementation to use ("ddm", "ddm-2b", or "lca-gi")

trajectories

Whether to return full output including trajectories.

Value

A list containing the simulation results and condition parameters

Run a simulation with specified configuration

Description

This function runs a complete simulation based on the provided eam_simulation_config object, which is generated by the new_simulation_config function.

Usage

run_simulation(config, output_dir = NULL)

Arguments

config

An eam_simulation_config object containing all simulation parameters. Use new_simulation_config to create one.

output_dir

The directory to save out-of-core results (optional, will use temp directory if not provided)

Details

This function uses an out-of-core approach to handle potentially large simulation results. Instead of returning a data frame directly, it persists the data to disk and returns an eam_simulation_output object that contains metadata and file system paths.

To access the simulation data, use the following methods on the returned object:

open_dataset() - Returns an Arrow Dataset containing the simulation results, e.g. sim_output$open_dataset()
open_evaluated_conditions() - Returns an Arrow Dataset containing the evaluated condition parameters, e.g. sim_output$open_evaluated_conditions()

Both methods return Arrow Dataset objects rather than data frames, allowing for efficient querying and filtering before loading data into memory. To convert to a data frame, use dplyr::collect() or as.data.frame().
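The filter-then-collect pattern can be tried with arrow and dplyr alone. In this sketch the data frame and the temporary path are made up. With real output, the dataset would come from sim_output$open_dataset() instead of arrow::open_dataset().

```r
# Made-up results standing in for simulation output
results <- data.frame(
  chunk_idx     = rep(1:2, each = 3),
  condition_idx = 1:6,
  rt            = c(0.4, 0.9, 1.3, 0.5, 1.1, 0.7)
)

# Write a partitioned dataset to a fresh temporary directory
path <- tempfile("eam_arrow_sketch")
arrow::write_dataset(results, path, partitioning = "chunk_idx")

# Open lazily, filter before loading, then collect only the matching rows
dataset <- arrow::open_dataset(path)
fast <- dplyr::collect(dplyr::filter(dataset, rt < 1))
fast
```

Only the rows that pass the filter are materialised as a data frame, which is the main benefit when the full dataset is larger than memory.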

Throughout this package, the eam_simulation_output object is used as the standard parameter for downstream analysis functions, rather than passing Arrow objects or data frames directly.

For multi-item backends, only one item can reach the threshold at each discrete time point, so the precision of this detection depends on the dt parameter. This design choice was made for performance reasons, and its effect is negligible in almost all experimental scenarios. If it is critical for your application, increase the temporal resolution by reducing dt. For implementation details, refer to the backend source code (accumulate_evidence_* functions).
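To see why dt matters, consider a deliberately simplified, noiseless sketch in which evidence grows by V * dt per step. The helper first_crossing_step and all parameter values are hypothetical; the sketch does not reproduce how the package resolves simultaneous crossings.

```r
# Hypothetical noiseless accumulators: evidence grows by V * dt per step
first_crossing_step <- function(V, A, dt) {
  ceiling(A / (V * dt))  # first step at which evidence >= A
}

V <- c(0.92, 0.95)  # drift rates of two items (made up)
A <- 1              # shared threshold (made up)

coarse <- first_crossing_step(V, A, dt = 0.1)
fine   <- first_crossing_step(V, A, dt = 0.01)

coarse  # both items cross within the same step
fine    # the steps differ, so the crossing order is resolved
```

At the coarse resolution the two items would compete for the same time point, whereas the finer grid separates them.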

Value

An S3 object of class eam_simulation_output containing the output information

Examples

# Define formulas for the simulation
prior_formulas <- list(
  V ~ distributional::dist_uniform(0.1, 1.0),
  ndt ~ 0.3,
  noise_coef ~ 1
)

between_trial_formulas <- list()

item_formulas <- list(
  A_upper ~ 1,
  A_lower ~ -1,
  V ~ V
)

# Define noise factory
noise_factory <- function(context) {
  noise_coef <- context$noise_coef
  function(n, dt) {
    noise_coef * rnorm(n, mean = 0, sd = sqrt(dt))
  }
}

# Create configuration
config <- new_simulation_config(
  prior_formulas = prior_formulas,
  between_trial_formulas = between_trial_formulas,
  item_formulas = item_formulas,
  n_conditions = 10,
  n_trials_per_condition = 10,
  n_items = 5,
  max_reached = 5,
  max_t = 10,
  dt = 0.01,
  noise_mechanism = "add",
  noise_factory = noise_factory,
  model = "ddm",
  parallel = FALSE
)

# Run simulation
sim_output <- run_simulation(config)

# Access results
dataset <- sim_output$open_dataset()
dataset # an arrow dataset object

# if you want to load it into memory, you can use:
df <- as.data.frame(dataset)
head(df)

# Access evaluated condition parameters
cond_dataset <- sim_output$open_evaluated_conditions()
df_cond <- as.data.frame(cond_dataset)
head(df_cond)

Run a full simulation across multiple conditions in parallel

Description

This function runs a complete simulation across multiple conditions using parallel processing. It splits the conditions into chunks and processes each chunk on separate cores. Each condition has multiple trials and items. It uses the hierarchical structure: prior -> condition -> trial -> item. All parameters are taken from the configuration object.

Usage

run_simulation_parallel(config, output_dir)

Arguments

config

An eam_simulation_config object

output_dir

The base output directory

Value

No return value (results saved to disk)
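Splitting conditions into chunks can be sketched in base R as follows. The fixed chunk_size and the serial lapply() are assumptions made for illustration; the package's own chunking rule and its parallel dispatch may differ.

```r
n_conditions <- 10
chunk_size   <- 4  # assumed fixed chunk size, for illustration only

# Assign each condition index to a chunk, then group the indices
chunk_id <- ceiling(seq_len(n_conditions) / chunk_size)
chunks   <- split(seq_len(n_conditions), chunk_id)

# Serial stand-in for the per-chunk work that would run on separate cores
chunk_sizes <- lapply(chunks, length)
chunks
```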

Run a full simulation across multiple conditions (serial version)

Description

This function runs a complete simulation across multiple conditions serially, with each condition having multiple trials and items. It uses the hierarchical structure: prior -> condition -> trial -> item. All parameters are taken from the configuration object.

Usage

run_simulation_serial(config, output_dir)

Arguments

config

An eam_simulation_config object

output_dir

The base output directory

Value

No return value (results saved to disk)

Run a single trial of the DDM simulation

Description

This function runs a single trial of the DDM simulation using the provided item formulas and trial settings. It's a wrapper around the core C++ function.

Usage

run_trial_ddm(
  trial_setting,
  item_formulas,
  n_items,
  max_reached,
  max_t,
  dt,
  noise_mechanism,
  noise_factory,
  trajectories = FALSE
)

Arguments

trial_setting

A list of named values representing the trial settings

item_formulas

A list of formulas defining the item parameters

n_items

The number of items to simulate

max_reached

The threshold for evidence accumulation

max_t

The maximum time to simulate

dt

The step size for each increment

noise_mechanism

The noise mechanism to use ("add" or "mult")

noise_factory

A function that takes trial_setting and returns a noise function with signature function(n, dt)

trajectories

Whether to return full output including trajectories.

Value

A list containing the simulation results

Note

After evaluation, parameters A, V, and ndt are expected to be numeric vectors of length n_items. They are matched by position, so the first element of A, V, and ndt corresponds to the first item, and so on.
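Positional matching can be pictured with a short base-R sketch. The parameter values are made up, and the data frame is only a way to display which values belong to which item.

```r
n_items <- 3

# Made-up evaluated parameters, one element per item
A   <- c(1.0, 1.2, 0.9)
V   <- c(0.5, 0.7, 0.6)
ndt <- c(0.30, 0.30, 0.25)

# Every vector must have length n_items
stopifnot(length(A) == n_items, length(V) == n_items, length(ndt) == n_items)

# Row i collects the parameters of item i
items <- data.frame(item_idx = seq_len(n_items), A = A, V = V, ndt = ndt)
items
```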

Run a single trial of the 2-boundary DDM simulation

Description

This function runs a single trial of the 2-boundary DDM simulation using the provided item formulas and trial settings. It's a wrapper around the core C++ function for 2-boundary DDM.

Usage

run_trial_ddm_2b(
  trial_setting,
  item_formulas,
  n_items,
  max_reached,
  max_t,
  dt,
  noise_mechanism,
  noise_factory,
  trajectories = FALSE
)

Arguments

trial_setting

A list of named values representing the trial settings

item_formulas

A list of formulas defining the item parameters

n_items

The number of items to simulate

max_reached

The threshold for evidence accumulation

max_t

The maximum time to simulate

dt

The step size for each increment

noise_mechanism

The noise mechanism to use ("add", "mult_evidence", or "mult_t")

noise_factory

A function that takes trial_setting and returns a noise function with signature function(n, dt)

trajectories

Whether to return full output including trajectories.

Value

A list containing the simulation results

Note

After evaluation, parameters A_upper, A_lower, V, and ndt are expected to be numeric vectors of length n_items. They are matched by position, so the first element of A_upper, A_lower, V, and ndt corresponds to the first item, and so on.

Run a single trial of the LCA-GI simulation

Description

This function runs a single trial of the LCA-GI (Leaky Competing Accumulator with Global Inhibition) simulation using the provided item formulas and trial settings. It's a wrapper around the core C++ function for LCA-GI.

Usage

run_trial_lca_gi(
  trial_setting,
  item_formulas,
  n_items,
  max_reached,
  max_t,
  dt,
  noise_factory,
  trajectories = FALSE
)

Arguments

trial_setting

A list of named values representing the trial settings

item_formulas

A list of formulas defining the item parameters

n_items

The number of items to simulate

max_reached

The threshold for evidence accumulation

max_t

The maximum time to simulate

dt

The step size for each increment

noise_factory

A function that takes trial_setting and returns a noise function with signature function(n, dt)

trajectories

Whether to return full output including trajectories.

Value

A list containing the simulation results

Note

After evaluation, parameters A, V, ndt, beta, and k are expected to be numeric vectors of length n_items. They are matched by position, so the first element of A, V, ndt, beta, and k corresponds to the first item, and so on.

Summarise data by groups with optional pivoting

Description

This function provides a flexible way to group data, compute summary statistics, and reshape results. It works similarly to 'dplyr::summarise()' but with added capabilities for pivoting results wider.

Usage

summarise_by(
  .data = NULL,
  ...,
  .by = c("condition_idx"),
  .wider_by = c("condition_idx")
)

Arguments

.data

A data frame to summarise, or NULL to create a reusable summary function

...

Summary expressions using dplyr-style syntax. Named arguments become column names in the output (e.g., 'mean_rt = mean(rt)').

.by

Character vector of grouping column names. Default is "condition_idx".

.wider_by

Character vector of columns to keep as identifiers when pivoting. Default is "condition_idx". Must be a subset of '.by'. When '.wider_by' differs from '.by', the extra columns in '.by' will be spread across as column suffixes.

Details

You can use 'summarise_by()' in two ways:

1. Direct use: Pass your data directly and get results immediately
2. Build-then-apply: Create reusable summary functions, combine them with '+', then apply to your data later

The build-then-apply approach is useful when you want to compute different types of summaries (e.g., RT statistics and accuracy statistics) and automatically join them together.
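How functions can be combined with '+' and joined automatically is easiest to see in a stripped-down base-R sketch. The class name summary_pipe, the constructor new_summary_pipe, and the use of aggregate() and merge() are all hypothetical choices for illustration; they are not the package's internals.

```r
# Hypothetical constructor: tag a function so that `+` can dispatch on it
new_summary_pipe <- function(fn) {
  structure(fn, class = c("summary_pipe", "function"))
}

# `+` returns a new pipe that runs both pieces and joins the results
"+.summary_pipe" <- function(e1, e2) {
  new_summary_pipe(function(data) {
    merge(e1(data), e2(data), by = "condition_idx")
  })
}

rt_pipe <- new_summary_pipe(function(data) {
  aggregate(rt ~ condition_idx, data = data, FUN = mean)
})
acc_pipe <- new_summary_pipe(function(data) {
  aggregate(accuracy ~ condition_idx, data = data, FUN = mean)
})

trial_data <- data.frame(
  condition_idx = rep(1:2, each = 4),
  rt = c(0.5, 0.6, 0.7, 0.8, 0.55, 0.65, 0.75, 0.85),
  accuracy = c(1, 1, 0, 1, 1, 0, 1, 1)
)

combined <- rt_pipe + acc_pipe
combined(trial_data)
```

Because the combined object is itself a pipe, further summaries can be chained with additional '+' calls before any data is touched.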

Value

If '.data' is provided: A data frame with summarised results
If '.data' is NULL: A function that can be applied to data later

Usage with ABC workflows

If you plan to use build_abc_input for ABC analysis, you must use summarise_by() to generate summary statistics (or manually handle the arrow output format). This function typically works together with map_by_condition to process simulation results. See map_by_condition for workflow examples.

Examples

# Example 1: Direct use - pass data and get results immediately
trial_data <- data.frame(
  condition_idx = rep(1:2, each = 4),
  item_idx = rep(1:2, 4),
  rt = c(0.5, 0.6, 0.7, 0.8, 0.55, 0.65, 0.75, 0.85),
  accuracy = c(1, 1, 0, 1, 1, 0, 1, 1)
)

# Compute mean RT and accuracy by condition and item
result <- summarise_by(
  trial_data,
  mean_rt = mean(rt),
  mean_acc = mean(accuracy),
  .by = c("condition_idx", "item_idx"),
  .wider_by = "condition_idx"
)
# Result has columns: condition_idx, mean_rt_item_idx_1, mean_rt_item_idx_2, etc.
result

# Example 2: Build-then-apply - create reusable summary functions
# Build separate summary functions for different statistics
rt_summary_pipe <- summarise_by(
  mean_rt = mean(rt),
  sd_rt = stats::sd(rt),
  .by = c("condition_idx", "item_idx"),
  .wider_by = "condition_idx"
)

acc_summary_pipe <- summarise_by(
  mean_acc = mean(accuracy),
  n_trials = length(accuracy),
  .by = c("condition_idx", "item_idx"),
  .wider_by = "condition_idx"
)

# Combine with + and apply to data
combined_summary_pipe <- rt_summary_pipe + acc_summary_pipe
result <- combined_summary_pipe(trial_data)
# Result has all summaries joined by condition_idx
result

Internal function to perform the core summarise_by logic

Description

Internal function to perform the core summarise_by logic

Usage

summarise_by_impl(.data, dots, .by, .wider_by)

Arguments

.data

A data frame to summarise

dots

Quosures containing the summary expressions

.by

Character vector of column names to group by

.wider_by

Character vector of column names to keep as identifying columns

Value

A data frame with class "eam_summarise_by_tbl"

Summarise posterior parameter distributions

Description

Compute summary statistics (mean, median, confidence intervals) for posterior parameters from ABC results.

Usage

summarise_posterior_parameters(data, ...)

## S3 method for class 'abc'
summarise_posterior_parameters(data, ..., ci_level = 0.95)

Arguments

data

An abc object containing posterior samples in adj.values or unadj.values.

...

Additional arguments for custom summary functions. Functions passed as named arguments will be applied to each parameter's posterior samples.

ci_level

Numeric; confidence interval level (default: 0.95).

Value

A data frame with summary statistics for each parameter.
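Applying named summary functions to each parameter's samples can be sketched in base R. The matrix posterior is simulated, and the layout of stats_table is an assumption; the data frame returned by the package may be organised differently.

```r
set.seed(42)
# Made-up posterior samples: one column per parameter
posterior <- cbind(V = rnorm(500, 0.6, 0.05), ndt = rnorm(500, 0.3, 0.02))

# Named summary functions, in the spirit of passing them through `...`
summaries <- list(mean = mean, sd = stats::sd, iqr = stats::IQR)

# Apply every function to every parameter column
stats_table <- as.data.frame(
  lapply(summaries, function(f) apply(posterior, 2, f))
)
stats_table$parameter <- colnames(posterior)
stats_table
```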

Examples

# Load ABC output from saved file
abc_file <- system.file(
  "extdata", "rdm_minimal", "abc", "abc_rejection_model.rds",
  package = "eam"
)
abc_rejection_model <- readRDS(abc_file)

# Summarise posterior distributions
summarise_posterior_parameters(abc_rejection_model)

# Custom confidence interval level
summarise_posterior_parameters(abc_rejection_model, ci_level = 0.90)

Summarise resample medians

Description

Calculate summary statistics for parameter medians across resample iterations. Returns mean, median, and confidence intervals of the median distributions.

Usage

summarise_resample_medians(data, ..., ci_level = 0.95)

Arguments

data

List of abc results from abc_resample

...

Additional custom summary functions (named functions)

ci_level

Confidence level for intervals (default 0.95)

Value

Data frame with summary statistics for each parameter

Examples

# Load ABC input data from example simulation
abc_input <- readRDS(
  system.file("extdata", "rdm_minimal", "abc", "abc_input.rds", package = "eam")
)

# Perform ABC resampling
results <- abc_resample(
  target = abc_input$target,
  param = abc_input$param,
  sumstat = abc_input$sumstat,
  n_iterations = 100,
  n_samples = 100,
  tol = 0.5,
  method = "rejection"
)

# summarise the resample medians
summary_stats <- summarise_resample_medians(results, ci_level = 0.95)
print(summary_stats)