An introduction to ELAplus: Energy Landscape Analysis

Sotaro Takano, Kenta Suzuki, Hiroaki Fujita

1 Overview

ELAplus provides tools for Energy Landscape Analysis (ELA), a framework for analyzing dynamical systems represented as weighted networks. The package is designed primarily for the analysis of ecological community composition data, but the framework is applicable to other multistable systems.

The typical workflow consists of:

- Fitting community composition data to a (possibly extended) pairwise maximum entropy model

- Constructing an energy landscape from the fitted model

- Identifying stable states and transition structures

- Visualizing the resulting landscape and interactions

The theoretical background is described in Suzuki et al. (2021).

2 Example: Basic Energy Landscape Analysis

This vignette demonstrates a standard ELA workflow using test dataset.

2.1 Input dataset

# load package
library(ELAplus)
# load test data set
# species abundance table
data("baseabtable")
head(baseabtable)
#>          species.1 species.2 species.3 species.4 species.5 species.6 species.7
#> sample.1      2384         0      1313      8218         0      3529         0
#> sample.2     10343         0      2201      6814         0      2887         0
#> sample.3      4473         0       614      2046      4455      3429         0
#> sample.4      8786         0      5517       328         0        15         0
#> sample.5     10640         0      2321      6652         0      2642         0
#> sample.6         0         0       165      7794      1697      3816         0
#>          species.8 species.9 species.10 species.11 species.12 species.13
#> sample.1         0      3938        120       4156          0       3223
#> sample.2      6944      4907          0          0          0          0
#> sample.3      1503      8319          0          0          0      10433
#> sample.4      5372     10032       1722          0          0       5001
#> sample.5      7305      4858          0          0          0          0
#> sample.6         0      7230       2130       2267          0       6102
#>          species.14 species.15 species.16
#> sample.1       7150       3122          0
#> sample.2       7298          0          0
#> sample.3       5080          0          0
#> sample.4       6117          0          0
#> sample.5       7020          0          0
#> sample.6       9944          0          0

# metadata (environmental factors)
data("basemetadata")
head(basemetadata)
#>          factor.1  factor.2
#> sample.1    -0.92 2.7969211
#> sample.2     0.34 2.1470125
#> sample.3    -0.92 1.6061136
#> sample.4    -0.24 0.5847647
#> sample.5     0.40 1.7316138
#> sample.6    -0.82 2.9597175

2.2 (Optional) Data generation using a Gibbs sampling (heat-bath algorithm)

The simulated community composition data are also available.

In this example, the data comprising 24 species and 512 samples were generated through 100 iterations of Gibbs sampling (heat-bath algorithm).

Both implicit/explicit environmental factors (h.act and g.act) are considered. The number of explicit environmental factors is specified by ne (2 factors in this example).

j.act is the interaction matrix.

# load package
library(ELAplus)
# Generate random parameters
hb_params <- hb.paramgen(24, ne = 2) # the number of environmental factors
h.act <- hb_params[[1]]
j.act <- hb_params[[2]]
g.act <- hb_params[[3]]

# Simulate community composition data
rand_data <- HeatBath(100, 512, h.act, j.act, g = g.act)
rand_mat   <- rand_data[[1]]
rand_enmat <- rand_data[[2]]

2.3 Formatting the dataset

ELAplus expects the community data in a standardized representation.

Formatting() converts raw input tables into aligned occurrence (binary) matrix and (optionally) environmental factor matrix.

2.3.1 Inputs

  • baseabtable: samples - species abundance table (data.frame or matrix).

  • basemetadata (optional): a samples - factors metadata table (data.frame or matrix).

    • This is used when you want to include environmental parameters (e.g., temperature, pH, treatment group).
    • If you do not have environmental data, pass NULL (or an empty object).
formatted <- Formatting(baseabtable = baseabtable, basemetadata = basemetadata)
#> Processed 256 samples.
#> Relative abundance threshold = 0.05 
#> Occurrence threshold (lower) = 0.05 
#> Occurrence threshold (upper) = 0.95 
#> Selected  13  out of  16 species.

2.3.2 Key preprocessing options

2.3.2.1 Normalization (normalize)

  • If normalize = 1, the abundances within each sample are normalized so that each row sums to 1. This is useful when:
    • total sequencing depth differs strongly between samples etc…
  • If normalize = 0, the original scale is preserved.

2.3.2.2 Occurrence binarization (the “0/1 threshold”)

  • ELA operates on binary presence/absence states (each species is present or absent).
  • Formatting() creates the occurrence matrix by applying the first element of parameters:
  • parameters[1] (“0/1 threshold”): a numeric cutoff used to define presence.
    • Example: parameters[1] = 0.05 means “abundance > 0.05 is present; otherwise absent”.

2.3.2.3 Species filtering by prevalence (min/max occurrence thresholds)

  • Species that are too rare or almost always present typically add little information or inflate the results in construction of energy landscape.

  • Formatting() therefore filters species based on prevalence across samples using:

  • parameters[2] (“min occurrence threshold”)

  • parameters[3] (“max occurrence threshold”)

  • These are interpreted as proportions of samples in which the species is present after binarization.

  • Example:

    • parameters = c(0.01, 0.01, 0.99) means:
    • present(1) if abundance > 0.01,
    • keep species whose presence frequency is > 0.01 and < 0.99.
formatted <- Formatting(baseabtable = baseabtable, basemetadata = basemetadata,
                        normalize = 1, parameters = c(0.01, 0.01, 0.99))
#> Processed 256 samples.
#> Relative abundance threshold = 0.01
#> Occurrence threshold (lower) = 0.01 
#> Occurrence threshold (upper) = 0.99
#> Selected  16  out of  16 species.
ocvecs <- formatted[[1]]  # community composition binary matrix
abvecs <- formatted[[2]]  # community abundance matrix (not binarized)
envecs <- formatted[[3]]  # environmental variables (optional)

2.4 Parameter fitting

2.4.1 Extended maximum pairwise entropy model

The energy \(E(\sigma^{(k)},\varepsilon)\) of each community composition \(\sigma^{(k)}\) is assigned by imposing the potential structure to the state space by introducing the extended pairwise maximum entropy model. Briefly, the energy is determined by three parameters. For details, please see Appendix.

  • h: Implicit (unobserved) environmental parameter for each species

  • J: Species-species interaction matrix

  • g: Explicit environmental parameters for each species

2.4.2 Parameter fitting algorithm

Fitting the parameters of the pairwise maximum entropy model is performed using a stochastic approximation algorithm.

We approximate the intractable model expectation in the likelihood gradient with a persistent (warm-started) Gibbs/heat-bath chain advanced by one sweep per parameter iteration, corresponding to persistent contrastive divergence with one sweep (PCD-1), also known as stochastic maximum likelihood (SML).

It repeatedly simulates data under the current parameters and then updates h, J, and g to reduce the discrepancies between observed vs simulated metrics.

Two observed co-occurrence statistics are used for the fitting.

  1. Species–species co-occurrence: \(y = ocvecs^{T}ocvecs\)

  2. Environment–species co-occurrence: \(y^{env} = envecs^{T}ocvecs\)

Then, the algorithm repeatedly updates the parameters by comparing

  • observed vs simulated species–species co-occurrence \(y_{diff} = y_{observed} - y_{simulated}\)
  • observed vs simulated environment–species co-occurrence \(y^{env}_{diff} = y^{env}_{observed} - y^{env}_{simulated}\)

Each parameter independently updated by those metrics:

\[ \Delta{h} \propto diag(y_{diff}) \] \[ \Delta{J} \propto y_{diff} \]

\[ \Delta{g} \propto y^{env}_{diff} \]

2.5 Hyperparameter search for fitting

For simplicity, the usage of ELA without explicit environmental parameters (envecs) is first explained.

Fitting the pairwise maximum entropy model with runSA() depends on several optimization configurations, and the best settings can vary across datasets (number of species, sample size, sparsity).

ELAplus provides a dedicated grid-search routine to choose hyperparameters empirically.

Findbp() performs a grid search over three key hyperparameters:

-lmd : a sparsity penalty to encourage sparse matrix. Larger values typically lead to fewer nonzero parameters.

-we : AdamW weight decay, a form of L2 regularization applied during optimization.

-maxlrs : maximum learning rate in fitting algorithm (SML).

bpresult <- Findbp(ocvecs, rep = 2, threads = 2,
                   cvmode="10fold",nrepeat = 10, we=c(0.001),totalit=4000,
                   lmd=c(1e-05,1e-03,0.01),maxlrs = c(0.005),
                   tol=0.03, runadamW=TRUE,sparse=TRUE,fastfitting=TRUE)
#Start hyper-parameter search:
#There are 300 tasks
#Finished 300 tasks
#SA: elapsed time 15.08 sec

bp <- bpresult[[1]]  # typically best bpresult
bpm <- as.numeric(unlist(strsplit(names(bp)[1], split = "_")))
allresults <- bpresult[[2]]

2.5.1 What Findbp does

1. Splits samples into training and test subsets

  • The data are split using cross-validation based on cvmode.
  • For “10fold” or “5fold”, samples are randomly assigned into k folds (repeated nrepeat times), and each fold is used as the test set while the remaining folds are used for training.
  • For “loo” (leave-one-out), each sample is used once as the test set while the rest form the training set.

2. Fits the model on the training subset for each grid point

  • For each combination of candidate hyperparameters, the function calls runSA().

3. Evaluates predictive performance on held-out samples

  • Each fitted model is scored by its predictive error on the test subset (lower is better).
  • The best-performing settings are returned.
  • If fastfitting=TRUE, the function returns candidates with the lowest iterations within tol of the best observed error.

2.5.2 Fitting results can be visualized by plotSAtest.

plotSAtest(allresults)
# "id" corresponds to hyperparameter: lmd_we_maxlr

2.5.3 Fitting the extended maximum pairwise entropy model

The best parameters found by Findbp() are used for real analyses by runSA().

sa <- runSA(
  ocvecs,
  enmat   = NULL,
  rep     = 32,
  getall  = FALSE,
  lambda  = bpm[[1]],
  we      = bpm[[2]],
  maxlr = bpm[[3]],
  totalit = bpm[[4]]
)
#> Start parameter fitting:
#> .SA: elapsed time 0.13 sec

2.6 Constructing the energy landscape

After fitting the (extended) pairwise maximum entropy model with runSA(), the next step is to construct energy landscape over community compositions using the fitted parameters.

In ELAplus, this is performed by ELA(), which identifies:

1. Stable states Here, the base of the basin (with minimal energy) is regarded as a stable state, which is distinct from the equilibrium point of the dynamical system or the attractor. For details, see Suzuki et al. (2021),

2. Tipping points (energy barriers connecting pairs of basins/stable states). A connecting pair of basins have basin boundary and the point with the lowest energy barrier is regarded as tipping point here.

The fitted model defines an energy value for any community composition (typically a binary presence/absence vector of community members).

ela <- ELA(
  sa,
  env            = NULL,
  threads        = 1,
  SS.itr         = 20000,
  FindingTip.itr = 10000,
  reporting      = TRUE
)
#> Start ELA:
#> 7 stable states were found.
#> Checking 21 tipping points.
#> converting...
#> ELA: elapsed time 2.19 sec

2.7 ELA output

After stable states and tipping points are computed, ELA() summarizes and standardizes the outputs. This conversion step produces a compact representation of the energy landscape structure:

The output format is compatible to downstream steps such as pruning (ELPruning) and visualization (e.g., disconnectivity graphs).

ela[[1]] # stable state IDs
#> [1] "09x" "01t" "EWB" "EY8" "1uV" "1mL" "17s"
ela[[2]] # stable state energies
#> [1] -11.099418 -10.297931  -9.544050  -8.961634  -8.005040  -6.440385  -6.269140

2.7.1 What ELA does

  1. Identifying stable states by repeated steepest descent (SSestimate)
  • The number of repetitions is controlled by SS.itr (default 20000).
  • More iterations increase the chance of discovering all major stable states, especially when the number of species is large, landscape has many stable states, etc…
  1. Finding tipping points between stable states
  • Once a set of stable states is obtained, ELA() evaluates transitions between them by searching for tipping points between all pairs of stable states.
  • The number of iterations is controlled by FindingTip.itr. As with SS.itr, increasing FindingTip.itr improves robustness at the cost of runtime. This step can be computationally heavy because it is performed across stable-state pairs.

2.8 Pruning stable states

The energy landscape constructed by ELA() first identified stable states as much as possible. Some of these stable states correspond to very shallow basins.

Such stable states are often sensitive to noise and does not largely explain dominant multistability of the system.

ELPruning() addresses this issue by iteratively pruning shallow basins and merging them into deeper, neighboring basins.

elap <- ELPruning(ela, th=0.1, type="relative")
#> Start pruning:
#> *...
#> ELPruning: elapsed time 0.05 sec

2.8.1 Output and interpretation

ELPruning() returns a list with two elements:

[[1]] Pruned ELA object (same format as the output of ELA()):

  • [[1]] updated stable state IDs,
  • [[2]] updated stable state energies,
  • [[3]] updated tipping point IDs,
  • [[4]] updated tipping energies.
elap[[1]][[1]] # updated stable state IDs
#> [1] "09x" "EWB" "EY8" "1uV"
elap[[1]][[2]] # updated stable state energies
#> [1] -11.099418  -9.544050  -8.961634  -8.005040

[[2]] Stable-state mapping table

  • ss.before.pruning: stable state IDs from the original (unpruned) landscape
  • ss.after.pruning: corresponding stable state IDs after pruning
elap[[2]]
#>  ss.before.pruning ss.after.pruning
#> 1               09x              09x
#> 2               01t              09x
#> 3               EWB              EWB
#> 4               EY8              EY8
#> 5               1uV              1uV
#> 6               1mL              09x
#> 7               17s              09x

2.8.2 Basin depth and pruning criterion

For each stable state, a basin depth is defined as the energy difference between the energy of the stable state itself and the lowest tipping point energy connecting that stable state to any other stable state (\(\Delta E\)).

The basin depth measures how “robust” a stable state is against transitions: deeper basins correspond to more robust, while shallow basins correspond to more unstable states.

In ELPruning(), there are two methods relative or bootstrap.

2.8.2.1 Relative depth pruning

If type="relative" is set, th (relative depth threshold) specifies the minimum acceptable energy barrier relative to the deepest basin barrier. \[ \Delta E_{\mathrm{min}} < th * \Delta E_{\mathrm{max}} \] For example, th = 0.2 means that basins shallower than 20% of the deepest basin (\(\Delta E_{\mathrm{max}}\)) are pruned. The results of pruning is usually sensitive to this value.

elap <- ELPruning(ela, th=0.2, type="relative")
#> Start pruning:
#> *...
#> ELPruning: elapsed time 0.05 sec
elap[[1]][[1]] # updated stable state IDs
#> [1] "09x" "EWB" "1uV"

2.8.2.2 bootstrap samples pruning

Alternatively, users can perform pruning of stable states by calculating vulnerabilities in bootstrap samples in each stable state.

This method is explained in the next section:Pruning with bootstrap resampling.

2.9 Visualization

2.9.1 Disconnectivity graph

A disconnectivity graph is a compact visualization of the global structure of an energy landscape.

  • It summarizes how stable states are separated by energy barriers (tipping points).
  • The vertical axis represents energy
  • Branches illustrate how stable states are connected and separated by tipping points.
# Before pruning
showDG(ela, ocvecs, "example")

# After pruning
showDG(elap[[1]], ocvecs, "example")

2.9.1.1 How to interpret the graph

  • Lower energies correspond to more stable configurations.
  • Branching structure indicates which basins are “close” in terms of energy barriers.
  • Barrier height as a transition difficulty proxy.

2.9.1.2 Optional checking bootstrap resampling

The reliability of stable states can be also examined by the bootstrap resampling of training data sets (the occurrence matrix). The function, bootstrap_SA(), performs parameter fitting with bootstrap resampling of the data sets.

bootstrap_ELA() uses the results of bootstrap_SA() (i.e., estimated parameters: h, J) for computing the variability of energy gaps between tipping points and stable states arising from uncertainty due to finite sample size.

If the estimated stable state is statistically reliable, the estimated \(\Delta E\) is expected to be consistently smaller than 0 across bootstrap resamples of the training data.

Therefore, the probability of bootstrap samples with \(\Delta E < 0\) reflects the statistical significance of the estimated stable state. Here, this quantity is referred to as a “bootstrap-approximated p-value” and is used to assess statistical reliability.

As an example, the distribution of \(\Delta E\) between “01t” and “09/” is shown as a histogram. More than 25% of bootstrap samples yield \(\Delta E > 0\) (i.e., p-val > 0.25), and this case is regarded as statistically not reliable.

If bootstrap_ene is specified in showDG(), the function displays an approximate p-value in each stable state. This option is helpful when you want to see uncertainty in estimated energies in ELA.

bs_params <- bootstrap_SA(ocvecs, rep = 16, threads=8, bootstrap.it = 1000, 
                          lambda  = bpm[[1]],we = bpm[[2]],
                          maxlr = bpm[[3]], totalit = bpm[[4]])

bootstrap <- bootstrap_ELA(bs_params,ocvecs,ela=ela,threads = 1)
ela_rep <- bootstrap[[1]]
bootstrap_ene <- bootstrap[[2]]
showDG(ela_rep,ocvecs,bootstrap_ene = bootstrap_ene, min_pval = 0.01, 
       scale_labels = c(0.05,0.1,0.2,0.5))

2.9.1.3 Pruning with bootstrap resampling

The pruning stable states is also possible using the bootstrap results.

The ELPruning() with type="bootstrap"used the distribution of \(\Delta E\) in bootstrap samples and evaluated whether the predicted stable states are statistically reliable.

If the p-value is above threshold (e.g., 0.05), the function prunes such unreliable and shallow basins (stable states) and merges them into deeper, neighboring basins. The pruned graph is follows.

elap_rep <- ELPruning(ela, type = "bootstrap",bootstrap_ene=bootstrap_ene,lower_qr = 0.25)
showDG(ela_rep,ocvecs,bootstrap_ene = bootstrap_ene, min_pval = 0.01, 
       scale_labels = c(0.05,0.1,0.2,0.5))

If getGraohObj=TRUE, showDG() function returns a data frame for plotting a disconnectivity graph. In the following example, disconnectivity graph is drawn using ggplot(). Tipping points and stable states are shown as triangles and circles, respectively, and colored by energies.

grobj_to_plot <- showDG(ela_rep,ocvecs,bootstrap_ene = bootstrap_ene,getGraphObj = TRUE)
grobj_point <- grobj_to_plot[grobj_to_plot$point_str != "",]
g <- ggplot(
  grobj_to_plot,
  aes(x=.data$nodes2xposi, y=.data$energy, label=.data$point_str)) + 
  xlab("") + geom_text(hjust=0.75, vjust=2, aes(fontface=2)) + geom_path() +
  geom_point(
    data = grobj_point,
    mapping = aes(
      x = .data$nodes2xposi,
      y = .data$energy,
      color = .data$energy,
      shape = .data$type,
    ),size = 4
  ) +
  scale_color_viridis_c(option = "plasma",alpha=0.8,
                        direction=-1,na.value = "grey") +
  coord_cartesian(xlim=c(0.75,7.5), ylim=c(-5.8,-11.5))
plot(g)

2.9.2 Species interaction graph

showIntrGraph() visualizes the pairwise interaction structure inferred by the fitted model in a 2D “interaction space”.

It extracts estimated parameters from the resulting interaction matrix (jest).

  • Each species is plotted as a point at its PCoA coordinates.
  • Links with value > th are classified as positive, while links with value < -th are classified as negative.

The plot draws these two link types separately and colors them differently (positive = red, negative = blue).

showIntrGraph(
  elap[[1]],
  sa,
  th = 0.01,
  annot_adj = c(0.75, 1.00)
)

3 Environmental gradient analysis

The stability structure of a community is not fixed, but changes continuously along an environmental gradient (e.g., temperature, nutrient availability, disturbance intensity).

GradELA() is designed to capture this dependence by estimating energy landscapes repeatedly across a controlled range of an environmental factor and summarizing how stable states and transition barriers change.

When environmental variables (envecs) are included, ELAplus supports Gradient Energy Landscape Analysis via GradELA(). The parameter fitting process is same as that for normal ELA except for specifying enmat.

bpresult <- Findbp(ocvecs, enmat = envecs, rep = 2, threads = 2,
                   cvmode="10fold",nrepeat = 10, we=c(0.001),totalit=4000,
                   lmd=c(1e-05,1e-03,0.01),maxlrs = c(0.005),
                   tol=0.03, runadamW=TRUE,sparse=TRUE,fastfitting=TRUE)
bp <- bpresult[[1]]
bpm <- as.numeric(unlist(strsplit(names(bp)[1], split = "_")))

sa <- runSA(
  ocvecs,
  enmat   = envecs,
  rep     = 16,
  getall  = FALSE,
  lambda = bpm[[1]],
  we = bpm[[2]],
  maxlr = bpm[[3]],
  totalit = bpm[[4]]
)

3.1 Concept of GradELA

GradELA() proceeds in three conceptual stages:

  1. Specification of an environmental gradient
  1. Energy landscape construction at each step
# This process takes few minutes with a small number of threads, and here pre-computed results are shown.
gela <- GradELA(
  sa      = sa,
  eid     = "factor.1",
  enmat   = envecs,
  env     = NULL,
  range   = c(-1, 1),
  steps   = 32,
  th = 0.05,
  threads = 8, 
  pruning_type = "relative", 
  bs_params = bs_params,
  ocmat = ocvecs)

Users can use bootstrap samples in ELPruning process by pruning_type="bootstrap". Other options are equivalent to ELPruning().

# This process takes few minutes with a small number of threads, and here pre-computed results are shown.
bs_params <- bootstrap_SA(ocvecs, enmat = envecs,rep = 16, threads=8,bootstrap.it = 1000, 
                          lambda  = bpm[[1]],we = bpm[[2]],
                          maxlr = bpm[[3]], totalit = bpm[[4]])

gela <- GradELA(
  sa      = sa,
  eid     = "factor.1",
  enmat   = envecs,
  env     = NULL,
  range   = c(-1, 1),
  steps   = 32,
  lower_qr = 0.25,
  threads = 8, 
  pruning_type = "bootstrap", 
  bs_params = bs_params,
  ocmat = ocvecs)

3.2 Output

GradELA() returns a list with three elements:

1. Pruned ELA results for each step

2. Environmental value sequence

3. Reference environmental vector

gela[[1]][[1]][[1]][[1]] # stable state IDs at 1st environmental parameter
# [1] "0Ht" "EWB" "1uV"
gela[[1]][[20]][[1]][[1]] # stable state IDs at 20th environmental parameter
# [1] "09x" "EWB"

3.3 Visualization of gradient ELA results

The resulting object can then be plotted using showSSD(), which visualize the regime shift: how the number, identity, and energy of stable states change along the environmental gradient.

# GradELA result (ELPruning with type="relative").
showSSD(gela)

# GradELA result (ELPruning with type="bootstrap").
showSSD(gela)

If users want to modify the figure configuration of the diagram, showSSD_ggplot() function returns a data frame for plotting. Then users can the configuration manually using ggplot().

ssd_df <- showSSD_ggplot(gela,plot = FALSE,getSSDobj = TRUE)
g <- ggplot(ssd_df, aes(x = env, y = Energy, color = SS_ID)) +
            geom_point(shape = 19) +
            geom_line(aes(group = SS_ID))

3.4 3D plot of gradient ELA results

showGELA3D() visualizes a smoothed energy landscape surface produced by GELAObj() and overlays stable-state trajectories.

# This process takes few minutes, and here pre-computed result is drawn. 
gelaobj <- GELAObj(gela, sa=sa,threads=2)
showGELA3D(gelaobj,theta = 30,phi = 30)

4 Appendix

Overview of extended maximum pairwise entropy model

The model determines the probability of the occurrence of community composition \(\sigma^{(k)}\) in an environmental condition \(\varepsilon = (\varepsilon_{1}, \varepsilon_{2}, ..., \varepsilon_{M})\), environmental factors such as pH or temperature.

\[ E(\sigma^{(k)},\varepsilon) = - \sum_{i=1}^{S} h_i \sigma_i^{(k)} - \sum_{j=1}^{S} \sum_{i=1}^{M} g_{ij} \varepsilon_i^{(k)} \sigma_j^{(k)} - \sum_{i=1}^{S} \sum_{j=1,i \neq j}^{S} J_{ij} \sigma_i^{(k)} \sigma_j^{(k)} \] where: