ELAplus provides tools for Energy Landscape Analysis (ELA), a framework for analyzing dynamical systems represented as weighted networks. The package is designed primarily for the analysis of ecological community composition data, but the framework is applicable to other multistable systems.
The typical workflow consists of:
- Fitting community composition data to a (possibly extended) pairwise maximum entropy model
- Constructing an energy landscape from the fitted model
- Identifying stable states and transition structures
- Visualizing the resulting landscape and interactions
The theoretical background is described in Suzuki et al. (2021).
This vignette demonstrates a standard ELA workflow using test dataset.
baseabtable: samples - species abundance table (data.frame or matrix).
basemetadata (optional): a samples - factors metadata table (data.frame or matrix).
baseabtable)# load package
library(ELAplus)
# load test data set
# species abundance table
data("baseabtable")
head(baseabtable)
#> species.1 species.2 species.3 species.4 species.5 species.6 species.7
#> sample.1 2384 0 1313 8218 0 3529 0
#> sample.2 10343 0 2201 6814 0 2887 0
#> sample.3 4473 0 614 2046 4455 3429 0
#> sample.4 8786 0 5517 328 0 15 0
#> sample.5 10640 0 2321 6652 0 2642 0
#> sample.6 0 0 165 7794 1697 3816 0
#> species.8 species.9 species.10 species.11 species.12 species.13
#> sample.1 0 3938 120 4156 0 3223
#> sample.2 6944 4907 0 0 0 0
#> sample.3 1503 8319 0 0 0 10433
#> sample.4 5372 10032 1722 0 0 5001
#> sample.5 7305 4858 0 0 0 0
#> sample.6 0 7230 2130 2267 0 6102
#> species.14 species.15 species.16
#> sample.1 7150 3122 0
#> sample.2 7298 0 0
#> sample.3 5080 0 0
#> sample.4 6117 0 0
#> sample.5 7020 0 0
#> sample.6 9944 0 0
# metadata (environmental factors)
data("basemetadata")
head(basemetadata)
#> factor.1 factor.2
#> sample.1 -0.92 2.7969211
#> sample.2 0.34 2.1470125
#> sample.3 -0.92 1.6061136
#> sample.4 -0.24 0.5847647
#> sample.5 0.40 1.7316138
#> sample.6 -0.82 2.9597175The simulated community composition data are also available.
In this example, the data comprising 24 species and 512 samples were generated through 100 iterations of Gibbs sampling (heat-bath algorithm).
Both implicit/explicit environmental factors (h.act and g.act) are considered. The number of explicit environmental factors is specified by ne (2 factors in this example).
j.act is the interaction matrix.
# load package
library(ELAplus)
# Generate random parameters
hb_params <- hb.paramgen(24, ne = 2) # the number of environmental factors
h.act <- hb_params[[1]]
j.act <- hb_params[[2]]
g.act <- hb_params[[3]]
# Simulate community composition data
rand_data <- HeatBath(100, 512, h.act, j.act, g = g.act)
rand_mat <- rand_data[[1]]
rand_enmat <- rand_data[[2]]ELAplus expects the community data in a standardized representation.
Formatting() converts raw input tables into aligned occurrence (binary) matrix and (optionally) environmental factor matrix.
baseabtable: samples - species abundance table (data.frame or matrix).
basemetadata (optional): a samples - factors metadata table (data.frame or matrix).
formatted <- Formatting(baseabtable = baseabtable, basemetadata = basemetadata)
#> Processed 256 samples.
#> Relative abundance threshold = 0.05
#> Occurrence threshold (lower) = 0.05
#> Occurrence threshold (upper) = 0.95
#> Selected 13 out of 16 species.normalize = 1, the abundances within each sample are normalized so that each row sums to 1. This is useful when:
normalize = 0, the original scale is preserved.Formatting() creates the occurrence matrix by applying the first element of parameters:parameters[1] (“0/1 threshold”): a numeric cutoff used to define presence.
parameters[1] = 0.05 means “abundance > 0.05 is present; otherwise absent”.Species that are too rare or almost always present typically add little information or inflate the results in construction of energy landscape.
Formatting() therefore filters species based on prevalence across samples using:
parameters[2] (“min occurrence threshold”)
parameters[3] (“max occurrence threshold”)
These are interpreted as proportions of samples in which the species is present after binarization.
Example:
parameters = c(0.01, 0.01, 0.99) means:formatted <- Formatting(baseabtable = baseabtable, basemetadata = basemetadata,
normalize = 1, parameters = c(0.01, 0.01, 0.99))
#> Processed 256 samples.
#> Relative abundance threshold = 0.01
#> Occurrence threshold (lower) = 0.01
#> Occurrence threshold (upper) = 0.99
#> Selected 16 out of 16 species.
ocvecs <- formatted[[1]] # community composition binary matrix
abvecs <- formatted[[2]] # community abundance matrix (not binarized)
envecs <- formatted[[3]] # environmental variables (optional)The energy \(E(\sigma^{(k)},\varepsilon)\) of each community composition \(\sigma^{(k)}\) is assigned by imposing the potential structure to the state space by introducing the extended pairwise maximum entropy model. Briefly, the energy is determined by three parameters. For details, please see Appendix.
h: Implicit (unobserved) environmental parameter for each species
J: Species-species interaction matrix
g: Explicit environmental parameters for each species
Fitting the parameters of the pairwise maximum entropy model is performed using a stochastic approximation algorithm.
We approximate the intractable model expectation in the likelihood gradient with a persistent (warm-started) Gibbs/heat-bath chain advanced by one sweep per parameter iteration, corresponding to persistent contrastive divergence with one sweep (PCD-1), also known as stochastic maximum likelihood (SML).
It repeatedly simulates data under the current parameters and then updates h, J, and g to reduce the discrepancies between observed vs simulated metrics.
Two observed co-occurrence statistics are used for the fitting.
Species–species co-occurrence: \(y = ocvecs^{T}ocvecs\)
Environment–species co-occurrence: \(y^{env} = envecs^{T}ocvecs\)
Then, the algorithm repeatedly updates the parameters by comparing
Each parameter independently updated by those metrics:
\[ \Delta{h} \propto diag(y_{diff}) \] \[ \Delta{J} \propto y_{diff} \]
\[ \Delta{g} \propto y^{env}_{diff} \]
For simplicity, the usage of ELA without explicit environmental parameters (envecs) is first explained.
Fitting the pairwise maximum entropy model with runSA() depends on several optimization configurations, and the best settings can vary across datasets (number of species, sample size, sparsity).
ELAplus provides a dedicated grid-search routine to choose hyperparameters empirically.
Findbp() performs a grid search over three key hyperparameters:
-lmd : a sparsity penalty to encourage sparse matrix. Larger values typically lead to fewer nonzero parameters.
-we : AdamW weight decay, a form of L2 regularization applied during optimization.
-maxlrs : maximum learning rate in fitting algorithm (SML).
bpresult <- Findbp(ocvecs, rep = 2, threads = 2,
cvmode="10fold",nrepeat = 10, we=c(0.001),totalit=4000,
lmd=c(1e-05,1e-03,0.01),maxlrs = c(0.005),
tol=0.03, runadamW=TRUE,sparse=TRUE,fastfitting=TRUE)
#Start hyper-parameter search:
#There are 300 tasks
#Finished 300 tasks
#SA: elapsed time 15.08 sec
bp <- bpresult[[1]] # typically best bpresult
bpm <- as.numeric(unlist(strsplit(names(bp)[1], split = "_")))
allresults <- bpresult[[2]]1. Splits samples into training and test subsets
nrepeat times), and each fold is used as the test set while the remaining folds are used for training.2. Fits the model on the training subset for each grid point
runSA().3. Evaluates predictive performance on held-out samples
fastfitting=TRUE, the function returns candidates with the lowest iterations within tol of the best observed error.plotSAtest.plotSAtest(allresults)
# "id" corresponds to hyperparameter: lmd_we_maxlrThe best parameters found by Findbp() are used for real analyses by runSA().
sa <- runSA(
ocvecs,
enmat = NULL,
rep = 32,
getall = FALSE,
lambda = bpm[[1]],
we = bpm[[2]],
maxlr = bpm[[3]],
totalit = bpm[[4]]
)
#> Start parameter fitting:
#> .SA: elapsed time 0.13 secAfter fitting the (extended) pairwise maximum entropy model with runSA(), the next step is to construct energy landscape over community compositions using the fitted parameters.
In ELAplus, this is performed by ELA(), which identifies:
1. Stable states Here, the base of the basin (with minimal energy) is regarded as a stable state, which is distinct from the equilibrium point of the dynamical system or the attractor. For details, see Suzuki et al. (2021),
2. Tipping points (energy barriers connecting pairs of basins/stable states). A connecting pair of basins have basin boundary and the point with the lowest energy barrier is regarded as tipping point here.
The fitted model defines an energy value for any community composition (typically a binary presence/absence vector of community members).
ela <- ELA(
sa,
env = NULL,
threads = 1,
SS.itr = 20000,
FindingTip.itr = 10000,
reporting = TRUE
)
#> Start ELA:
#> 7 stable states were found.
#> Checking 21 tipping points.
#> converting...
#> ELA: elapsed time 2.19 secAfter stable states and tipping points are computed, ELA() summarizes and standardizes the outputs. This conversion step produces a compact representation of the energy landscape structure:
bin2id())The output format is compatible to downstream steps such as pruning (ELPruning) and visualization (e.g., disconnectivity graphs).
ela[[1]] # stable state IDs
#> [1] "09x" "01t" "EWB" "EY8" "1uV" "1mL" "17s"
ela[[2]] # stable state energies
#> [1] -11.099418 -10.297931 -9.544050 -8.961634 -8.005040 -6.440385 -6.269140SSestimate)SS.itr (default 20000).ELA() evaluates transitions between them by searching for tipping points between all pairs of stable states.FindingTip.itr. As with SS.itr, increasing FindingTip.itr improves robustness at the cost of runtime. This step can be computationally heavy because it is performed across stable-state pairs.The energy landscape constructed by ELA() first identified stable states as much as possible. Some of these stable states correspond to very shallow basins.
Such stable states are often sensitive to noise and does not largely explain dominant multistability of the system.
ELPruning() addresses this issue by iteratively pruning shallow basins and merging them into deeper, neighboring basins.
elap <- ELPruning(ela, th=0.1, type="relative")
#> Start pruning:
#> *...
#> ELPruning: elapsed time 0.05 secELPruning() returns a list with two elements:
[[1]] Pruned ELA object (same format as the output of ELA()):
elap[[1]][[1]] # updated stable state IDs
#> [1] "09x" "EWB" "EY8" "1uV"
elap[[1]][[2]] # updated stable state energies
#> [1] -11.099418 -9.544050 -8.961634 -8.005040[[2]] Stable-state mapping table
elap[[2]]
#> ss.before.pruning ss.after.pruning
#> 1 09x 09x
#> 2 01t 09x
#> 3 EWB EWB
#> 4 EY8 EY8
#> 5 1uV 1uV
#> 6 1mL 09x
#> 7 17s 09xFor each stable state, a basin depth is defined as the energy difference between the energy of the stable state itself and the lowest tipping point energy connecting that stable state to any other stable state (\(\Delta E\)).
The basin depth measures how “robust” a stable state is against transitions: deeper basins correspond to more robust, while shallow basins correspond to more unstable states.
In ELPruning(), there are two methods relative or bootstrap.
If type="relative" is set, th (relative depth threshold) specifies the minimum acceptable energy barrier relative to the deepest basin barrier. \[
\Delta E_{\mathrm{min}} < th * \Delta E_{\mathrm{max}}
\] For example, th = 0.2 means that basins shallower than 20% of the deepest basin (\(\Delta E_{\mathrm{max}}\)) are pruned. The results of pruning is usually sensitive to this value.
elap <- ELPruning(ela, th=0.2, type="relative")
#> Start pruning:
#> *...
#> ELPruning: elapsed time 0.05 sec
elap[[1]][[1]] # updated stable state IDs
#> [1] "09x" "EWB" "1uV"Alternatively, users can perform pruning of stable states by calculating vulnerabilities in bootstrap samples in each stable state.
This method is explained in the next section:Pruning with bootstrap resampling.
A disconnectivity graph is a compact visualization of the global structure of an energy landscape.
# Before pruning
showDG(ela, ocvecs, "example")# After pruning
showDG(elap[[1]], ocvecs, "example")The reliability of stable states can be also examined by the bootstrap resampling of training data sets (the occurrence matrix). The function, bootstrap_SA(), performs parameter fitting with bootstrap resampling of the data sets.
bootstrap_ELA() uses the results of bootstrap_SA() (i.e., estimated parameters: h, J) for computing the variability of energy gaps between tipping points and stable states arising from uncertainty due to finite sample size.
If the estimated stable state is statistically reliable, the estimated \(\Delta E\) is expected to be consistently smaller than 0 across bootstrap resamples of the training data.
Therefore, the probability of bootstrap samples with \(\Delta E < 0\) reflects the statistical significance of the estimated stable state. Here, this quantity is referred to as a “bootstrap-approximated p-value” and is used to assess statistical reliability.
As an example, the distribution of \(\Delta E\) between “01t” and “09/” is shown as a histogram. More than 25% of bootstrap samples yield \(\Delta E > 0\) (i.e., p-val > 0.25), and this case is regarded as statistically not reliable.
If bootstrap_ene is specified in showDG(), the function displays an approximate p-value in each stable state. This option is helpful when you want to see uncertainty in estimated energies in ELA.
bs_params <- bootstrap_SA(ocvecs, rep = 16, threads=8, bootstrap.it = 1000,
lambda = bpm[[1]],we = bpm[[2]],
maxlr = bpm[[3]], totalit = bpm[[4]])
bootstrap <- bootstrap_ELA(bs_params,ocvecs,ela=ela,threads = 1)
ela_rep <- bootstrap[[1]]
bootstrap_ene <- bootstrap[[2]]
showDG(ela_rep,ocvecs,bootstrap_ene = bootstrap_ene, min_pval = 0.01,
scale_labels = c(0.05,0.1,0.2,0.5))The pruning stable states is also possible using the bootstrap results.
The ELPruning() with type="bootstrap"used the distribution of \(\Delta E\) in bootstrap samples and evaluated whether the predicted stable states are statistically reliable.
If the p-value is above threshold (e.g., 0.05), the function prunes such unreliable and shallow basins (stable states) and merges them into deeper, neighboring basins. The pruned graph is follows.
elap_rep <- ELPruning(ela, type = "bootstrap",bootstrap_ene=bootstrap_ene,lower_qr = 0.25)
showDG(ela_rep,ocvecs,bootstrap_ene = bootstrap_ene, min_pval = 0.01,
scale_labels = c(0.05,0.1,0.2,0.5))If getGraohObj=TRUE, showDG() function returns a data frame for plotting a disconnectivity graph. In the following example, disconnectivity graph is drawn using ggplot(). Tipping points and stable states are shown as triangles and circles, respectively, and colored by energies.
grobj_to_plot <- showDG(ela_rep,ocvecs,bootstrap_ene = bootstrap_ene,getGraphObj = TRUE)
grobj_point <- grobj_to_plot[grobj_to_plot$point_str != "",]
g <- ggplot(
grobj_to_plot,
aes(x=.data$nodes2xposi, y=.data$energy, label=.data$point_str)) +
xlab("") + geom_text(hjust=0.75, vjust=2, aes(fontface=2)) + geom_path() +
geom_point(
data = grobj_point,
mapping = aes(
x = .data$nodes2xposi,
y = .data$energy,
color = .data$energy,
shape = .data$type,
),size = 4
) +
scale_color_viridis_c(option = "plasma",alpha=0.8,
direction=-1,na.value = "grey") +
coord_cartesian(xlim=c(0.75,7.5), ylim=c(-5.8,-11.5))
plot(g)showIntrGraph() visualizes the pairwise interaction structure inferred by the fitted model in a 2D “interaction space”.
It extracts estimated parameters from the resulting interaction matrix (jest).
th are classified as positive, while links with value < -th are classified as negative.The plot draws these two link types separately and colors them differently (positive = red, negative = blue).
showIntrGraph(
elap[[1]],
sa,
th = 0.01,
annot_adj = c(0.75, 1.00)
)The stability structure of a community is not fixed, but changes continuously along an environmental gradient (e.g., temperature, nutrient availability, disturbance intensity).
GradELA() is designed to capture this dependence by estimating energy landscapes repeatedly across a controlled range of an environmental factor and summarizing how stable states and transition barriers change.
When environmental variables (envecs) are included, ELAplus supports Gradient Energy Landscape Analysis via GradELA(). The parameter fitting process is same as that for normal ELA except for specifying enmat.
bpresult <- Findbp(ocvecs, enmat = envecs, rep = 2, threads = 2,
cvmode="10fold",nrepeat = 10, we=c(0.001),totalit=4000,
lmd=c(1e-05,1e-03,0.01),maxlrs = c(0.005),
tol=0.03, runadamW=TRUE,sparse=TRUE,fastfitting=TRUE)
bp <- bpresult[[1]]
bpm <- as.numeric(unlist(strsplit(names(bp)[1], split = "_")))
sa <- runSA(
ocvecs,
enmat = envecs,
rep = 16,
getall = FALSE,
lambda = bpm[[1]],
we = bpm[[2]],
maxlr = bpm[[3]],
totalit = bpm[[4]]
)GradELA() proceeds in three conceptual stages:
eid is varied over a specified range and steps.
eid is environmental factor for changing (one column in the environmental matrix enmat).range is the minimum and maximum values of the environmental factor.steps is how many discrete points are evaluated.ELA() and ELPruning() are performed.env (it corresponds to a row in enmat) as follows:
env is provided, it is used as the reference vector for all environmental factors except eid.env is not provided, the mean vector of enmat is used automatically.# This process takes few minutes with a small number of threads, and here pre-computed results are shown.
gela <- GradELA(
sa = sa,
eid = "factor.1",
enmat = envecs,
env = NULL,
range = c(-1, 1),
steps = 32,
th = 0.05,
threads = 8,
pruning_type = "relative",
bs_params = bs_params,
ocmat = ocvecs)Users can use bootstrap samples in ELPruning process by pruning_type="bootstrap". Other options are equivalent to ELPruning().
# This process takes few minutes with a small number of threads, and here pre-computed results are shown.
bs_params <- bootstrap_SA(ocvecs, enmat = envecs,rep = 16, threads=8,bootstrap.it = 1000,
lambda = bpm[[1]],we = bpm[[2]],
maxlr = bpm[[3]], totalit = bpm[[4]])
gela <- GradELA(
sa = sa,
eid = "factor.1",
enmat = envecs,
env = NULL,
range = c(-1, 1),
steps = 32,
lower_qr = 0.25,
threads = 8,
pruning_type = "bootstrap",
bs_params = bs_params,
ocmat = ocvecs)GradELA() returns a list with three elements:
1. Pruned ELA results for each step
ELA() output, after pruning).2. Environmental value sequence
eid).3. Reference environmental vector
gela[[1]][[1]][[1]][[1]] # stable state IDs at 1st environmental parameter
# [1] "0Ht" "EWB" "1uV"
gela[[1]][[20]][[1]][[1]] # stable state IDs at 20th environmental parameter
# [1] "09x" "EWB"The resulting object can then be plotted using showSSD(), which visualize the regime shift: how the number, identity, and energy of stable states change along the environmental gradient.
# GradELA result (ELPruning with type="relative").
showSSD(gela)# GradELA result (ELPruning with type="bootstrap").
showSSD(gela)If users want to modify the figure configuration of the diagram, showSSD_ggplot() function returns a data frame for plotting. Then users can the configuration manually using ggplot().
ssd_df <- showSSD_ggplot(gela,plot = FALSE,getSSDobj = TRUE)
g <- ggplot(ssd_df, aes(x = env, y = Energy, color = SS_ID)) +
geom_point(shape = 19) +
geom_line(aes(group = SS_ID))showGELA3D() visualizes a smoothed energy landscape surface produced by GELAObj() and overlays stable-state trajectories.
GradELA().# This process takes few minutes, and here pre-computed result is drawn.
gelaobj <- GELAObj(gela, sa=sa,threads=2)
showGELA3D(gelaobj,theta = 30,phi = 30)Overview of extended maximum pairwise entropy model
The model determines the probability of the occurrence of community composition \(\sigma^{(k)}\) in an environmental condition \(\varepsilon = (\varepsilon_{1}, \varepsilon_{2}, ..., \varepsilon_{M})\), environmental factors such as pH or temperature.
\[ E(\sigma^{(k)},\varepsilon) = - \sum_{i=1}^{S} h_i \sigma_i^{(k)} - \sum_{j=1}^{S} \sum_{i=1}^{M} g_{ij} \varepsilon_i^{(k)} \sigma_j^{(k)} - \sum_{i=1}^{S} \sum_{j=1,i \neq j}^{S} J_{ij} \sigma_i^{(k)} \sigma_j^{(k)} \] where: