DICErClust Illustrated Example: Heart Failure Risk Stratification

Sarah Ayton and Yiye Zhang

2026-05-21

Overview

This vignette walks through a complete DICErClust workflow on the UCI Heart Failure Clinical Records dataset (Chicco & Jurman 2020). By the end you will have:

Approximate run time on a single CPU core: 15–20 minutes (the training step is the bottleneck; all other steps are fast).


1. Load DICErClust

DICErClust is distributed as a source tarball. Install it once with install.packages(), then load it like any other package.

## Install from local tarball (run once):
# install.packages(
#   "/path/to/DICErClust_0.1.1.tar.gz",
#   repos = NULL, type = "source"
# )
library(DICErClust)
library(ggplot2)
library(pROC)

2. Download the UCI Heart Failure dataset

Chicco & Jurman (2020) collected 299 heart-failure patients from the Faisalabad Institute of Cardiology in Pakistan. The dataset contains 12 clinical features and a binary outcome DEATH_EVENT (1 = died during follow-up; 0 = survived). It is freely available from the UCI Machine Learning Repository (dataset #519).

hf_url  <- paste0(
  "https://archive.ics.uci.edu/ml/",
  "machine-learning-databases/00519/",
  "heart_failure_clinical_records_dataset.csv"
)
hf_dest <- tempfile(fileext = ".csv")
download.file(hf_url, hf_dest, quiet = TRUE)
hf <- read.csv(hf_dest)

cat(sprintf("Rows: %d   Columns: %d\n", nrow(hf), ncol(hf)))
print(table(DEATH_EVENT = hf$DEATH_EVENT))

The dataset is small but class-imbalanced: roughly 68% of patients survived (DEATH_EVENT = 0) and 32% died (DEATH_EVENT = 1).


3. Feature engineering

DICEr expects the input to be split into two feature matrices:

Matrix Contents Role
data_x Continuous laboratory / physiological measurements Encoder input: the LSTM compresses these into a low-dimensional latent representation used for clustering
data_v Binary demographic / comorbidity indicators Auxiliary outcome-head input: the likelihood-ratio test assesses whether the cluster explains outcome above and beyond these covariates

This design mirrors real-world EHR practice: data_x captures time-varying lab values while data_v captures static patient characteristics.

## Continuous lab features → LSTM encoder (data_x)
x_cols <- c("age", "creatinine_phosphokinase", "ejection_fraction",
            "platelets", "serum_creatinine", "serum_sodium", "time")

## Binary demographic indicators → outcome head (data_v)
v_cols <- c("anaemia", "diabetes", "high_blood_pressure", "sex", "smoking")

## Min-max scale continuous features to [0, 1].
## Scaling prevents any single lab value from dominating the MSE
## reconstruction loss relative to others.
scale_01 <- function(x) {
  r <- range(x, na.rm = TRUE)
  if (diff(r) == 0) return(x * 0)
  (x - r[1]) / diff(r)
}

X_x <- apply(as.matrix(hf[, x_cols]), 2, scale_01)  # 299 × 7, numeric
X_v <- apply(as.matrix(hf[, v_cols]), 2, as.numeric) # 299 × 5, binary as float

## Note: data_v *must* be stored as numeric (float), not integer.
## torch_tensor() infers dtype from R storage mode; integer columns produce
## int64 tensors that are incompatible with the float32 model weights.

cat(sprintf("data_x: %d × %d\ndata_v: %d × %d\n",
            nrow(X_x), ncol(X_x), nrow(X_v), ncol(X_v)))

n_x <- ncol(X_x)  # 7  continuous predictors
n_v <- ncol(X_v)  # 5  binary demographics
outcome <- hf$DEATH_EVENT

4. Stratified train / test split

A 70 / 30 stratified split preserves the ~32% event rate in both partitions, which is important given the small sample size.

set.seed(1111)
idx_death <- which(outcome == 1)
idx_alive <- which(outcome == 0)

train_idx <- sort(c(
  sample(idx_death, floor(0.70 * length(idx_death))),
  sample(idx_alive, floor(0.70 * length(idx_alive)))
))
test_idx <- setdiff(seq_len(nrow(hf)), train_idx)

cat(sprintf("Train: %d patients  (deaths: %d, %.0f%%)\n",
            length(train_idx), sum(outcome[train_idx]),
            100 * mean(outcome[train_idx])))
cat(sprintf("Test : %d patients  (deaths: %d, %.0f%%)\n",
            length(test_idx),  sum(outcome[test_idx]),
            100 * mean(outcome[test_idx])))

5. Serialise data in DICErClust format

DICEr() reads training and test sets from RDS files. Each file must contain a length-3 list:

  1. data_x — numeric matrix, shape n × p
  2. data_v — numeric matrix, shape n × q
  3. data_y — integer vector of 0/1 outcomes, length n
data_dir <- file.path(tempdir(), "dice_hf")
dir.create(data_dir, showWarnings = FALSE)

saveRDS(
  list(X_x[train_idx, ], X_v[train_idx, ], as.integer(outcome[train_idx])),
  file.path(data_dir, "hf_train.rds")
)
saveRDS(
  list(X_x[test_idx, ], X_v[test_idx, ], as.integer(outcome[test_idx])),
  file.path(data_dir, "hf_test.rds")
)

6. Configure DICEr

The argument list controls both the architecture and the training schedule.

args <- list(
  seed              = 1111,          # reproducibility seed
  input_path        = data_dir,      # directory containing RDS files
  filename_train    = "hf_train.rds",
  filename_test     = "hf_test.rds",

  ## ── Architecture ──────────────────────────────────────────
  n_input_fea       = n_x,          # 7 continuous LSTM input features
  n_hidden_fea      = 4,            # LSTM latent dimension (7 → 4)
  lstm_layer        = 1,            # single LSTM layer
  lstm_dropout      = 0.0,          # no dropout (small dataset)
  K_clusters        = 2,            # binary risk partition: high vs. low

  ## ── Auxiliary features ────────────────────────────────────
  n_dummy_demov_fea = n_v,          # 5 binary demographic covariates

  ## ── Hardware ──────────────────────────────────────────────
  cuda              = FALSE,        # set TRUE for GPU acceleration

  ## ── Optimiser ─────────────────────────────────────────────
  lr                = 1e-4,         # Adam learning rate

  ## ── Training schedule ─────────────────────────────────────
  init_AE_epoch     = 5,            # Stage 1: autoencoder warm-up epochs
  iter              = 30,           # Stage 2: number of clustering iterations
  epoch_in_iter     = 2,            # gradient-update epochs per iteration

  ## ── Loss weights ──────────────────────────────────────────
  ## Combined loss: L = λ_AE·L_AE + λ_clf·L_classifier
  ##                  + λ_out·L_outcome + λ_p·L_p_value
  ## L_p_value = 3.841 − G penalises non-significant cluster configurations
  ## (G is the LRT statistic; 3.841 is the χ²(1) critical value at α = 0.05)
  lambda_AE         = 1.0,
  lambda_classifier = 1.0,
  lambda_outcome    = 1.0,
  lambda_p_value    = 1.0
)

Training stages explained

DICEr runs three sequential stages:

  1. LSTM autoencoder warm-up (init_AE_epoch epochs): trains the encoder and decoder to reconstruct data_x, establishing a compact latent representation before any clustering begins.

  2. Joint optimisation (iter iterations × epoch_in_iter epochs each): alternates between

    1. k-means clustering in the LSTM latent space, and
    2. gradient updates minimising the combined four-component loss. After each k-means step the cluster with the highest data_y = 1 rate is relabelled cluster 0 (the high-risk cluster).
  3. Model selection: saves the checkpoint with the lowest test negative log-likelihood that also satisfies the likelihood-ratio test p < 0.05 between at least one cluster pair.


7. Train the model

## DICEr writes output files relative to the working directory.
## We temporarily switch to tempdir() to keep them self-contained.
old_wd <- setwd(tempdir())
suppressWarnings(DICEr(args))
setwd(old_wd)

Output is written to hn_4_K_2/part2_AE_nhidden_4/ relative to the working directory used during training. The key files are:


8. Load the best checkpoint

part2_dir <- file.path(tempdir(), "hn_4_K_2", "part2_AE_nhidden_4")

if (!file.exists(file.path(part2_dir, "data_train_iter.rds"))) {
  stop(
    "No checkpoint found — the p < 0.05 criterion was not met in ",
    args$iter, " iterations.  Increase args$iter and rerun."
  )
}

res_train <- readRDS(file.path(part2_dir, "data_train_iter.rds"))
res_test  <- readRDS(file.path(part2_dir, "data_test_iter.rds"))

Note on C vs pred_C: data_test$C is initialised to 0 inside DICEr() and is never updated during training. Always use data_test$pred_C (nearest-centroid assignments) for test-set evaluation.


9. Evaluate cluster quality

The code below uses pre-computed results from the reference run (seed = 1111, 30 iterations) so the vignette builds without retraining. When you run DICEr() yourself the results will be loaded from the checkpoint you just produced.

Dynamic cluster labelling

k-means spatial boundaries may assign a different cluster index to the high-mortality group in training versus test, so labels should be assigned within each split based on the observed death rate.

label_by_rate <- function(df) {
  rates <- tapply(df$death, df$cluster, mean)
  hi    <- as.integer(names(which.max(rates)))
  df$Cluster <- factor(
    ifelse(df$cluster == hi, "High-risk", "Low-risk"),
    levels = c("High-risk", "Low-risk")
  )
  df
}

train_df <- label_by_rate(train_df)
test_df  <- label_by_rate(test_df)

Cluster outcome summary

summarise_clusters <- function(df, split_name) {
  do.call(rbind, lapply(split(df, df$Cluster), function(d) {
    data.frame(
      Split     = split_name,
      Cluster   = as.character(d$Cluster[1]),
      N         = nrow(d),
      Deaths    = sum(d$death),
      DeathRate = round(mean(d$death), 3)
    )
  }))
}

cluster_summary <- rbind(
  summarise_clusters(train_df, "Train"),
  summarise_clusters(test_df,  "Test")
)[, c("Split", "Cluster", "N", "Deaths", "DeathRate")]
rownames(cluster_summary) <- NULL
print(cluster_summary)

AUC

test_score <- as.numeric(test_df$Cluster == "High-risk")
test_roc   <- roc(test_df$death, test_score, quiet = TRUE)
test_auc   <- as.numeric(auc(test_roc))
cat(sprintf("Test AUC: %.4f\n", test_auc))

Chi-squared test

ct        <- table(Cluster = test_df$Cluster, Death = test_df$death)
chisq_res <- suppressWarnings(chisq.test(ct))
print(ct)
cat(sprintf("Chi-squared = %.3f, df = %d, p %s\n",
            chisq_res$statistic,
            chisq_res$parameter,
            ifelse(chisq_res$p.value < 0.001, "< 0.001",
                   sprintf("= %.4f", chisq_res$p.value))))

An AUC of 0.823 and a chi-squared p-value < 0.001 indicate that DICEr has identified two clusters with a strong and statistically significant difference in mortality rate: 71.9% in the High-risk cluster versus 10.3% in the Low-risk cluster.


10. Figures

Figure 1 — Cluster outcome bar chart

te_sum <- summarise_clusters(test_df, "Test")

ggplot(te_sum, aes(x = Cluster, y = DeathRate, fill = Cluster)) +
  geom_col(width = 0.5, colour = "black", linewidth = 0.4) +
  geom_text(aes(label = paste0(Deaths, "/", N)),
            vjust = -0.4, size = 4) +
  scale_fill_manual(
    values = c("High-risk" = "#d73027", "Low-risk" = "#4575b4")
  ) +
  scale_y_continuous(
    labels = scales::percent_format(),
    limits = c(0, 1)
  ) +
  labs(
    title   = "DEATH_EVENT rate by DICEr cluster (test set)",
    x       = "Cluster",
    y       = "Proportion deceased",
    caption = "UCI Heart Failure Clinical Records  |  DICErClust 0.1.1"
  ) +
  theme_bw(base_size = 13) +
  theme(legend.position = "none")

Figure 2 — ROC curve

roc_df <- data.frame(
  FPR = 1 - test_roc$specificities,
  TPR = test_roc$sensitivities
)

ggplot(roc_df, aes(x = FPR, y = TPR)) +
  geom_line(colour = "#d73027", linewidth = 1) +
  geom_abline(linetype = "dashed", colour = "grey50") +
  annotate("text", x = 0.55, y = 0.15,
           label = sprintf("AUC = %.3f", test_auc),
           size = 5, colour = "#d73027") +
  labs(
    title   = "ROC curve — DICEr cluster vs. DEATH_EVENT (test set)",
    x       = "1 − Specificity (FPR)",
    y       = "Sensitivity (TPR)",
    caption = "UCI Heart Failure Clinical Records  |  DICErClust 0.1.1"
  ) +
  theme_bw(base_size = 13)

Summary of results

Metric Value
Best checkpoint iteration 19
LRT p-value at checkpoint 0.0100
Test negative log-likelihood 0.6493
Test AUC 0.823
High-risk mortality (test) 71.9% (23/32)
Low-risk mortality (test) 10.3% (6/58)
Chi-squared statistic 32.99
Chi-squared p-value < 0.001

References

Chicco D, Jurman G (2020). “Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone.” BMC Medical Informatics and Decision Making, 20, 16. doi:10.1186/s12911-020-1023-5

Dua D, Graff C (2019). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records

Huang Y, Du C, Zhu F, et al. (2021). “Self-supervised deep clustering of patient subgroups for heart failure with preserved ejection fraction.” Journal of the American Medical Informatics Association, 28, 2394–2403. doi:10.1093/jamia/ocab203