Install the released version from CRAN:
install.packages("CGMissingDataR")
Or install the development version from GitHub:
install.packages("devtools")
devtools::install_github("ZhangLabUKY/CGMmissingDataR")
CGMissingDataR imputes missing glucose values in continuous glucose monitoring (CGM) data. The main public workflow is:
run_missing_glucose_imputation()
The function handles both explicit missing glucose values coded as
NA and implicit missing readings caused by timestamp gaps.
It accepts a data frame with a subject identifier, timestamp column,
glucose column, and optional subject-level or visit-level covariates. It
returns the user’s original columns plus
imputed_glucose_value, leaving the original glucose column
unchanged.
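To make the expected input shape concrete, here is a minimal toy data frame in long format, one row per reading. The column names mirror the bundled example data used below (USUBJID, Time, LBORRES, AGE, hba1c). The values are invented for illustration, and storing Time as POSIXct is an assumption rather than a documented requirement. Subject S01 contains both kinds of missingness: an explicit NA at 00:05 and an implicit gap where 00:10 and 00:15 are absent.

```r
# Toy CGM input in long format: one row per subject per reading.
# All values are invented for illustration only.
toy_cgm <- data.frame(
  USUBJID = c("S01", "S01", "S01", "S02", "S02", "S02"),
  Time = as.POSIXct(
    c("2024-01-01 00:00", "2024-01-01 00:05", "2024-01-01 00:20",  # S01: 00:10, 00:15 absent
      "2024-01-01 00:00", "2024-01-01 00:05", "2024-01-01 00:10"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  ),
  LBORRES = c(110, NA, 118, 95, 97, 99),   # S01 at 00:05 is an explicit NA
  AGE = c(54, 54, 54, 61, 61, 61),         # subject-level covariate
  hba1c = c(7.1, 7.1, 7.1, 6.4, 6.4, 6.4)  # subject-level covariate
)
str(toy_cgm)
```

This frame only illustrates the layout; it is far too small for model fitting.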
run_missing_glucose_imputation() performs the following
steps:
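- regularizes each subject's readings to the expected interval_minutes timestamp grid, inserting rows for absent readings;
- marks inserted rows as missing (target_col = NA);
- … SEX when present;
- chooses the final imputation method from models:
  - models = NULL or models = "auto" keeps the missing-rate rule, using MICE+ARIMA when the missing rate is <= 0.05 and MICE+XGBoost when the missing rate is > 0.05;
  - otherwise, models forces MICE+ARIMA, MICE+XGBoost, MICE+RF, MICE+kNN, or MICE+LightGBM;
- returns the completed values in imputed_glucose_value.
Internal columns such as TimeSeries,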
TimeDifferenceMinutes, lag features, rolling means,
imputation method labels, and missingness-tracking flags are used for
modeling but are not returned.
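The automatic rule can be read as a simple threshold on the share of missing target values. The sketch below assumes the missing rate is the proportion of NA values in the glucose column after gap rows have been inserted; whether the package computes it over all subjects or per subject is not stated here, and choose_auto_method() is a hypothetical helper, not part of the package API.

```r
# Hypothetical helper (not exported by CGMissingDataR) mirroring the
# documented rule: missing rate <= 0.05 -> MICE+ARIMA, otherwise MICE+XGBoost.
choose_auto_method <- function(glucose, threshold = 0.05) {
  missing_rate <- mean(is.na(glucose))  # share of NA readings, in [0, 1]
  if (missing_rate <= threshold) "MICE+ARIMA" else "MICE+XGBoost"
}

# 1 of 40 readings missing (2.5%): low-missingness branch
choose_auto_method(c(NA, rep(100, 39)))
# 3 of 20 readings missing (15%): high-missingness branch
choose_auto_method(c(NA, NA, NA, rep(100, 17)))
```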
Because timestamp gaps are converted into explicit rows before imputation, the returned data frame may contain more rows than the input data when readings are absent from the expected CGM sampling grid.
The default R-native backend uses the R package mice.
For closest agreement with the Python reference workflow, install
reticulate and use the optional Python backend.
The imputation model engines run with n_threads = 1 by
default so that examples, tests, and shared systems use conservative CPU
resources. Increase n_threads for faster local XGBoost,
Random Forest, or LightGBM runs.
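On a local machine with spare cores, one option is to derive n_threads from the number of available cores. This sketch assumes n_threads is an argument of run_missing_glucose_imputation(), as the paragraph above implies; pick_n_threads() is a hypothetical helper, and leaving one core free is only a conservative default. The imputation call runs only if the package is installed.

```r
# Hypothetical helper: leave one core free, never go below 1 thread.
# parallel::detectCores() can return NA on some platforms, so fall back to 1.
pick_n_threads <- function() {
  cores <- parallel::detectCores()
  if (is.na(cores)) return(1L)
  max(1L, as.integer(cores) - 1L)
}

n_threads <- pick_n_threads()

# Run the real imputation only when the package is available.
if (requireNamespace("CGMissingDataR", quietly = TRUE)) {
  data("CGMExmplDat10Pct", package = "CGMissingDataR")
  out_fast <- CGMissingDataR::run_missing_glucose_imputation(
    CGMExmplDat10Pct,
    target_col = "LBORRES",
    feature_cols = c("AGE", "hba1c"),
    id_col = "USUBJID",
    time_col = "Time",
    imputer_backend = "mice",
    n_threads = n_threads  # assumed argument, per the text above
  )
}
```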
install.packages("reticulate")
The Python backend uses these Python packages through
reticulate:
reticulate::py_require(c(
"numpy",
"pandas",
"scikit-learn",
"statsmodels",
"xgboost"
))
# Optional, only needed for models = "lightgbm"
reticulate::py_require("lightgbm")
library(CGMissingDataR)
data("CGMExmplDat10Pct")
out <- run_missing_glucose_imputation(
CGMExmplDat10Pct,
target_col = "LBORRES",
feature_cols = c("AGE", "hba1c"),
id_col = "USUBJID",
time_col = "Time",
imputer_backend = "mice"
)
head(out[c(
"USUBJID",
"Time",
"LBORRES",
"AGE",
"hba1c",
"imputed_glucose_value"
)])
The original target column is not overwritten. Rows that were missing
in LBORRES, including rows inserted from timestamp gaps,
remain missing there; the completed value is stored in
imputed_glucose_value.
missing_rows <- is.na(out$LBORRES)
head(out[missing_rows, c(
"USUBJID",
"Time",
"LBORRES",
"imputed_glucose_value"
)])
imputed_glucose_value is returned as a continuous
numeric model estimate. Users who need whole-number glucose values for
reporting can round after imputation:
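out$imputed_glucose_value_rounded <- round(out$imputed_glucose_value)
Raw CGM exports may represent missingness in two ways:
- explicit missing glucose values coded as NA;
- implicit missing readings, where expected timestamps are absent from the CGM sampling grid.
For example, if a subject’s readings jump from 00:05 to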
00:30, the function internally creates the missing 5-minute
rows at 00:10, 00:15, 00:20, and
00:25, sets the target glucose value to NA,
and then imputes those values using the same workflow as explicit
missing glucose values.
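The gap-filling step in this example can be sketched in base R by building a regular grid per subject and left-joining the observed readings onto it. This is an illustration under stated assumptions, not the package's internal code: expand_to_grid() is a hypothetical helper, the 5-minute interval is passed in rather than inferred, and covariates other than the subject ID are left NA in inserted rows.

```r
# Hypothetical helper: expand one subject's readings onto a regular time grid.
# Assumes POSIXct timestamps and a known sampling interval in minutes.
expand_to_grid <- function(df, id_col = "USUBJID", time_col = "Time",
                           interval_minutes = 5) {
  df <- df[order(df[[time_col]]), , drop = FALSE]
  # seq() on POSIXct with a numeric `by` steps in seconds
  grid <- seq(min(df[[time_col]]), max(df[[time_col]]),
              by = interval_minutes * 60)
  full <- data.frame(grid)
  names(full) <- time_col
  # Left join: grid times without a reading get NA for every other column
  out <- merge(full, df, by = time_col, all.x = TRUE)
  out[[id_col]] <- df[[id_col]][1]  # inserted rows inherit the subject ID
  out
}

toy <- data.frame(
  USUBJID = c("S01", "S01", "S01", "S02", "S02"),
  Time = as.POSIXct(c("2024-01-01 00:00", "2024-01-01 00:05", "2024-01-01 00:30",
                      "2024-01-01 00:00", "2024-01-01 00:05"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  LBORRES = c(102, 104, 121, 95, 97)
)

# Apply per subject, then stack the results
expanded <- do.call(rbind, lapply(split(toy, toy$USUBJID), expand_to_grid))
rownames(expanded) <- NULL
expanded
```

S01 gains rows at 00:10, 00:15, 00:20, and 00:25 with LBORRES = NA, which is why the returned data can have more rows than the input.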
CGMissingDataR also includes a small Shiny app for users who prefer
an interactive workflow. The app lets users upload a CSV file or load
one of the built-in example data sets, choose the target glucose,
subject ID, timestamp, and feature columns, run
run_missing_glucose_imputation(), preview rows with missing
glucose values that were imputed, and download the completed data as a
CSV file.
The app also exposes the same final-method selector as the R
function. Users can keep the automatic missing-rate rule or force
MICE+ARIMA, MICE+XGBoost,
MICE+RF, MICE+kNN, or
MICE+LightGBM; method-specific controls appear only when
they apply.
Launch the app from R with:
run_app()
The app supports the same two imputation backends as the main function:
- mice, the default CRAN-safe R backend;
- sklearn, the optional Python-compatible backend using reticulate.
The Shiny app is optional. If it is not already installed, install Shiny with:
install.packages("shiny")
For package developers, the app is stored under
inst/shiny/cgm_imputation_app/ and is launched through the
exported run_app() helper.
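For readers unfamiliar with the inst/ convention, a launcher of this kind typically looks up the installed app directory with system.file() and hands it to shiny::runApp(). The sketch below shows that common pattern under the assumption that run_app() works similarly; run_app_sketch() is a hypothetical name, not the package's actual source. Files under inst/ are copied to the top level of the installed package, so inst/shiny/cgm_imputation_app/ is found as shiny/cgm_imputation_app.

```r
# Hypothetical launcher illustrating the usual pattern for apps under inst/.
run_app_sketch <- function(...) {
  # system.file() returns "" when the package or path is not installed
  app_dir <- system.file("shiny", "cgm_imputation_app",
                         package = "CGMissingDataR")
  if (!nzchar(app_dir)) {
    stop("App directory not found; is CGMissingDataR installed?", call. = FALSE)
  }
  if (!requireNamespace("shiny", quietly = TRUE)) {
    stop("Install the 'shiny' package to use the app.", call. = FALSE)
  }
  shiny::runApp(app_dir, ...)  # extra arguments pass through to runApp()
}
```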
Use imputer_backend = "sklearn" to run the strict
Python-compatible path. In that path, reticulate sends the
data to Python, where pandas, scikit-learn, statsmodels, Python xgboost,
and optional Python lightgbm perform the preprocessing and calculations.
The completed pandas data frame is then converted back to R.
out_py <- run_missing_glucose_imputation(
CGMExmplDat10Pct,
target_col = "LBORRES",
feature_cols = c("AGE", "hba1c"),
id_col = "USUBJID",
time_col = "Time",
imputer_backend = "sklearn"
)
The Python backend is optional. It is not required for package installation, loading, or CRAN examples.
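Because the Python backend is optional, a script meant to run on machines with and without Python can pick the backend at run time. choose_backend() below is a hypothetical helper: it checks the modules from the py_require() call above (scikit-learn is imported as sklearn) with reticulate::py_module_available() and otherwise falls back to "mice". Checking for modules may initialize Python for the session.

```r
# Hypothetical helper: prefer the Python backend only when reticulate is
# installed and every required Python module can be imported.
choose_backend <- function(
    modules = c("numpy", "pandas", "sklearn", "statsmodels", "xgboost")) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return("mice")
  }
  # Treat any error while probing Python as "module not available"
  is_available <- function(m) {
    isTRUE(tryCatch(reticulate::py_module_available(m),
                    error = function(e) FALSE))
  }
  available <- vapply(modules, is_available, logical(1))
  if (all(available)) "sklearn" else "mice"
}

backend <- choose_backend()
backend
```

The result can then be passed as imputer_backend = backend in the call above.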
The main vignette contains a detailed walkthrough of data requirements, timestamp regularization, return columns, backend selection, optional Python setup, and troubleshooting:
https://zhanglabuky.github.io/CGMmissingDataR/articles/How-To-Use-CGMissingDataR.html
A separate Shiny app vignette walks through the interactive interface:
https://zhanglabuky.github.io/CGMmissingDataR/articles/Using-the-CGMissingDataR-Shiny-App.html
The changelog is available at:
https://zhanglabuky.github.io/CGMmissingDataR/news/index.html