How to use normref

Marieke E. Timmerman, Klazien de Vries, and Hannah M. Heister

2025-09-30

Introduction

The normref package provides tools to calculate continuous norms for psychological test scores, where the norms depend on a single continuous variable, denoted as the norm predictor. Typically, the norm predictor is age. The package allows for a single norm predictor. Applying the norms results in standardized scores such as percentiles, T-scores, or IQ scores. Such standardized scores allow for norm-referenced interpretation, that is, comparing an individual’s performance to that of a relevant reference population.

For example, an IQ score expresses an individual’s performance relative to the general population of the same age. To estimate such norms, normref requires raw scores from a representative normative sample.
This vignette illustrates how to derive norms from such a sample, following the approach described in a tutorial (Timmerman, Voncken, and Albers 2021).


Workflow overview

The main workflow in normref consists of:

  1. Shaping the data (shape_data)
  2. Fitting candidate models (fb_select)
  3. Inspecting model fit (BIC, worm plots, centile plots)
  4. Creating norm tables (normtable_create, plot_normtable)
  5. (Optional) Adding confidence intervals (CIs) for normed scores based on age-dependent reliability estimates

Advanced functionality includes norming of composite scores, norms for new data, user-defined distributions, and adding CIs based on age-dependent reliability estimates.


Workflow

Example: norming a single test

We illustrate the process using IDS data (Grob and Hagmann-von Arx 2018; Grob et al. 2018). An anonymized and perturbed version of this data is included in normref.
We focus on Test 14, with \(n=1660\) children and raw scores ranging from 2–34.

Three candidate distributions are considered:

To compare candidate models, we use the free order model selection procedure with the Bayesian Information Criterion (BIC) as selection criterion, which has been shown to be effective in model selection (Voncken, Albers, and Timmerman 2019).

library(normref)
library(gamlss.dist) # also needed for vignette

First, we load the data.

get(data("ids_data"))

Shape the data

Before fitting any distributions to model the raw scores of Test 14 as a function of age, we must ensure that the data are in the correct format for gamlss implementation. This is handled by the function shape_data.

The shape_data function verifies whether the response variable is properly formatted and, if necessary, reshapes it. For binomial-type distributions like the BB, gamlss requires the response to be a two-column matrix: one column for the number of correct items and one for the number of incorrect items. For each individual, these columns sum to the maximum possible raw score on the test.

By default, shape_data uses the highest observed raw score in the data as the maximum. This can be overridden with the argument max_score. The function also removes any cases with missing values for the raw scores or age.

Applying shape_data to our data produces a new dataframe, mydata_BB, which includes the additional column "shaped_score" containing the scores in the correct format for BB models:

mydata_BB <- shape_data(data = ids_data,
                     age_name = "age",
                     score_name = "y14",
                     family = "BB",
                     max_score = 34)
#> Binomial family: converted response to matrix with two columns:
#>   col1 = number correct, col2 = number incorrect.

We can also apply shape data to obtain scores in the correct format for the BCPE model, which requires nonzero, positive scores.

mydata_BCPE <- shape_data(data = ids_data,
                     age_name = "age",
                     score_name = "y14",
                     family = "BCPE")
#> Response values are valid and were not transformed.

Because Test 14 has no results with zero scores, the scores were not transformed. A new column "shaped_score" was created, but it is identical to "y14".

Fit candidate models

To fit the candidate models, we use the model selection function fb_select(), which implements the free order model selection procedure (Voncken, Albers, and Timmerman 2019). We first apply this procedure to the BB distribution. Note that we pass "shaped_score" as the score_name, because this column contains the reshaped response variable created by shape_data().

mod_BB_y14 <- fb_select(data = mydata_BB,
                       age_name = "age",
                       score_name = "shaped_score",
                       family = "BB",
                       selcrit = "BIC")
#> FB-select iteration 0: Polynomial degrees = 2,1 : BIC = 8469.301
#> FB-select iteration 1: Polynomial degrees = 2,2 : BIC = 8462.629

The free order model selection procedure selects a second degree polynomial of age for both the \(\mu\) and \(\sigma\) parameters of the BB model.

As alternative models, we also fit the BCPE and NO models.

mod_BCPE_y14 <- fb_select(data = mydata_BCPE,
                         age_name = "age",
                         score_name = "shaped_score",
                         family = "BCPE",
                         selcrit = "BIC") 
#> FB-select iteration 0: Polynomial degrees = 2,1,0,0 : BIC = 8404.609

mod_NO_y14 <- fb_select(data = ids_data,
                      age_name = "age",
                      score_name = "y14",
                      family = "NO",
                      selcrit = "BIC")
#> FB-select iteration 0: Polynomial degrees = 2,1 : BIC = 8456.394

Inspect model fit

After fitting the BB, BCPE, and NO models, we first compare their BIC values. Lower BIC values indicate better model fit.

mod_BB_y14$selcrit
#> [1] 8462.629
mod_BCPE_y14$selcrit
#> [1] 8404.609
mod_NO_y14$selcrit
#> [1] 8456.394

The results show that the BCPE model provides the best fit, as it has the lowest BIC. The NO model follows at some distance, while the BB model performs worst according to BIC.

To further evaluate model fit, we inspect worm plots (Buuren and Fredriks 2001). Worm plots display the relationship between empirical quantiles and model-implied quantiles.

For Test 14, the worm plot confirms that the BCPE model has the best fit: the worm is relatively flat, and nearly all values lie within the confidence bands. By contrast, the BB and NO models show some deviations from the expected pattern in the intervals [-4, -2] and [-4, 0] respectively.

gamlss::wp(mod_BCPE_y14, ylim.all = 2)

gamlss::wp(mod_BB_y14, ylim.all = 2) 

gamlss::wp(mod_NO_y14, ylim.all = 2) 

Because both BIC values and worm plots point to the BCPE as the superior model, we select this model for further inspection and prediction.

To fully evaluate the fit, worm plots should also be created per age interval, since overall diagnostics can hide local misfit across the age range.
This can be done with the arguments xvar (the age variable) and n.inter (the number of age intervals). For example, for the BCPE model:

gamlss::wp(mod_BCPE_y14, xvar = age, ylim.worm = 2, n.inter = 4)
#> number of missing points from plot= 0  out of  416
#> number of missing points from plot= 0  out of  416
#> number of missing points from plot= 0  out of  415
#> number of missing points from plot= 0  out of  415

Note that a sample size of at least 200 per worm plot is recommended for stable results (Buuren and Fredriks 2001).

Once the best-fitting model has been selected (BCPE in this case), we can visually inspect its fit using centile curves. These curves show the predicted centiles of test scores as a function of age, allowing us to see how scores are distributed across the age range.

gamlss::centiles(mod_BCPE_y14, cent = c(5,25,50,75,95))

#> % of cases below  5 centile is  5.481928 
#> % of cases below  25 centile is  25.12048 
#> % of cases below  50 centile is  50.42169 
#> % of cases below  75 centile is  75.06024 
#> % of cases below  95 centile is  95.36145

These centile plots complement worm plots and BIC comparisons by providing a direct, visual summary of the distribution of scores across ages, making it easier to assess model fit and to interpret age-dependent norms.

For most distributions, the centiles function in GAMLSS works well. However, for binomial-type distributions such as the BB distribution, the standard centiles function does not always render correctly. For these cases, centiles_bin() provides an alternative (e.g., centiles_bin(mod_BB_y14, xvar = age, cent = c(5,25,50,75,95))).

Create norm tables

After selecting the best-fitting model, we can generate norm tables using the function normtable_create(). By default, the function produces z-scores (i.e., \(\mu_{age} = 0, \sigma_{age} = 1\)), but other norm types can also be requested:

The function returns a list containing:

Optionally, an Excel file can be generated by specifying excel_name. The Excel file contains:

  1. Norm matrix tab: first column shows ages, first row shows raw scores, with the normed score at each age/score combination.
  2. Norm sample tab: the ages and associated normed scores of the observed sample.

For example, in the norm matrix:

In this vignette, we save results to a temporary Excel file (tempfile()) to comply with CRAN policies.
In real applications, users can specify a permanent path, e.g.
excel_name = "norms_y14_1.xlsx".

temp_file <- tempfile(fileext = ".xlsx")
norm_mod_BCPE_y14  <- normtable_create(model = mod_BCPE_y14,
                                      data = mydata_BCPE,
                                      age_name = "age",
                                      score_name = "shaped_score", 
                                      normtype = "IQ", 
                                      excel_name = temp_file,
                                      excel = TRUE)
#> Norm table written to: C:\Users\P157735\AppData\Local\Temp\RtmpmUIrgj\file243012aa5593.xlsx
head(norm_mod_BCPE_y14[["norm_matrix"]][, c(1, 11)],
     n = 15)
#>         age        11
#> 1  4.000000 129.11870
#> 2  4.181818 125.16161
#> 3  4.363636 121.54584
#> 4  4.545455 118.22558
#> 5  4.727273 115.16269
#> 6  4.909091 112.32525
#> 7  5.090909 109.68653
#> 8  5.272727 107.22416
#> 9  5.454545 104.91960
#> 10 5.636364 102.75783
#> 11 5.818182 100.72752
#> 12 6.000000  98.82354
#> 13 6.181818  97.04659
#> 14 6.363636  95.38812
#> 15 6.545455  93.83693

We can visualize the generated norm tables using the function plot_normtable(). This function plots the percentiles as a function of age for each raw score. Each line represents a specific raw score:

This visualization allows us to inspect how raw scores translate to normed scores across ages and to see whether the norms behave as expected.

plot_normtable(normtable = norm_mod_BCPE_y14)

Confidence intervals for normed scores

By default, normtable_create() returns point estimates of the norms. Optionally, you can also calculate confidence intervals (CIs) for the normed scores. The default confidence level is 0.95 (95% interval).

The CIs are estimated using an extension of the group model from classical test theory, adapted for age-dependent scores (Heister et al. 2024). To compute the upper and lower boundaries, you need a dataframe containing two vectors:

The resulting CIs are stored in the output tables with the suffix _lower and _upper (e.g., norm_sample_lower and norm_sample_upper).

For distributions without a closed-form mean and standard deviation, such as the BCPE distribution, computing the CIs can be slower because these quantities need to be estimated via resampling.

In the example below, we use the fictitious age-dependent reliabilities provided in ids_rel_data. Later, in the section Estimating test reliability dependent on age, we show how such reliabilities can be derived from individual item scores using a sliding window approach. Note that for Test 14, we cannot estimate these reliabilities from raw scores, as the individual items are not available.

get(data("ids_rel_data"))
temp_file <- tempfile(fileext = ".xlsx")
norm_mod_BCPE_y14  <- normtable_create(model = mod_BCPE_y14,
                                      data = mydata_BCPE,
                                      age_name = "age",
                                      score_name = "shaped_score",
                                      datarel = ids_rel_data,
                                      normtype = "IQ",
                                      excel_name = temp_file,
                                      excel = TRUE)
#> Norm table written to: C:\Users\P157735\AppData\Local\Temp\RtmpmUIrgj\file2430d633a2e.xlsx

Advanced features

Norming of composite scores

Composite scores are calculated by combining multiple subtest scores, typically as the sum of their standardized scores (z-scores). To illustrate this process, we first fit a BCPE model to Test 7. Based on the wormplot inspection (not shown), the model fits the data well. Note that fitting BCPE models can be computationally intensive.

mydata_BCPE_y7 <- shape_data(data = ids_data,
                     age_name = "age",
                     score_name = "y7",
                     family = "BCPE")
#> Response values are valid and were not transformed.
mod_BCPE_y7 <- fb_select(data = mydata_BCPE_y7,
                        age_name = "age",
                        score_name = "shaped_score",
                        family = "BCPE",
                        selcrit = "BIC")
#> FB-select iteration 0: Polynomial degrees = 2,1,0,0 : BIC = 9116.086
#> FB-select iteration 1: Polynomial degrees = 2,1,0,1 : BIC = 9096.959
#> FB-select iteration 2: Polynomial degrees = 3,1,0,1 : BIC = 9084.672
#> FB-select iteration 3: Polynomial degrees = 4,1,0,1 : BIC = 9076.861
#> FB-select iteration 4: Polynomial degrees = 4,1,1,1 : BIC = 9067.153
#> FB-select iteration 5: Polynomial degrees = 4,1,2,1 : BIC = 9063.883
norm_mod_BCPE_y7 <- normtable_create(model = mod_BCPE_y7,
                                    data = mydata_BCPE_y7,
                                    age_name = "age",
                                    score_name = "shaped_score")

Next, we calculate composite scores based on the norm tables of Test 7 and Test 14. The workflow is as follows:

  1. Create a list of the relevant subtest norm tables.
  2. Use the function composite_shape() to compute the composite scores as the sum of z-scores across the tests.
  3. Fit a normal distribution to the composite scores, since sums of normally distributed variables are normally distributed.
  4. Generate a norm table for the composite scores using normtable_create().
composite <- list(norm_mod_BCPE_y7,
                  norm_mod_BCPE_y14)

composite_data <- composite_shape(normtables = composite)

modNO_comp <- fb_select(
  data = composite_data,
  age_name = "age",
  score_name = "z_sum",
  family = "NO",
  selcrit = "BIC"
)
#> FB-select iteration 0: Polynomial degrees = 2,1 : BIC = 6630.911
#> FB-select iteration 1: Polynomial degrees = 2,0 : BIC = 6623.52
#> FB-select iteration 2: Polynomial degrees = 1,0 : BIC = 6618.417
#> FB-select iteration 3: Polynomial degrees = 0,0 : BIC = 6611.19

norm_modNO_comp <- normtable_create(modNO_comp,
                                   composite_data,
                                   age_name = "age",
                                   score_name = "z_sum",
                                   cont_cor = TRUE)

Norms for a new sample of individuals

It is possible to calculate normed scores directly for a new sample, using their raw scores and ages.

In this example, we select the raw scores and ages of the first five individuals in our dataset. We then use the previously estimated model to obtain their normed scores and display the results.

newdata <- ids_data[1:5,c("age","y14")]
norm_mod_BCPE_newdata <- normtable_create(model = mod_BCPE_y14,
                                         data = newdata,
                                         age_name = "age",
                                         score_name = "y14", 
                                         new_data = TRUE,
                                         datarel = ids_rel_data)
norm_mod_BCPE_newdata[["norm_sample"]]
#>           age          Z
#> [1,] 10.97146 -0.2890158
#> [2,] 11.67420  0.1106675
#> [3,] 11.44422  1.6213568
#> [4,] 10.04932 -0.8806376
#> [5,] 14.18207 -0.6811148

Norming with a truncated model

Free order model selection with the fb_select is also available for distributions that are defined by the user, for example truncated distributions generated with the gamlss.tr package. Using truncated distributions can improve model fit when continuous distributions are applied to scores that have natural bounds. By explicitly accounting for minimum and maximum possible values, truncation prevents unrealistic predictions outside the valid score range and often leads to more accurate norm estimates. We illustrate the use of a truncated NO distribution for the ids_kn_data data, which consists of 34 binary item scores. This example uses a NO distribution for demonstration purposes only. In practice, users should explore several candidate distributions to select the one that best fits their data.

get(data("ids_kn_data"))

To fit a truncated NO model, we first create a new distribution using the gen.trun() function. We specify truncation at both ends of the distribution (type="both") with a minimum score of 0 and a maximum score of 34 (par=c(0,34)). We apply this to the NO family (family="NO") and name the new distribution tr34, resulting in a distribution called NOtr34.

Next, we shape the data appropriately for the truncated distribution and fit the model.


gamlss.tr::gen.trun(par=c(0,34), family="NO", name="tr34", type="both")
#> A truncated family of distributions from NO has been generated 
#>  and saved under the names:  
#>  dNOtr34 pNOtr34 qNOtr34 rNOtr34 NOtr34 
#> The type of truncation is both 
#>  and the truncation parameter is 0 34

mydata_NOtr34 <- shape_data(data = ids_kn_data,
                     age_name = "age_years",
                     score_name = "rawscore",
                     family = "NOtr34")
#> Response values are valid and were not transformed.
mod_NOtr34_KN <- fb_select(data = mydata_NOtr34,
                        age_name = "age_years",
                        score_name = "shaped_score",
                        family = "NOtr34")
#> FB-select iteration 0: Polynomial degrees = 2,1 : BIC = 8919.175
#> FB-select iteration 1: Polynomial degrees = 3,1 : BIC = 8896.664

Estimating test reliability dependent on age

To calculate confidence intervals for the normed scores, we first estimate the reliability of the test as a function of age using a sliding window approach. This approach is implemented in the function reliability_window(), which computes Cronbach’s alpha within age-specific groups (Heister et al. 2024). We again use the ids_kn_data dataset.

To estimate the reliability of the test, information at the item-level is needed. First, we identify the columns in the dataset corresponding to the items.

item_variables_kn <- which(substr(colnames(ids_kn_data),1,2) == "KN")

To choose appropriate parameters for estimating age-dependent reliabilities, we first explore a range of window widths and step sizes. This helps ensure that the estimated reliabilities are stable and vary smoothly with age. The function different_rel() allows us to compare the reliability estimates across different settings:

rel_int <- different_rel(data = ids_kn_data, 
                         item_variables = item_variables_kn, 
                         step_window = c(0.5, 1 , 2, 5, 10, 20), 
                         age_name = "age_years",
                         min_agegroup = 5,
                         max_agegroup = 20,
                         step_agegroup = c(0.5,1,1.5,2))

We then visualize the reliability estimates across age for the different window widths and step sizes to guide our selection. Ideally, the step size should be small enough to capture smooth changes in reliability, and the window width should be wide enough to assume approximately equal true scores within the window.

plot_drel(rel_int, ncol = 2)

For this example, we select a step size of 2 and a window width of 2.

rel_est <- reliability_window(data = ids_kn_data,
                   age_name = "age_years",
                   item_variables = item_variables_kn,
                   window_width = 2)

When we estimate the normed scores using the truncated BCPE model, we can now construct reliability based CIs for the normed scores.

norm_mod_BCPEtr34_KN  <- normtable_create(model = mod_NOtr34_KN,
                                     data = mydata_NOtr34,
                                     age_name = "age_years",
                                     score_name = "shaped_score",
                                     datarel = rel_est,
                                     normtype = "IQ")

References

Buuren, Stef van, and Miranda Fredriks. 2001. “Worm Plot: A Simple Diagnostic Device for Modelling Growth Reference Curves.” Statistics in Medicine 20 (8): 1259–77.
Grob, Alexander, and Priska Hagmann-von Arx. 2018. IDS 2: Intelligence and Development Scales-2. Hogrefe.
Grob, Alexander, Priska Hagmann-von Arx, Selma Ruiter, Marieke Timmerman, and Linda Visser. 2018. IDS-2: Intelligentie-En Ontwikkelingsschalen Voor Kinderen En Jongeren. Hogrefe Publishing.
Heister, Hannah M, Casper J Albers, Marie Wiberg, and Marieke E Timmerman. 2024. “Item Response Theory-Based Continuous Test Norming.” Psychological Methods. https://doi.org/10.1037/met0000686.
Timmerman, Marieke E, Lieke Voncken, and Casper J Albers. 2021. “A Tutorial on Regression-Based Norming of Psychological Tests with GAMLSS.” Psychological Methods 26 (3): 357. https://doi.org/10.1037/met0000348.
Voncken, Lieke, Casper J Albers, and Marieke E Timmerman. 2019. “Model Selection in Continuous Test Norming with GAMLSS.” Assessment 26 (7): 1329–46. https://doi.org/10.1177/1073191117715113.