The normref
package provides tools to calculate
continuous norms for psychological test scores, where the norms depend
on a single continuous variable, denoted as the norm predictor.
Typically, the norm predictor is age. The package allows for a single
norm predictor. Applying the norms results in standardized scores such
as percentiles, T-scores, or IQ scores. Such standardized scores allow
for norm-referenced interpretation, that is, comparing an individual’s
performance to that of a relevant reference population.
For example, an IQ score expresses an individual’s performance
relative to the general population of the same age. To estimate such
norms, normref
requires raw scores from a representative
normative sample.
This vignette illustrates how to derive norms from such a sample,
following the approach described in a tutorial (Timmerman, Voncken, and Albers 2021).
The main workflow in normref
consists of:
shape_data
)fb_select
)normtable_create
,
plot_normtable
)Advanced functionality includes norming of composite scores, norms for new data, user-defined distributions, and adding CIs based on age-dependent reliability estimates.
We illustrate the process using IDS data (Grob
and Hagmann-von Arx 2018; Grob et al. 2018). An anonymized and
perturbed version of this data is included in
normref
.
We focus on Test 14, with \(n=1660\)
children and raw scores ranging from 2–34.
Three candidate distributions are considered:
To compare candidate models, we use the free order model selection procedure with the Bayesian Information Criterion (BIC) as selection criterion, which has been shown to be effective in model selection (Voncken, Albers, and Timmerman 2019).
First, we load the data.
Before fitting any distributions to model the raw scores of Test 14
as a function of age, we must ensure that the data are in the correct
format for gamlss implementation. This is handled by the function
shape_data
.
The shape_data
function verifies whether the response
variable is properly formatted and, if necessary, reshapes it. For
binomial-type distributions like the BB, gamlss requires the response to
be a two-column matrix: one column for the number of correct items and
one for the number of incorrect items. For each individual, these
columns sum to the maximum possible raw score on the test.
By default, shape_data
uses the highest observed raw
score in the data as the maximum. This can be overridden with the
argument max_score. The function also removes any cases with missing
values for the raw scores or age.
Applying shape_data
to our data produces a new
dataframe, mydata_BB
, which includes the additional column
"shaped_score"
containing the scores in the correct format
for BB models:
mydata_BB <- shape_data(data = ids_data,
age_name = "age",
score_name = "y14",
family = "BB",
max_score = 34)
#> Binomial family: converted response to matrix with two columns:
#> col1 = number correct, col2 = number incorrect.
We can also apply shape data to obtain scores in the correct format for the BCPE model, which requires nonzero, positive scores.
mydata_BCPE <- shape_data(data = ids_data,
age_name = "age",
score_name = "y14",
family = "BCPE")
#> Response values are valid and were not transformed.
Because Test 14 has no results with zero scores, the scores were not
transformed. A new column "shaped_score"
was created, but
it is identical to "y14"
.
To fit the candidate models, we use the model selection function
fb_select()
, which implements the free order model
selection procedure (Voncken, Albers, and
Timmerman 2019). We first apply this procedure to the BB
distribution. Note that we pass "shaped_score"
as the
score_name
, because this column contains the reshaped
response variable created by shape_data()
.
mod_BB_y14 <- fb_select(data = mydata_BB,
age_name = "age",
score_name = "shaped_score",
family = "BB",
selcrit = "BIC")
#> FB-select iteration 0: Polynomial degrees = 2,1 : BIC = 8469.301
#> FB-select iteration 1: Polynomial degrees = 2,2 : BIC = 8462.629
The free order model selection procedure selects a second degree polynomial of age for both the \(\mu\) and \(\sigma\) parameters of the BB model.
As alternative models, we also fit the BCPE and NO models.
mod_BCPE_y14 <- fb_select(data = mydata_BCPE,
age_name = "age",
score_name = "shaped_score",
family = "BCPE",
selcrit = "BIC")
#> FB-select iteration 0: Polynomial degrees = 2,1,0,0 : BIC = 8404.609
mod_NO_y14 <- fb_select(data = ids_data,
age_name = "age",
score_name = "y14",
family = "NO",
selcrit = "BIC")
#> FB-select iteration 0: Polynomial degrees = 2,1 : BIC = 8456.394
After fitting the BB, BCPE, and NO models, we first compare their BIC values. Lower BIC values indicate better model fit.
mod_BB_y14$selcrit
#> [1] 8462.629
mod_BCPE_y14$selcrit
#> [1] 8404.609
mod_NO_y14$selcrit
#> [1] 8456.394
The results show that the BCPE model provides the best fit, as it has the lowest BIC. The NO model follows at some distance, while the BB model performs worst according to BIC.
To further evaluate model fit, we inspect worm plots (Buuren and Fredriks 2001). Worm plots display the relationship between empirical quantiles and model-implied quantiles.
Ideally, the points (the “worm”) are close to the red horizontal zero line,
and most values fall inside the funnel-shaped confidence bands (dashed lines).
For Test 14, the worm plot confirms that the BCPE model has the best fit: the worm is relatively flat, and nearly all values lie within the confidence bands. By contrast, the BB and NO models show some deviations from the expected pattern in the intervals [-4, -2] and [-4, 0] respectively.
Because both BIC values and worm plots point to the BCPE as the superior model, we select this model for further inspection and prediction.
To fully evaluate the fit, worm plots should also be created per age
interval, since overall diagnostics can hide local misfit across the age
range.
This can be done with the arguments xvar
(the age variable)
and n.inter
(the number of age intervals). For example, for
the BCPE model:
gamlss::wp(mod_BCPE_y14, xvar = age, ylim.worm = 2, n.inter = 4)
#> number of missing points from plot= 0 out of 416
#> number of missing points from plot= 0 out of 416
#> number of missing points from plot= 0 out of 415
#> number of missing points from plot= 0 out of 415
Note that a sample size of at least 200 per worm plot is recommended for stable results (Buuren and Fredriks 2001).
Once the best-fitting model has been selected (BCPE in this case), we can visually inspect its fit using centile curves. These curves show the predicted centiles of test scores as a function of age, allowing us to see how scores are distributed across the age range.
#> % of cases below 5 centile is 5.481928
#> % of cases below 25 centile is 25.12048
#> % of cases below 50 centile is 50.42169
#> % of cases below 75 centile is 75.06024
#> % of cases below 95 centile is 95.36145
These centile plots complement worm plots and BIC comparisons by providing a direct, visual summary of the distribution of scores across ages, making it easier to assess model fit and to interpret age-dependent norms.
For most distributions, the centiles function in GAMLSS works well.
However, for binomial-type distributions such as the BB distribution,
the standard centiles function does not always render correctly. For
these cases, centiles_bin()
provides an alternative (e.g.,
centiles_bin(mod_BB_y14, xvar = age, cent = c(5,25,50,75,95))
).
After selecting the best-fitting model, we can generate norm tables
using the function normtable_create()
. By default, the
function produces z-scores (i.e., \(\mu_{age}
= 0, \sigma_{age} = 1\)), but other norm types can also be
requested:
The function returns a list containing:
cdf_matrix
: the percentiles for each raw score and
agenorm_matrix
: the corresponding normed scores in the
requested norm typenorm_sample
and cdf_sample
: the percentile
and normed scores for the observed sampleOptionally, an Excel file can be generated by specifying
excel_name
. The Excel file contains:
For example, in the norm matrix:
In this vignette, we save results to a temporary Excel file
(tempfile()
) to comply with CRAN policies.
In real applications, users can specify a permanent path, e.g.
excel_name = "norms_y14_1.xlsx"
.
temp_file <- tempfile(fileext = ".xlsx")
norm_mod_BCPE_y14 <- normtable_create(model = mod_BCPE_y14,
data = mydata_BCPE,
age_name = "age",
score_name = "shaped_score",
normtype = "IQ",
excel_name = temp_file,
excel = TRUE)
#> Norm table written to: C:\Users\P157735\AppData\Local\Temp\RtmpmUIrgj\file243012aa5593.xlsx
head(norm_mod_BCPE_y14[["norm_matrix"]][, c(1, 11)],
n = 15)
#> age 11
#> 1 4.000000 129.11870
#> 2 4.181818 125.16161
#> 3 4.363636 121.54584
#> 4 4.545455 118.22558
#> 5 4.727273 115.16269
#> 6 4.909091 112.32525
#> 7 5.090909 109.68653
#> 8 5.272727 107.22416
#> 9 5.454545 104.91960
#> 10 5.636364 102.75783
#> 11 5.818182 100.72752
#> 12 6.000000 98.82354
#> 13 6.181818 97.04659
#> 14 6.363636 95.38812
#> 15 6.545455 93.83693
We can visualize the generated norm tables using the function
plot_normtable()
. This function plots the percentiles as a
function of age for each raw score. Each line represents a specific raw
score:
This visualization allows us to inspect how raw scores translate to normed scores across ages and to see whether the norms behave as expected.
By default, normtable_create()
returns point estimates
of the norms. Optionally, you can also calculate confidence intervals
(CIs) for the normed scores. The default confidence level is 0.95 (95%
interval).
The CIs are estimated using an extension of the group model from classical test theory, adapted for age-dependent scores (Heister et al. 2024). To compute the upper and lower boundaries, you need a dataframe containing two vectors:
age
: the ages at which the norms are evaluatedrel
: the estimated test reliability at each ageThe resulting CIs are stored in the output tables with the suffix
_lower
and _upper
(e.g.,
norm_sample_lower
and norm_sample_upper
).
For distributions without a closed-form mean and standard deviation, such as the BCPE distribution, computing the CIs can be slower because these quantities need to be estimated via resampling.
In the example below, we use the fictitious age-dependent
reliabilities provided in ids_rel_data
. Later, in the
section Estimating test reliability dependent on age, we show
how such reliabilities can be derived from individual item scores using
a sliding window approach. Note that for Test 14, we cannot estimate
these reliabilities from raw scores, as the individual items are not
available.
get(data("ids_rel_data"))
temp_file <- tempfile(fileext = ".xlsx")
norm_mod_BCPE_y14 <- normtable_create(model = mod_BCPE_y14,
data = mydata_BCPE,
age_name = "age",
score_name = "shaped_score",
datarel = ids_rel_data,
normtype = "IQ",
excel_name = temp_file,
excel = TRUE)
#> Norm table written to: C:\Users\P157735\AppData\Local\Temp\RtmpmUIrgj\file2430d633a2e.xlsx
Composite scores are calculated by combining multiple subtest scores, typically as the sum of their standardized scores (z-scores). To illustrate this process, we first fit a BCPE model to Test 7. Based on the wormplot inspection (not shown), the model fits the data well. Note that fitting BCPE models can be computationally intensive.
mydata_BCPE_y7 <- shape_data(data = ids_data,
age_name = "age",
score_name = "y7",
family = "BCPE")
#> Response values are valid and were not transformed.
mod_BCPE_y7 <- fb_select(data = mydata_BCPE_y7,
age_name = "age",
score_name = "shaped_score",
family = "BCPE",
selcrit = "BIC")
#> FB-select iteration 0: Polynomial degrees = 2,1,0,0 : BIC = 9116.086
#> FB-select iteration 1: Polynomial degrees = 2,1,0,1 : BIC = 9096.959
#> FB-select iteration 2: Polynomial degrees = 3,1,0,1 : BIC = 9084.672
#> FB-select iteration 3: Polynomial degrees = 4,1,0,1 : BIC = 9076.861
#> FB-select iteration 4: Polynomial degrees = 4,1,1,1 : BIC = 9067.153
#> FB-select iteration 5: Polynomial degrees = 4,1,2,1 : BIC = 9063.883
norm_mod_BCPE_y7 <- normtable_create(model = mod_BCPE_y7,
data = mydata_BCPE_y7,
age_name = "age",
score_name = "shaped_score")
Next, we calculate composite scores based on the norm tables of Test 7 and Test 14. The workflow is as follows:
composite_shape()
to compute the
composite scores as the sum of z-scores across the tests.normtable_create()
.composite <- list(norm_mod_BCPE_y7,
norm_mod_BCPE_y14)
composite_data <- composite_shape(normtables = composite)
modNO_comp <- fb_select(
data = composite_data,
age_name = "age",
score_name = "z_sum",
family = "NO",
selcrit = "BIC"
)
#> FB-select iteration 0: Polynomial degrees = 2,1 : BIC = 6630.911
#> FB-select iteration 1: Polynomial degrees = 2,0 : BIC = 6623.52
#> FB-select iteration 2: Polynomial degrees = 1,0 : BIC = 6618.417
#> FB-select iteration 3: Polynomial degrees = 0,0 : BIC = 6611.19
norm_modNO_comp <- normtable_create(modNO_comp,
composite_data,
age_name = "age",
score_name = "z_sum",
cont_cor = TRUE)
It is possible to calculate normed scores directly for a new sample, using their raw scores and ages.
In this example, we select the raw scores and ages of the first five individuals in our dataset. We then use the previously estimated model to obtain their normed scores and display the results.
newdata <- ids_data[1:5,c("age","y14")]
norm_mod_BCPE_newdata <- normtable_create(model = mod_BCPE_y14,
data = newdata,
age_name = "age",
score_name = "y14",
new_data = TRUE,
datarel = ids_rel_data)
norm_mod_BCPE_newdata[["norm_sample"]]
#> age Z
#> [1,] 10.97146 -0.2890158
#> [2,] 11.67420 0.1106675
#> [3,] 11.44422 1.6213568
#> [4,] 10.04932 -0.8806376
#> [5,] 14.18207 -0.6811148
Free order model selection with the fb_select
is also
available for distributions that are defined by the user, for example
truncated distributions generated with the gamlss.tr
package. Using truncated distributions can improve model fit when
continuous distributions are applied to scores that have natural bounds.
By explicitly accounting for minimum and maximum possible values,
truncation prevents unrealistic predictions outside the valid score
range and often leads to more accurate norm estimates. We illustrate the
use of a truncated NO distribution for the ids_kn_data
data, which consists of 34 binary item scores. This example uses a NO
distribution for demonstration purposes only. In practice, users should
explore several candidate distributions to select the one that best fits
their data.
To fit a truncated NO model, we first create a new distribution using
the gen.trun()
function. We specify truncation at both ends
of the distribution (type="both"
) with a minimum score of 0
and a maximum score of 34 (par=c(0,34)
). We apply this to
the NO family (family="NO"
) and name the new distribution
tr34
, resulting in a distribution called
NOtr34
.
Next, we shape the data appropriately for the truncated distribution and fit the model.
gamlss.tr::gen.trun(par=c(0,34), family="NO", name="tr34", type="both")
#> A truncated family of distributions from NO has been generated
#> and saved under the names:
#> dNOtr34 pNOtr34 qNOtr34 rNOtr34 NOtr34
#> The type of truncation is both
#> and the truncation parameter is 0 34
mydata_NOtr34 <- shape_data(data = ids_kn_data,
age_name = "age_years",
score_name = "rawscore",
family = "NOtr34")
#> Response values are valid and were not transformed.
mod_NOtr34_KN <- fb_select(data = mydata_NOtr34,
age_name = "age_years",
score_name = "shaped_score",
family = "NOtr34")
#> FB-select iteration 0: Polynomial degrees = 2,1 : BIC = 8919.175
#> FB-select iteration 1: Polynomial degrees = 3,1 : BIC = 8896.664
To calculate confidence intervals for the normed scores, we first
estimate the reliability of the test as a function of age using a
sliding window approach. This approach is implemented in the function
reliability_window()
, which computes Cronbach’s alpha
within age-specific groups (Heister et al.
2024). We again use the ids_kn_data
dataset.
To estimate the reliability of the test, information at the item-level is needed. First, we identify the columns in the dataset corresponding to the items.
To choose appropriate parameters for estimating age-dependent
reliabilities, we first explore a range of window widths and step sizes.
This helps ensure that the estimated reliabilities are stable and vary
smoothly with age. The function different_rel()
allows us
to compare the reliability estimates across different settings:
rel_int <- different_rel(data = ids_kn_data,
item_variables = item_variables_kn,
step_window = c(0.5, 1 , 2, 5, 10, 20),
age_name = "age_years",
min_agegroup = 5,
max_agegroup = 20,
step_agegroup = c(0.5,1,1.5,2))
We then visualize the reliability estimates across age for the different window widths and step sizes to guide our selection. Ideally, the step size should be small enough to capture smooth changes in reliability, and the window width should be wide enough to assume approximately equal true scores within the window.
For this example, we select a step size of 2 and a window width of 2.
rel_est <- reliability_window(data = ids_kn_data,
age_name = "age_years",
item_variables = item_variables_kn,
window_width = 2)
When we estimate the normed scores using the truncated BCPE model, we can now construct reliability based CIs for the normed scores.