The goal of vital is to allow analysis of demographic data using tidy tools. It works with other tidyverse packages such as dplyr and ggplot2. It also works with the tidyverts packages, tsibble and fable.
The basic data object is a vital, which is a time-indexed tibble that contains vital statistics such as births, deaths, population counts, and mortality and fertility rates.
We will use Norwegian data in the following examples. First, let’s remove the “Total” Sex category and collapse the upper ages into a final age group of 100+.
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)
nor
#> # A vital: 25,048 x 7 [1Y]
#> # Key:     Age x Sex [101 x 2]
#>     Year   Age OpenInterval Sex    Population Deaths Mortality
#>    <int> <int> <lgl>        <chr>       <dbl>  <dbl>     <dbl>
#>  1  1900     0 FALSE        Female      30070 2376.    0.0778
#>  2  1900     1 FALSE        Female      28960  842     0.0290
#>  3  1900     2 FALSE        Female      28043  348     0.0123
#>  4  1900     3 FALSE        Female      27019  216.    0.00786
#>  5  1900     4 FALSE        Female      26854  168.    0.00624
#>  6  1900     5 FALSE        Female      25569  140.    0.00538
#>  7  1900     6 FALSE        Female      25534  108.    0.00422
#>  8  1900     7 FALSE        Female      24314   93.5   0.00376
#>  9  1900     8 FALSE        Female      24979   93.5   0.00380
#> 10  1900     9 FALSE        Female      24428   90     0.00365
#> # ℹ 25,038 more rows
This example contains data from 1900 to 2023. There are 101 age groups and 2 Sex categories. A vital must have a time “index” variable, and optionally other categorical variables known as “key” variables. Each row must have a unique combination of the index and key variables. Some columns are “vital” variables, such as “Age” and “Sex”.
We can use functions to see which variables are index, key or vital:
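A minimal sketch of such calls, assuming the accessors are index_var() and key_vars() (both tsibble functions) and vital_vars() (assumed here to be exported by vital):

```r
library(vital)
library(tsibble)
library(dplyr)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

index_var(nor)   # name of the time index column
key_vars(nor)    # names of the key columns
vital_vars(nor)  # columns tagged as vital variables (age, sex, ...)
```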
There are autoplot() functions for plotting vital objects. These produce rainbow plots (Hyndman and Shang 2010) where each line represents data for one year, and the variable is plotted against age.
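As an illustration, a rainbow plot of the mortality rates could be drawn as follows. The log scale on the y-axis is a choice made for this sketch, not a requirement of autoplot().

```r
library(vital)
library(dplyr)
library(ggplot2)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

# One line per year, mortality against age, on a log scale
p <- nor |>
  autoplot(Mortality) +
  scale_y_log10()
p
```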
We can use standard ggplot functions to modify the plot as desired. For example, here are population pyramids for all years.
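One way to build such pyramids, sketched under the assumption that negating the female counts and flipping the axes is an acceptable way to mirror the two sexes:

```r
library(vital)
library(dplyr)
library(ggplot2)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

# Negate female counts so the two sexes extend in opposite directions
pyramid <- nor |>
  mutate(Population = if_else(Sex == "Female", -Population, Population)) |>
  autoplot(Population) +
  coord_flip() +
  facet_grid(. ~ Sex, scales = "free_x")
pyramid
```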
Life tables (Chiang 1984) can be produced using the life_table() function. It will produce life tables for each unique combination of the index and key variables other than age.
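For example, restricting the data to a single year (2000 is an arbitrary choice here) should give one life table per sex:

```r
library(vital)
library(dplyr)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

# Life tables for males and females in Norway in 2000
lt <- nor |>
  filter(Year == 2000) |>
  life_table()
lt
```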
Life expectancy (\(e_x\) with \(x=0\) by default) is computed using life_expectancy():
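A short sketch, assuming the result stores the life expectancy in a column named ex (as in the life table output):

```r
library(vital)
library(dplyr)
library(ggplot2)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

# Life expectancy at birth for each year and sex
e0 <- nor |>
  life_expectancy()

# Plot the two series over time
e0 |>
  ggplot(aes(x = Year, y = ex, colour = Sex)) +
  geom_line()
```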
Several smoothing functions are provided: smooth_spline(), smooth_mortality(), smooth_fertility(), and smooth_loess(), each smoothing across the age variable for each year. The methods used in smooth_mortality() and smooth_fertility() are described in Hyndman and Ullah (2007).
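To illustrate, the sketch below smooths a single year (1967, chosen arbitrarily) and overlays the smoothed curve on the observed rates. The smoothed values are taken from the .smooth column that smooth_mortality() adds.

```r
library(vital)
library(dplyr)
library(ggplot2)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

# Smooth the mortality rates across age for one year
sm <- nor |>
  filter(Year == 1967) |>
  smooth_mortality(Mortality)

# Observed rates with the smoothed curve drawn on top
sm |>
  autoplot(Mortality) +
  geom_line(aes(y = .smooth), colour = "#0072B2") +
  ylab("Mortality rate") +
  scale_y_log10()
```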
Lee-Carter models (Lee and Carter 1992) are estimated using the LC() function, which must be called within a model() function:
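A minimal sketch of such a call, assuming model() is available once vital is attached. The model name lee_carter is just a label chosen for this example.

```r
library(vital)
library(dplyr)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

# Lee-Carter model fitted to log mortality rates
lc <- nor |>
  model(lee_carter = LC(log(Mortality)))
lc
```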
Models are fitted for all combinations of key variables excluding age. To see the details for a specific model, use the report() function.
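For instance, the female model could be inspected as below. The Lee-Carter fit is repeated so the snippet stands alone, and the model name lee_carter is a label chosen for this sketch.

```r
library(vital)
library(dplyr)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

# Fit one Lee-Carter model per sex, then keep the female model only
lc_female <- nor |>
  model(lee_carter = LC(log(Mortality))) |>
  filter(Sex == "Female")

# Print the fitted model details
report(lc_female)
```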
The results can be plotted.
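A sketch of the plotting step, assuming autoplot() has a method for the fitted model table (the Lee-Carter fit is repeated here so the snippet stands alone):

```r
library(vital)
library(dplyr)
library(ggplot2)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

lc <- nor |>
  model(lee_carter = LC(log(Mortality)))

# Plot the estimated model components
p_lc <- autoplot(lc)
p_lc
```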
The components can be extracted.
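A sketch of the extraction, assuming the accessors are age_components() and time_components() (the Lee-Carter fit is repeated here so the snippet stands alone):

```r
library(vital)
library(dplyr)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

lc <- nor |>
  model(lee_carter = LC(log(Mortality)))

# Components that vary by age
lc_age <- age_components(lc)
# Components that vary over time
lc_time <- time_components(lc)
lc_age
lc_time
```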
Forecasts are obtained using the forecast() function.
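For example, a 20-year horizon (an arbitrary choice for this sketch) could be requested as follows, with the Lee-Carter fit repeated so the snippet stands alone:

```r
library(vital)
library(dplyr)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

# Fit the Lee-Carter model and forecast 20 years ahead
fc_lc <- nor |>
  model(lee_carter = LC(log(Mortality))) |>
  forecast(h = 20)
fc_lc
```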
The forecasts are returned as a distribution column (here transformed normal because of the log transformation used in the model). The .mean column gives the point forecasts equal to the mean of the distribution column.
Functional data models (Hyndman and Ullah 2007) can be estimated in a similar way to Lee-Carter models, but with an additional smoothing step first, and with LC() replaced by FDM().
# FDM model
fdm <- nor |>
  smooth_mortality(Mortality) |>
  model(hu = FDM(log(.smooth)))
fc_fdm <- fdm |>
  forecast(h = 20)
autoplot(fc_fdm) +
  scale_y_log10()
Functional data models have multiple principal components, rather than the single factor used in Lee-Carter models.
By default, six factors are estimated using FDM(). Here we have chosen to plot only the first three.
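A sketch of how the plot might be limited to three components. The show_order argument is an assumption about the autoplot() method for fitted models and should be checked against the package documentation.

```r
library(vital)
library(dplyr)
library(ggplot2)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

fdm <- nor |>
  smooth_mortality(Mortality) |>
  model(hu = FDM(log(.smooth)))

# show_order (assumed argument): number of components to display
p_fdm <- autoplot(fdm, show_order = 3)
p_fdm
```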
The components can be extracted.
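A sketch of the extraction for the functional data model, assuming the accessors are age_components() and time_components() (the fit is repeated here so the snippet stands alone):

```r
library(vital)
library(dplyr)

# `nor` rebuilt as in the earlier example so this snippet runs on its own
nor <- norway_mortality |>
  filter(Sex != "Total") |>
  collapse_ages(max_age = 100)

fdm <- nor |>
  smooth_mortality(Mortality) |>
  model(hu = FDM(log(.smooth)))

# Components that vary by age
fdm_age <- age_components(fdm)
# Components that vary over time
fdm_time <- time_components(fdm)
fdm_age
fdm_time
```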
A coherent functional data model (Hyndman, Booth, and Yasmeen 2013) is obtained by first computing the sex-products and sex-ratios of the smoothed mortality data. Then a functional data model is fitted to the resulting products and ratios, forecasts are obtained, and the product/ratio transformation is reversed. The following code shows the steps.
fdm_coherent <- nor |>
  smooth_mortality(Mortality) |>
  make_pr(.smooth) |>
  model(hby = FDM(log(.smooth), coherent = TRUE))
fc_coherent <- fdm_coherent |>
  forecast(h = 20) |>
  undo_pr(.smooth)
fc_coherent
Here, make_pr() makes the product-ratios, while undo_pr() undoes them. The argument coherent = TRUE in FDM() ensures that the ARIMA models fitted to the coefficients are stationary when applied to the sex-ratios.