Plot Grading and Testing with ggspec

Introduction

ggspec provides a comparison tier (equiv_*()) and a check/assertion tier (check_plot(), expect_equiv_plot()) for comparing two ggplot objects. These are designed to be framework-agnostic: they work in plain R scripts, testthat test suites, and learnr/gradethis grading pipelines.

Checking visual equivalence is particularly important in the age of AI-assisted coding: different large language models generate syntactically different code for the same visualisation task (geom_bar() on raw data vs geom_col() on pre-counted data; labs(x = ...) vs scale_x_continuous(name = ...)). ggspec provides a four-level hierarchy of equivalence checks so that functionally identical plots are recognised as equivalent regardless of how they were written.

library(ggspec)
library(ggplot2)
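
To make the geom_bar()/geom_col() case mentioned above concrete, here is a minimal sketch that uses only ggplot2. The object names (p_raw, p_counted, class_counts) are illustrative and not part of ggspec. It builds the same bar chart two ways and confirms that the drawn bar heights agree.

```r
library(ggplot2)

# Version 1: let geom_bar() count the rows of the raw data
p_raw <- ggplot(mpg, aes(class)) +
  geom_bar()

# Version 2: count first, then draw the pre-computed heights with geom_col()
class_counts <- as.data.frame(table(class = mpg$class))
p_counted <- ggplot(class_counts, aes(class, Freq)) +
  geom_col() +
  labs(y = "count")

# Both plots draw bars of the same height, in the same order
heights_raw     <- layer_data(p_raw)$y
heights_counted <- layer_data(p_counted)$y
all.equal(as.numeric(heights_raw), as.numeric(heights_counted))
#> [1] TRUE
```

The two plot objects differ in their data, geom and stat, so a naive object comparison would report them as different even though the rendered bars are the same.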

Comparing two plots with equiv_plot()

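equiv_plot() is the high-level entry point. It accepts two ggplot objects (reference first, observed second) and, optionally, a character vector of check names to run; when no names are given, as in the calls below, it runs its default set of checks. It returns a ggspec_result object that holds a pass/fail flag, a human-readable message, and a structured diff.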

ref <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  facet_wrap(~drv) +
  labs(title = "Reference plot")

obs_correct <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  facet_wrap(~drv) +
  labs(title = "Reference plot")

obs_wrong <- ggplot(mpg, aes(displ, hwy)) +
  geom_smooth() +            # wrong geom
  facet_wrap(~cyl) +         # wrong facet variable
  labs(title = "Student plot")

# Passing case
result_ok <- equiv_plot(ref, obs_correct)
result_ok
#> [PASS mode=strict] 6/6 checks passed
#>   Detail:
#> # A tibble: 10 × 12
#>    check  source layer geom  stat   position aesthetic variable status label_ref
#>    <chr>  <chr>  <int> <chr> <chr>  <chr>    <chr>     <chr>    <chr>  <chr>    
#>  1 layers ref        0 <NA>  <NA>   <NA>     <NA>      <NA>     <NA>   <NA>     
#>  2 layers ref        1 point ident… identity <NA>      <NA>     <NA>   <NA>     
#>  3 layers obs        0 <NA>  <NA>   <NA>     <NA>      <NA>     <NA>   <NA>     
#>  4 layers obs        1 point ident… identity <NA>      <NA>     <NA>   <NA>     
#>  5 aes    global     0 <NA>  <NA>   <NA>     x         displ    match  <NA>     
#>  6 aes    global     0 <NA>  <NA>   <NA>     y         hwy      match  <NA>     
#>  7 aes    global     1 point <NA>   <NA>     x         displ    match  <NA>     
#>  8 aes    global     1 point <NA>   <NA>     y         hwy      match  <NA>     
#>  9 aes    local      1 point <NA>   <NA>     colour    class    match  <NA>     
#> 10 labels <NA>      NA <NA>  <NA>   <NA>     title     <NA>     <NA>   Referenc…
#> # ℹ 2 more variables: label_obs <chr>, match <lgl>
as.logical(result_ok)
#> [1] TRUE

# Failing case
result_fail <- equiv_plot(ref, obs_wrong)
result_fail
#> [FAIL mode=strict] 2/6 checks passed: Missing geom(s): point.; Aesthetic mapping issue(s): colour->class (layer 1).; Facet mismatch: cols: 'drv' vs 'cyl'; wrong label(s): 'title' (expected 'Reference plot', got 'Student plot')
#>   Detail:
#> # A tibble: 10 × 12
#>    check  source layer geom   stat  position aesthetic variable status label_ref
#>    <chr>  <chr>  <int> <chr>  <chr> <chr>    <chr>     <chr>    <chr>  <chr>    
#>  1 layers ref        0 <NA>   <NA>  <NA>     <NA>      <NA>     <NA>   <NA>     
#>  2 layers ref        1 point  iden… identity <NA>      <NA>     <NA>   <NA>     
#>  3 layers obs        0 <NA>   <NA>  <NA>     <NA>      <NA>     <NA>   <NA>     
#>  4 layers obs        1 smooth smoo… identity <NA>      <NA>     <NA>   <NA>     
#>  5 aes    local      1 point  <NA>  <NA>     colour    class    missi… <NA>     
#>  6 aes    global     0 <NA>   <NA>  <NA>     x         displ    match  <NA>     
#>  7 aes    global     0 <NA>   <NA>  <NA>     y         hwy      match  <NA>     
#>  8 aes    global     1 point  <NA>  <NA>     x         displ    match  <NA>     
#>  9 aes    global     1 point  <NA>  <NA>     y         hwy      match  <NA>     
#> 10 labels <NA>      NA <NA>   <NA>  <NA>     title     <NA>     <NA>   Referenc…
#> # ℹ 2 more variables: label_obs <chr>, match <lgl>

Running individual checks

Each equiv_*() function tests one dimension:

equiv_layers(ref, obs_wrong)
#> [FAIL] Missing geom(s): point.
#>   Hint: Add + geom_point() to the observed plot.
#>   Detail:
#> # A tibble: 4 × 5
#>   source layer geom   stat     position
#>   <chr>  <int> <chr>  <chr>    <chr>   
#> 1 ref        0 <NA>   <NA>     <NA>    
#> 2 ref        1 point  identity identity
#> 3 obs        0 <NA>   <NA>     <NA>    
#> 4 obs        1 smooth smooth   identity
equiv_facets(ref, obs_wrong)
#> [FAIL] Facet mismatch: cols: 'drv' vs 'cyl'
equiv_labels(ref, obs_wrong, aesthetics = "title")
#> [FAIL] wrong label(s): 'title' (expected 'Reference plot', got 'Student plot')
#>   Hint: Add labs(title = 'Reference plot') to the observed plot.
#>   Detail:
#> # A tibble: 1 × 4
#>   aesthetic label_ref      label_obs    match
#>   <chr>     <chr>          <chr>        <lgl>
#> 1 title     Reference plot Student plot FALSE

The exact argument

By default, equiv_layers() and equiv_aes() use subset matching: the observed plot must contain at least the layers/mappings of the reference. Set exact = TRUE to require an exact match.

obs_extra <- ref + geom_smooth()  # extra layer is fine by default
equiv_layers(ref, obs_extra)
#> [PASS] All expected geoms present.
#>   Detail:
#> # A tibble: 5 × 5
#>   source layer geom   stat     position
#>   <chr>  <int> <chr>  <chr>    <chr>   
#> 1 ref        0 <NA>   <NA>     <NA>    
#> 2 ref        1 point  identity identity
#> 3 obs        0 <NA>   <NA>     <NA>    
#> 4 obs        1 point  identity identity
#> 5 obs        2 smooth smooth   identity

equiv_layers(ref, obs_extra, exact = TRUE)  # fails: extra layer
#> [FAIL] Expected 1 layer(s) [point]; got 2 [point, smooth].
#>   Detail:
#> # A tibble: 5 × 5
#>   source layer geom   stat     position
#>   <chr>  <int> <chr>  <chr>    <chr>   
#> 1 ref        0 <NA>   <NA>     <NA>    
#> 2 ref        1 point  identity identity
#> 3 obs        0 <NA>   <NA>     <NA>    
#> 4 obs        1 point  identity identity
#> 5 obs        2 smooth smooth   identity

Framework-agnostic checking with check_plot()

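check_plot() wraps equiv_plot() and calls fail_fn if any check fails. The default, fail_fn = stop, raises an ordinary R error, so it works in any context. Note the argument order: check_plot() takes the observed plot first and the expected (reference) plot second, the reverse of the equiv_*() functions.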

# Passes silently
check_plot(obs_correct, ref, check = c("layers", "aes", "facets"))

# Fails with an informative error
check_plot(obs_wrong, ref, check = c("layers", "facets"))
#> Error in check_plot(obs_wrong, ref, check = c("layers", "facets")): 0/2 checks passed: Missing geom(s): point.; Facet mismatch: cols: 'drv' vs 'cyl'

Swapping in a learnr/gradethis fail function

In a learnr tutorial, set fail_fn and pass_fn to the grading framework’s own signalling functions (e.g. gradethis::fail and gradethis::pass):

# Inside a learnr grade_this() block:
check_plot(
  .result,
  expected = ref,
  check    = c("layers", "aes", "facets"),
  fail_fn  = your_grading_framework_fail_fn,
  pass_fn  = your_grading_framework_pass_fn
)

No hard dependency on any grading framework is required — fail_fn and pass_fn can be any functions with compatible signatures.
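
The callback pattern described above can be sketched in plain R. Note that run_check() below is a hypothetical stand-in and not ggspec's implementation, and the assumption that each callback receives a single message string is ours; consult the check_plot() documentation for the actual signatures.

```r
# Hypothetical stand-in for a checker that accepts fail_fn / pass_fn callbacks.
# It only mirrors the calling convention described above: one callback runs
# on failure, the other (if supplied) on success.
run_check <- function(passed, message, fail_fn = stop, pass_fn = NULL) {
  if (!passed) return(fail_fn(message))
  if (!is.null(pass_fn)) return(pass_fn(message))
  invisible(TRUE)
}

# Custom callbacks that return a plain list instead of signalling a condition
as_feedback_fail <- function(message) list(correct = FALSE, message = message)
as_feedback_pass <- function(message) list(correct = TRUE,  message = message)

fb <- run_check(FALSE, "Missing geom(s): point.",
                fail_fn = as_feedback_fail,
                pass_fn = as_feedback_pass)
fb$correct
#> [1] FALSE
fb$message
#> [1] "Missing geom(s): point."
```

Because the checker only ever calls the functions it is handed, the same check can raise an error in a script, return a feedback object in a tutorial, or log to a file, without any change to the check itself.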

Using expect_equiv_plot() in testthat

testthat::test_that("student plot has correct layers and facets", {
  expect_equiv_plot(
    obs_correct,
    ref,
    check = c("layers", "aes", "facets")
  )
})

Inspecting the diff

Every equiv_*() result carries a $detail data frame for programmatic inspection:

result <- equiv_aes(ref, obs_wrong)
result$detail
#> # A tibble: 5 × 6
#>   layer geom  aesthetic variable source status 
#>   <int> <chr> <chr>     <chr>    <chr>  <chr>  
#> 1     1 point colour    class    local  missing
#> 2     0 <NA>  x         displ    global match  
#> 3     0 <NA>  y         hwy      global match  
#> 4     1 point x         displ    global match  
#> 5     1 point y         hwy      global match
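
One way to use the detail table programmatically is to keep only the rows that did not match and turn them into feedback strings. The sketch below rebuilds the rows printed above by hand so that it runs without ggspec; in practice you would start from result$detail instead. The names detail, problems and feedback are illustrative.

```r
# Hand-copied version of the $detail rows printed above
# (assumption: in real use, replace this with `detail <- result$detail`)
detail <- data.frame(
  layer     = c(1L, 0L, 0L, 1L, 1L),
  geom      = c("point", NA, NA, "point", "point"),
  aesthetic = c("colour", "x", "y", "x", "y"),
  variable  = c("class", "displ", "hwy", "displ", "hwy"),
  source    = c("local", "global", "global", "global", "global"),
  status    = c("missing", "match", "match", "match", "match"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose status is something other than "match"
problems <- detail[detail$status != "match", , drop = FALSE]

# Build one short feedback string per problem row
feedback <- sprintf("%s -> %s is %s (layer %d)",
                    problems$aesthetic, problems$variable,
                    problems$status, problems$layer)
feedback
#> [1] "colour -> class is missing (layer 1)"
```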

Comparing layer parameters

equiv_params() checks whether a specific layer’s non-aesthetic parameters match, e.g. checking that a student used se = FALSE on geom_smooth().

p_ref   <- ggplot(mpg, aes(displ, hwy)) + geom_smooth(method = "lm", se = FALSE)
p_wrong <- ggplot(mpg, aes(displ, hwy)) + geom_smooth(method = "lm", se = TRUE)

equiv_params(p_ref, p_wrong, layer = 1L, params = "se")
#> [FAIL] Layer 1 parameter mismatch: se.
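
For orientation, the se setting compared above can also be read directly from the ggplot object. The sketch below uses only ggplot2 and relies on the layer's stat_params field, which is a ggplot2 implementation detail; it is not necessarily how equiv_params() extracts parameters.

```r
library(ggplot2)

p_ref   <- ggplot(mpg, aes(displ, hwy)) + geom_smooth(method = "lm", se = FALSE)
p_wrong <- ggplot(mpg, aes(displ, hwy)) + geom_smooth(method = "lm", se = TRUE)

# Arguments passed to a layer are stored on the layer object itself.
# `se` is consumed by the smoothing stat, so it is kept in stat_params.
se_ref   <- p_ref$layers[[1]]$stat_params$se
se_wrong <- p_wrong$layers[[1]]$stat_params$se

# The two plots differ in this parameter
identical(se_ref, se_wrong)
```

Reading internals this way is fragile across ggplot2 versions, which is one reason to prefer a dedicated check such as equiv_params().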

Canonicalisation-aware comparison with compare_plots()

equiv_plot() performs direct structural comparison. When two plots are semantically equivalent but written differently — different geoms for the same stat, reversed aesthetic axes, scale names vs labs() — use compare_plots(), which normalises both plots before comparing.

Modes

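# Assumed example plots (their definitions are not shown in the source)
p_ref  <- ggplot(mpg, aes(class)) + geom_bar()
p_col  <- ggplot(as.data.frame(table(class = mpg$class)), aes(class, Freq)) + geom_col()
p_flip <- ggplot(mpg, aes(y = class)) + geom_bar() + coord_flip()

# "structural" — normalises geom_col → geom_bar, sorts layer order
compare_plots(p_ref, p_col, mode = "structural", check = "layers")

# "visual" — additionally absorbs coord_flip() and scale name → labs()
compare_plots(p_ref, p_flip, mode = "visual", check = c("layers", "aes", "coord"))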

The result is a ggspec_compare object extending ggspec_result, with extra fields $canon_p1, $canon_p2 (the canonicalised specs) and $mode.

Using a mode in check_plot()

Pass mode to check_plot() to apply canonicalisation in grading pipelines:

# Passes for a student who used geom_col() instead of geom_bar()
check_plot(student_plot, ref,
           check = "layers",
           mode  = "structural")

# In learnr (swap fail_fn/pass_fn for your grading framework):
check_plot(.result, ref,
           check   = c("layers", "aes", "coord"),
           mode    = "visual",
           fail_fn = your_grading_fail_fn,
           pass_fn = your_grading_pass_fn)

What each mode covers

Mode            Normalisation rules applied
"strict"        None beyond what spec_plot() already does
"structural"    geom_col -> geom_bar; layer order sorted
"visual"        Structural + coord_flip absorbed; scale name -> labs()
"pedagogical"   Visual + histogram bins/binwidth flagged; after_stat() logged

The $changes tibble on a ggspec_canon object records every normalisation applied, making the comparison transparent:

c1 <- canon(p_flip, mode = "visual")
c1$changes   # shows the coord_flip rule and its x/y swap
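
The coord_flip rule rests on the fact that a flipped plot and a plot with swapped x/y mappings can draw the same thing. The following sketch shows this with ggplot2 alone (version 3.3.0 or later, which allows mapping a discrete variable to y in geom_bar()); the object names are illustrative and no ggspec function is involved.

```r
library(ggplot2)

# Horizontal bars, written with coord_flip()
p_with_flip <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  coord_flip()

# The same horizontal bars, written by mapping class to y directly
p_swapped <- ggplot(mpg, aes(y = class)) +
  geom_bar()

# The computed counts per class are identical in both versions
counts_flip    <- layer_data(p_with_flip)$count
counts_swapped <- layer_data(p_swapped)$count
all.equal(counts_flip, counts_swapped)
#> [1] TRUE
```

A strict comparison of these two plots would see different aesthetic mappings and a different coordinate system, which is the kind of difference the "visual" mode is described as absorbing.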

For a full catalogue of which equivalence patterns require which mode, see vignette("equivalence-patterns").


Summary of available checks

Function          What it checks
equiv_layers()    Geom and stat per layer
equiv_aes()       Aesthetic-to-variable mappings
equiv_scales()    Explicitly added scales
equiv_facets()    Facet type and variables
equiv_labels()    Title, axis, and aesthetic labels
equiv_coord()     Coordinate system type
equiv_params()    Non-aesthetic layer parameters
equiv_data()      Data hash per layer
equiv_plot()      All of the above in one call (direct)
compare_plots()   Canonicalise then equiv_plot()