Equivalence Patterns and Canonicalisation Modes

Introduction

Two ggplot2 calls can produce identical visual output while being structurally different: aes() may be placed differently, a geom with built-in stat computation may stand in for pre-computed data drawn with a simpler geom, or coord_flip() may replace swapped aesthetics. This vignette catalogues the equivalence patterns supported by ggspec, documents which function or mode is needed for each, and flags patterns not yet covered.

The patterns fall into four tiers:

Tier        Entry point                          Key rules applied
Direct      equiv_plot()                         Resolved aes, geom-name subset
Structural  compare_plots(mode = "structural")   geom_col → geom_bar, layer order
Visual      compare_plots(mode = "visual")       coord_flip absorbed, scale_*(name=) / guides() / theme(element_blank()) → labs()
Conceptual  compare_plots(mode = "conceptual")   Boxplot ~ violin ~ jitter; scale limits ~ coord zoom; etc.

Scope note: Visual equivalence is purely output-based — it calls ggplot_build() and checks that two plots produce the same rendered data, labels, facets, and coordinate system. It does not verify data provenance: two plots backed by different datasets that happen to produce visually identical output (same bar heights, same x-axis labels) will pass visual equivalence. Use structural equivalence when you need to verify that the same data and the same code idiom were used.

Because ggplot_build() must be called, visual mode also imposes a buildability requirement: both plots must be evaluable with their data accessible in the current R session. Any canonicalisation rule that requires rendering (.norm_coord_flip, .norm_guide_labels, .norm_theme_labels, .sort_layers_by_geom, equiv_rendered) can only appear in visual mode or higher — strict and structural modes are data-free and never call ggplot_build().

library(ggspec)
library(ggplot2)
#> 
#> Attaching package: 'ggplot2'
#> The following object is masked from 'package:ggspec':
#> 
#>     is_ggplot
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

1. Inheritance transparency

ggspec resolves aesthetic mappings (via spec_aes(inherit = "resolve")) before comparing layers, so different coding styles that yield the same final per-layer mapping compare as equal at the layer level.

1.1 Global, local, and mixed aes()

# Global aes — canonical form
p_global <- ggplot(airquality, aes(x = Day, y = Wind)) +
  geom_line() + geom_point()

# All aesthetics local to each layer
p_local <- ggplot(airquality) +
  geom_line(aes(x = Day, y = Wind)) +
  geom_point(aes(x = Day, y = Wind))

# Mixed: global x, local y per layer
p_mixed <- ggplot(airquality, aes(x = Day)) +
  geom_line(aes(y = Wind)) +
  geom_point(aes(y = Wind))
equiv_plot(p_global, p_local, check = c("layers", "aes"))
#> [FAIL mode=strict] 1/2 checks passed: Aesthetic mapping issue(s): x->Day (layer 0), y->Wind (layer 0).
#>   Detail:
#> # A tibble: 12 × 9
#>    check  source layer geom  stat     position aesthetic variable status 
#>    <chr>  <chr>  <int> <chr> <chr>    <chr>    <chr>     <chr>    <chr>  
#>  1 layers ref        0 <NA>  <NA>     <NA>     <NA>      <NA>     <NA>   
#>  2 layers ref        1 line  identity identity <NA>      <NA>     <NA>   
#>  3 layers ref        2 point identity identity <NA>      <NA>     <NA>   
#>  4 layers obs        0 <NA>  <NA>     <NA>     <NA>      <NA>     <NA>   
#>  5 layers obs        1 line  identity identity <NA>      <NA>     <NA>   
#>  6 layers obs        2 point identity identity <NA>      <NA>     <NA>   
#>  7 aes    global     0 <NA>  <NA>     <NA>     x         Day      missing
#>  8 aes    global     0 <NA>  <NA>     <NA>     y         Wind     missing
#>  9 aes    global     1 line  <NA>     <NA>     x         Day      match  
#> 10 aes    global     1 line  <NA>     <NA>     y         Wind     match  
#> 11 aes    global     2 point <NA>     <NA>     x         Day      match  
#> 12 aes    global     2 point <NA>     <NA>     y         Wind     match
equiv_plot(p_global, p_mixed, check = c("layers", "aes"))
#> [FAIL mode=strict] 1/2 checks passed: Aesthetic mapping issue(s): y->Wind (layer 0).
#>   Detail:
#> # A tibble: 12 × 9
#>    check  source layer geom  stat     position aesthetic variable status 
#>    <chr>  <chr>  <int> <chr> <chr>    <chr>    <chr>     <chr>    <chr>  
#>  1 layers ref        0 <NA>  <NA>     <NA>     <NA>      <NA>     <NA>   
#>  2 layers ref        1 line  identity identity <NA>      <NA>     <NA>   
#>  3 layers ref        2 point identity identity <NA>      <NA>     <NA>   
#>  4 layers obs        0 <NA>  <NA>     <NA>     <NA>      <NA>     <NA>   
#>  5 layers obs        1 line  identity identity <NA>      <NA>     <NA>   
#>  6 layers obs        2 point identity identity <NA>      <NA>     <NA>   
#>  7 aes    global     0 <NA>  <NA>     <NA>     y         Wind     missing
#>  8 aes    global     0 <NA>  <NA>     <NA>     x         Day      match  
#>  9 aes    global     1 line  <NA>     <NA>     x         Day      match  
#> 10 aes    global     1 line  <NA>     <NA>     y         Wind     match  
#> 11 aes    global     2 point <NA>     <NA>     x         Day      match  
#> 12 aes    global     2 point <NA>     <NA>     y         Wind     match

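In all three plots the per-layer rows (layers 1 and 2) resolve to the same mappings and are reported as match. The strict-mode output above still fails the aes check on the layer 0 (global) rows, because p_local and p_mixed omit global mappings that the reference declares. The source column in spec_aes() records where each mapping originates ("global", "local", or "resolved"); the per-layer comparison examines only the resolved variable name.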
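The resolution step can be pictured with plain ggplot2 objects. The sketch below is not ggspec's implementation: resolve_aes() is a hypothetical helper that overlays a layer's local mapping on the plot's global mapping, and it assumes the layer keeps the default inherit.aes = TRUE.

```r
library(ggplot2)

# Hypothetical helper (not part of ggspec): overlay the local mapping of
# layer `i` on the global mapping, then return variable labels by aesthetic.
resolve_aes <- function(p, i) {
  global <- as.list(unclass(p$mapping))
  local  <- as.list(unclass(p$layers[[i]]$mapping))
  resolved <- utils::modifyList(global, local)
  labels <- vapply(resolved, rlang::as_label, character(1))
  labels[order(names(labels))]
}

p_global <- ggplot(airquality, aes(x = Day, y = Wind)) + geom_line()
p_local  <- ggplot(airquality) + geom_line(aes(x = Day, y = Wind))
p_mixed  <- ggplot(airquality, aes(x = Day)) + geom_line(aes(y = Wind))

# TRUE for each style whose resolved layer-1 mapping equals the reference
same_as_global <- vapply(
  list(local = p_local, mixed = p_mixed),
  function(p) identical(resolve_aes(p, 1), resolve_aes(p_global, 1)),
  logical(1)
)
```

Comparing only the resolved labels is what makes the placement of aes() irrelevant at the layer level.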

1.2 Global vs per-layer data

Specifying data globally in ggplot() vs attaching it to individual layers is also transparent:

# Global data
p_data_global <- ggplot(airquality, aes(x = Day, y = Wind)) +
  geom_line() + geom_point()

# Per-layer data
p_data_local <- ggplot() +
  geom_line(aes(x = Day, y = Wind), data = airquality) +
  geom_point(aes(x = Day, y = Wind), data = airquality)
equiv_plot(p_data_global, p_data_local, check = c("layers", "aes"))
#> [FAIL mode=strict] 0/2 checks passed: data_id mismatch at layer 0 (ref=1, obs=NA); data_id mismatch at layer 1 (ref=NA, obs=1); data_id mismatch at layer 2 (ref=NA, obs=1); Aesthetic mapping issue(s): x->Day (layer 0), y->Wind (layer 0).
#>   Detail:
#> # A tibble: 12 × 9
#>    check  source layer geom  stat     position aesthetic variable status 
#>    <chr>  <chr>  <int> <chr> <chr>    <chr>    <chr>     <chr>    <chr>  
#>  1 layers ref        0 <NA>  <NA>     <NA>     <NA>      <NA>     <NA>   
#>  2 layers ref        1 line  identity identity <NA>      <NA>     <NA>   
#>  3 layers ref        2 point identity identity <NA>      <NA>     <NA>   
#>  4 layers obs        0 <NA>  <NA>     <NA>     <NA>      <NA>     <NA>   
#>  5 layers obs        1 line  identity identity <NA>      <NA>     <NA>   
#>  6 layers obs        2 point identity identity <NA>      <NA>     <NA>   
#>  7 aes    global     0 <NA>  <NA>     <NA>     x         Day      missing
#>  8 aes    global     0 <NA>  <NA>     <NA>     y         Wind     missing
#>  9 aes    global     1 line  <NA>     <NA>     x         Day      match  
#> 10 aes    global     1 line  <NA>     <NA>     y         Wind     match  
#> 11 aes    global     2 point <NA>     <NA>     x         Day      match  
#> 12 aes    global     2 point <NA>     <NA>     y         Wind     match

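spec_layers()$data_source is "global" vs "local" in these two plots, but that column is intentionally excluded from all equiv_* comparisons. Where data lives is a stylistic choice that should not penalise a student who achieves the same visual result. The strict-mode output above does fail, however: it reports data_id mismatches at every layer (the data frame is attached at layer 0 in the reference and at layers 1 and 2 in the observation), together with the layer 0 aesthetic mismatch already seen in Section 1.1.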
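Why data placement need not matter can be illustrated with ggplot2 alone. In this sketch, layer_data_frame() is a hypothetical helper, not a ggspec function; it assumes layer data is either inherited or a plain data frame and does not handle function-valued data.

```r
library(ggplot2)

# Hypothetical helper: the data frame a layer effectively draws from.
# A layer without its own data holds a "waiver" placeholder and inherits
# the plot-level data.
layer_data_frame <- function(p, i) {
  d <- p$layers[[i]]$data
  if (inherits(d, "waiver")) p$data else d
}

p_data_global <- ggplot(airquality, aes(x = Day, y = Wind)) + geom_line()
p_data_local  <- ggplot() +
  geom_line(aes(x = Day, y = Wind), data = airquality)

# Both layers end up with the same data frame
same_data <- identical(
  layer_data_frame(p_data_global, 1),
  layer_data_frame(p_data_local, 1)
)
```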

The same holds for a two-dataset plot where the two layers draw from different data frames (e.g. a base map layer and a data overlay). As long as the geom names and aesthetic mappings of each layer match, equiv_layers() and equiv_aes() pass regardless of how the data frames are distributed across ggplot() and each geom_*() call.

1.3 Layer order

equiv_layers(exact = FALSE) (the default) performs a subset check on geom names, so layer order does not matter. compare_plots(mode = "structural") additionally sorts layers into a canonical order before comparison, making exact = TRUE checks order-insensitive too.

p_point_line  <- ggplot(airquality, aes(Day, Wind)) + geom_point() + geom_line()
p_line_point  <- ggplot(airquality, aes(Day, Wind)) + geom_line()  + geom_point()

# Subset check: passes regardless of order
equiv_layers(p_point_line, p_line_point)
#> [PASS] All expected geoms present.
#>   Detail:
#> # A tibble: 6 × 5
#>   source layer geom  stat     position
#>   <chr>  <int> <chr> <chr>    <chr>   
#> 1 ref        0 <NA>  <NA>     <NA>    
#> 2 ref        1 point identity identity
#> 3 ref        2 line  identity identity
#> 4 obs        0 <NA>  <NA>     <NA>    
#> 5 obs        1 line  identity identity
#> 6 obs        2 point identity identity

# Exact check: fails without canonicalisation
equiv_layers(p_point_line, p_line_point, exact = TRUE)
#> [FAIL] Expected 2 layer(s) [point, line]; got 2 [line, point].
#>   Detail:
#> # A tibble: 6 × 5
#>   source layer geom  stat     position
#>   <chr>  <int> <chr> <chr>    <chr>   
#> 1 ref        0 <NA>  <NA>     <NA>    
#> 2 ref        1 point identity identity
#> 3 ref        2 line  identity identity
#> 4 obs        0 <NA>  <NA>     <NA>    
#> 5 obs        1 line  identity identity
#> 6 obs        2 point identity identity

# Structural mode sorts layers, so exact check passes
compare_plots(p_point_line, p_line_point,
              mode = "structural", check = "layers")
#> [PASS mode=structural] 1/1 checks passed
#>   Detail:
#> # A tibble: 6 × 6
#>   check  source layer geom  stat     position
#>   <chr>  <chr>  <int> <chr> <chr>    <chr>   
#> 1 layers ref        0 <NA>  <NA>     <NA>    
#> 2 layers ref        1 line  identity identity
#> 3 layers ref        2 point identity identity
#> 4 layers obs        0 <NA>  <NA>     <NA>    
#> 5 layers obs        1 line  identity identity
#> 6 layers obs        2 point identity identity
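
The difference between the subset check, the exact check, and the sorted (canonical) check can be sketched with ggplot2 objects only. geom_names() below is a hypothetical helper that derives a geom name from the ggproto class; it is an assumption about how such names could be obtained, not a description of ggspec internals.

```r
library(ggplot2)

# Hypothetical helper: one geom name per layer ("GeomPoint" -> "point")
geom_names <- function(p) {
  vapply(
    p$layers,
    function(l) tolower(sub("^Geom", "", class(l$geom)[1])),
    character(1)
  )
}

p_point_line <- ggplot(airquality, aes(Day, Wind)) + geom_point() + geom_line()
p_line_point <- ggplot(airquality, aes(Day, Wind)) + geom_line() + geom_point()

a <- geom_names(p_point_line)
b <- geom_names(p_line_point)

subset_ok <- all(a %in% b)                 # order-insensitive subset check
exact_ok  <- identical(a, b)               # order-sensitive comparison
sorted_ok <- identical(sort(a), sort(b))   # exact comparison after sorting
```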

2. Bar chart equivalence

2.1 Direct equivalents

Global vs local aes() for geom_bar() is the same inheritance story as above:

p_ref <- ggplot(mpg, aes(x = class)) + geom_bar()
p_loc <- ggplot(mpg) + geom_bar(aes(x = class))

equiv_plot(p_ref, p_loc, check = c("layers", "aes"))
#> [FAIL mode=strict] 1/2 checks passed: Aesthetic mapping issue(s): x->class (layer 0).
#>   Detail:
#> # A tibble: 6 × 9
#>   check  source layer geom  stat  position aesthetic variable status 
#>   <chr>  <chr>  <int> <chr> <chr> <chr>    <chr>     <chr>    <chr>  
#> 1 layers ref        0 <NA>  <NA>  <NA>     <NA>      <NA>     <NA>   
#> 2 layers ref        1 bar   count stack    <NA>      <NA>     <NA>   
#> 3 layers obs        0 <NA>  <NA>  <NA>     <NA>      <NA>     <NA>   
#> 4 layers obs        1 bar   count stack    <NA>      <NA>     <NA>   
#> 5 aes    global     0 <NA>  <NA>  <NA>     x         class    missing
#> 6 aes    global     1 bar   <NA>  <NA>     x         class    match

Pre-counted data with geom_bar(stat = "identity") passes equiv_plot() directly. equiv_layers(exact = FALSE) only tests geom names ("bar" in both); equiv_aes(exact = FALSE) uses subset matching, so the extra y = n mapping in the observation does not fail the reference's x = class requirement:

counts <- count(mpg, class)

p_identity <- ggplot(counts, aes(x = class, y = n)) +
  geom_bar(stat = "identity") + labs(y = "n")

equiv_plot(p_ref, p_identity, check = c("layers", "aes"))
#> [PASS mode=strict] 2/2 checks passed
#>   Detail:
#> # A tibble: 8 × 9
#>   check  source layer geom  stat     position aesthetic variable status
#>   <chr>  <chr>  <int> <chr> <chr>    <chr>    <chr>     <chr>    <chr> 
#> 1 layers ref        0 <NA>  <NA>     <NA>     <NA>      <NA>     <NA>  
#> 2 layers ref        1 bar   count    stack    <NA>      <NA>     <NA>  
#> 3 layers obs        0 <NA>  <NA>     <NA>     <NA>      <NA>     <NA>  
#> 4 layers obs        1 bar   identity stack    <NA>      <NA>     <NA>  
#> 5 aes    global     0 <NA>  <NA>     <NA>     y         n        extra 
#> 6 aes    global     1 bar   <NA>     <NA>     y         n        extra 
#> 7 aes    global     0 <NA>  <NA>     <NA>     x         class    match 
#> 8 aes    global     1 bar   <NA>     <NA>     x         class    match

2.2 geom_col() — structural mode required

geom_col() is internally GeomCol, whose name is "col", not "bar". equiv_layers() therefore fails when the reference uses geom_bar():

p_col <- ggplot(counts, aes(x = class, y = n)) + geom_col() + labs(y = "n")

# Direct comparison fails on the layer check:
equiv_plot(p_ref, p_col, check = "layers")
#> [FAIL mode=strict] 0/1 checks passed: Missing geom(s): bar.
#>   Detail:
#> # A tibble: 4 × 6
#>   check  source layer geom  stat     position
#>   <chr>  <chr>  <int> <chr> <chr>    <chr>   
#> 1 layers ref        0 <NA>  <NA>     <NA>    
#> 2 layers ref        1 bar   count    stack   
#> 3 layers obs        0 <NA>  <NA>     <NA>    
#> 4 layers obs        1 col   identity stack

# Structural mode applies .rule_geom_col_to_bar, normalising "col" → "bar":
compare_plots(p_ref, p_col, mode = "structural", check = "layers")
#> [PASS mode=structural] 1/1 checks passed
#>   Detail:
#> # A tibble: 4 × 6
#>   check  source layer geom  stat     position
#>   <chr>  <chr>  <int> <chr> <chr>    <chr>   
#> 1 layers ref        0 <NA>  <NA>     <NA>    
#> 2 layers ref        1 bar   count    stack   
#> 3 layers obs        0 <NA>  <NA>     <NA>    
#> 4 layers obs        1 bar   identity stack

The same applies to any geom_col() combination regardless of whether scale_* or labs() is also present.
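
A geom-name rewrite of this kind can be sketched as a lookup table applied to the layer names. The alias table and canonical_geoms() are hypothetical and cover only the col/bar case; they are not the body of .rule_geom_col_to_bar.

```r
library(ggplot2)

# Assumption: a single alias is enough for this illustration
geom_alias <- c(col = "bar")

# Hypothetical helper: geom names per layer, with aliases rewritten
canonical_geoms <- function(p) {
  g <- vapply(
    p$layers,
    function(l) tolower(sub("^Geom", "", class(l$geom)[1])),
    character(1)
  )
  hit <- g %in% names(geom_alias)
  g[hit] <- unname(geom_alias[g[hit]])
  g
}

# Base-R tally standing in for count(mpg, class)
counts <- as.data.frame(table(class = mpg$class), responseName = "n")

p_ref <- ggplot(mpg, aes(x = class)) + geom_bar()
p_col <- ggplot(counts, aes(x = class, y = n)) + geom_col()

geoms_match <- identical(canonical_geoms(p_ref), canonical_geoms(p_col))
```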

2.3 coord_flip() — visual mode required

A bar chart written as aes(y = class) + coord_flip() has its x/y aesthetics swapped relative to the canonical aes(x = class) form. equiv_aes() sees "1::x" in the reference but only "1::y" in the observation, so it fails:

p_flip <- ggplot(mpg, aes(y = class)) + geom_bar() + coord_flip()

# Direct aes check fails:
equiv_plot(p_ref, p_flip, check = "aes")
#> [FAIL mode=strict] 0/1 checks passed: Aesthetic mapping issue(s): x->class (layer 0), x->class (layer 1).
#>   Detail:
#> # A tibble: 4 × 7
#>   check layer geom  aesthetic variable source status 
#>   <chr> <int> <chr> <chr>     <chr>    <chr>  <chr>  
#> 1 aes       0 <NA>  x         class    global missing
#> 2 aes       1 bar   x         class    global missing
#> 3 aes       0 <NA>  y         class    global extra  
#> 4 aes       1 bar   y         class    global extra

# Visual mode applies .rule_coord_flip, swapping x <-> y and replacing
# coord_flip with coord_cartesian before comparison:
compare_plots(p_ref, p_flip, mode = "visual", check = c("layers", "aes", "coord"))
#> [PASS mode=visual] 1/1 checks passed

This applies equally to geom_col() + coord_flip() and to pre-counted variants. All such plots require mode = "visual" when compared to a non-flipped reference.
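
The x/y swap can be illustrated on the global mapping alone. canonical_mapping() is a hypothetical helper that handles only plot-level aesthetics; a real rule would also need to cover layer-level mappings and replace the coord, as the comment above describes.

```r
library(ggplot2)

# Hypothetical helper: aesthetic -> variable label for the global mapping,
# with x and y swapped when the plot uses coord_flip().
canonical_mapping <- function(p) {
  m <- vapply(as.list(unclass(p$mapping)), rlang::as_label, character(1))
  if (inherits(p$coordinates, "CoordFlip")) {
    swap <- c(x = "y", y = "x")
    hit <- names(m) %in% names(swap)
    names(m)[hit] <- swap[names(m)[hit]]
  }
  m[order(names(m))]
}

p_ref  <- ggplot(mpg, aes(x = class)) + geom_bar()
p_flip <- ggplot(mpg, aes(y = class)) + geom_bar() + coord_flip()

mappings_match <- identical(canonical_mapping(p_ref), canonical_mapping(p_flip))
```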

2.4 Scale name vs labs() — visual mode required

scale_y_continuous(name = "count") stores the label inside the scale object; labs(y = "count") stores it in p$labels. They appear different to equiv_labels() unless visual mode is active:

p_scale_name <- ggplot(counts, aes(x = class, y = n)) +
  geom_bar(stat = "identity") + scale_y_continuous(name = "count")

p_labs <- ggplot(counts, aes(x = class, y = n)) +
  geom_bar(stat = "identity") + labs(y = "count")

# Visual mode applies .rule_scale_name_to_labels, promoting the scale name
# into the labels table:
compare_plots(p_labs, p_scale_name, mode = "visual", check = "labels")
#> [PASS mode=visual] 1/1 checks passed
#>   Detail:
#> # A tibble: 1 × 5
#>   check  aesthetic label_ref label_obs match
#>   <chr>  <chr>     <chr>     <chr>     <lgl>
#> 1 labels y         count     count     TRUE
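
Where the two labels live can be inspected directly on the plot objects. effective_label() is a hypothetical helper, not the ggspec rule: it prefers an explicit scale name and otherwise falls back to p$labels, which mirrors the precedence ggplot2 applies when rendering.

```r
library(ggplot2)

# Hypothetical helper: explicit scale name if set, else the labs() entry
effective_label <- function(p, aesthetic) {
  sc <- p$scales$get_scales(aesthetic)
  if (!is.null(sc) && !inherits(sc$name, "waiver")) {
    sc$name
  } else {
    p$labels[[aesthetic]]
  }
}

# Base-R tally standing in for count(mpg, class)
counts <- as.data.frame(table(class = mpg$class), responseName = "n")

p_scale_name <- ggplot(counts, aes(x = class, y = n)) +
  geom_bar(stat = "identity") + scale_y_continuous(name = "count")
p_labs <- ggplot(counts, aes(x = class, y = n)) +
  geom_bar(stat = "identity") + labs(y = "count")

labels_match <- identical(
  effective_label(p_scale_name, "y"),
  effective_label(p_labs, "y")
)
```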

3. Count plot equivalence

3.1 geom_count() variants — direct equivalents

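All coding styles for the same geom_count() plot follow the same inheritance story: the per-layer rows match, and the strict aes check flags only the layer 0 (global) rows where the observation omits a global mapping: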

p_count_global <- ggplot(mpg, aes(x = drv, y = class)) + geom_count()
p_count_local  <- ggplot(mpg) + geom_count(aes(x = drv, y = class))
p_count_split  <- ggplot(mpg, aes(x = drv)) + geom_count(aes(y = class))

equiv_plot(p_count_global, p_count_local, check = c("layers", "aes"))
#> [FAIL mode=strict] 1/2 checks passed: Aesthetic mapping issue(s): x->drv (layer 0), y->class (layer 0).
#>   Detail:
#> # A tibble: 8 × 9
#>   check  source layer geom  stat  position aesthetic variable status 
#>   <chr>  <chr>  <int> <chr> <chr> <chr>    <chr>     <chr>    <chr>  
#> 1 layers ref        0 <NA>  <NA>  <NA>     <NA>      <NA>     <NA>   
#> 2 layers ref        1 point sum   identity <NA>      <NA>     <NA>   
#> 3 layers obs        0 <NA>  <NA>  <NA>     <NA>      <NA>     <NA>   
#> 4 layers obs        1 point sum   identity <NA>      <NA>     <NA>   
#> 5 aes    global     0 <NA>  <NA>  <NA>     x         drv      missing
#> 6 aes    global     0 <NA>  <NA>  <NA>     y         class    missing
#> 7 aes    global     1 point <NA>  <NA>     x         drv      match  
#> 8 aes    global     1 point <NA>  <NA>     y         class    match
equiv_plot(p_count_global, p_count_split, check = c("layers", "aes"))
#> [FAIL mode=strict] 1/2 checks passed: Aesthetic mapping issue(s): y->class (layer 0).
#>   Detail:
#> # A tibble: 8 × 9
#>   check  source layer geom  stat  position aesthetic variable status 
#>   <chr>  <chr>  <int> <chr> <chr> <chr>    <chr>     <chr>    <chr>  
#> 1 layers ref        0 <NA>  <NA>  <NA>     <NA>      <NA>     <NA>   
#> 2 layers ref        1 point sum   identity <NA>      <NA>     <NA>   
#> 3 layers obs        0 <NA>  <NA>  <NA>     <NA>      <NA>     <NA>   
#> 4 layers obs        1 point sum   identity <NA>      <NA>     <NA>   
#> 5 aes    global     0 <NA>  <NA>  <NA>     y         class    missing
#> 6 aes    global     0 <NA>  <NA>  <NA>     x         drv      match  
#> 7 aes    global     1 point <NA>  <NA>     x         drv      match  
#> 8 aes    global     1 point <NA>  <NA>     y         class    match

3.2 geom_point(aes(size = n)) + pre-counted data — separate equivalence group

geom_count() is geom_point() with stat = "sum": both use GeomPoint (geom name "point") and differ only in their stat ("sum" vs "identity"). equiv_layers(exact = FALSE) tests geom names only, so the layer check passes across the two approaches:

mpg_counts <- count(mpg, drv, class)

p_point_sized <- ggplot(mpg_counts, aes(x = drv, y = class, size = n)) +
  geom_point()

# Passes: both layers have geom "point"; only the stat differs (sum vs identity)
equiv_plot(p_count_global, p_point_sized, check = "layers")
#> [PASS mode=strict] 1/1 checks passed
#>   Detail:
#> # A tibble: 4 × 6
#>   check  source layer geom  stat     position
#>   <chr>  <chr>  <int> <chr> <chr>    <chr>   
#> 1 layers ref        0 <NA>  <NA>     <NA>    
#> 2 layers ref        1 point sum      identity
#> 3 layers obs        0 <NA>  <NA>     <NA>    
#> 4 layers obs        1 point identity identity

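These two plots are visually equivalent but not spec-equivalent under the current function set: the stat differs ("sum" vs "identity"), the observation carries an extra size = n mapping, and the underlying data differ (raw rows vs pre-counted rows). They therefore form separate equivalence groups.

Comparing across groups requires a custom canonicalisation rule that rewrites both the stat and the data simultaneously. See Section 5.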
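
The visual equivalence of the two groups can be checked outside ggspec by comparing what ggplot2 computes for geom_count() with a manual tally. This is an illustrative sketch using ggplot2's layer_data(); it is not how ggspec compares plots, and the base-R table() call stands in for count().

```r
library(ggplot2)

p_count <- ggplot(mpg, aes(x = drv, y = class)) + geom_count()

# Built layer data: one row per (drv, class) cell, with the computed column n
built <- layer_data(p_count, 1)

# Manual tally of the same cells, dropping empty combinations
pre <- as.data.frame(table(drv = mpg$drv, class = mpg$class), responseName = "n")
pre <- pre[pre$n > 0, ]

# The stat-computed counts equal the pre-computed counts
counts_match <- isTRUE(all.equal(
  sort(as.numeric(built$n)),
  sort(as.numeric(pre$n))
))
```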


3.5 Pre-counted bar charts — exhausting the equivalence cases

A bar chart of species counts can be written at least four ways:

# (a) raw data, stat = "count" (default)
penguins |> ggplot() + geom_bar(aes(x = species))

# (b) pre-counted, y column named "n", explicit y-label
penguins |> count(species) |>
  ggplot() + geom_bar(aes(x = species, y = n), stat = "identity") + labs(y = "count")

# (c) pre-counted, y column named "count" (= default y-label)
penguins |> count(species, name = "count") |>
  ggplot() + geom_bar(aes(x = species, y = count), stat = "identity")

# (d) pre-counted, y column named "asdf", explicit y-label
penguins |> count(species, name = "asdf") |>
  ggplot() + geom_bar(aes(x = species, y = asdf), stat = "identity") + labs(y = "count")

Three independent checks determine visual equivalence:

  1. Rendered bar heights: equiv_rendered() compares the built ymax values. All four forms produce the same bar heights for the same data.

  2. Y-axis label: equiv_labels() compares the effective y-axis label. The default label for stat = "count" is "count". For pre-counted forms, the default is the y column name ("n", "count", "asdf").

  3. Structural difference: equiv_layers() detects the stat and y aesthetic difference; equiv_aes() flags the missing/extra y mapping.

Equivalence matrix (mode = "visual"):

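Key: (a) geom_bar; (b) geom_col + labs(y="count"); (c) y col = "count"; (d) geom_col + labs(y="count")

       (a)   (b)   (c)   (d)
(a)    -     yes   yes   yes
(b)    yes   -     yes   yes
(c)    yes   yes   -     yes
(d)    yes   yes   yes   -

Reasons for row (a): against (b), heights match and both labels are "count"; against (c), the column name "count" equals the stat default label; against (d), the explicit label matches.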

The only cases that fail are those where the effective y-axis labels differ (e.g. (a) vs a pre-counted form with y = n and no explicit labs(y = ...)). When a comparison fails only on labels but passes on rendered heights, the result $hint will say: “Rendered output matches but labels differ. Add labs() to align axis/legend titles.”
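
The rendered-height check can be reproduced with ggplot2 alone. The sketch below uses mpg instead of penguins so that it runs without an extra data package, and it reads ymax from layer_data(); it is an assumption about what "built ymax values" refers to, not the implementation of equiv_rendered().

```r
library(ggplot2)

# Base-R tally standing in for count(mpg, class)
counts <- as.data.frame(table(class = mpg$class), responseName = "n")

# Form (a): raw data, stat = "count"
p_raw <- ggplot(mpg) + geom_bar(aes(x = class))

# Form (b): pre-counted data, stat = "identity"
p_pre <- ggplot(counts) +
  geom_bar(aes(x = class, y = n), stat = "identity") + labs(y = "count")

heights_raw <- as.numeric(layer_data(p_raw, 1)$ymax)
heights_pre <- as.numeric(layer_data(p_pre, 1)$ymax)

heights_match <- isTRUE(all.equal(heights_raw, heights_pre))
```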


4. Histogram binning variants

All histogram calls share geom name "bar" (histogram uses GeomBar + StatBin), so equiv_layers() passes regardless of binning parameters. equiv_aes() compares the x aesthetic, which is the same:

p_hist_default  <- ggplot(mpg, aes(x = hwy)) + geom_histogram()
p_hist_bins30   <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 30)
p_hist_binwidth <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 3)

equiv_plot(p_hist_default, p_hist_bins30,   check = c("layers", "aes"))
#> [PASS mode=strict] 2/2 checks passed
#>   Detail:
#> # A tibble: 6 × 9
#>   check  source layer geom  stat  position aesthetic variable status
#>   <chr>  <chr>  <int> <chr> <chr> <chr>    <chr>     <chr>    <chr> 
#> 1 layers ref        0 <NA>  <NA>  <NA>     <NA>      <NA>     <NA>  
#> 2 layers ref        1 bar   bin   stack    <NA>      <NA>     <NA>  
#> 3 layers obs        0 <NA>  <NA>  <NA>     <NA>      <NA>     <NA>  
#> 4 layers obs        1 bar   bin   stack    <NA>      <NA>     <NA>  
#> 5 aes    global     0 <NA>  <NA>  <NA>     x         hwy      match 
#> 6 aes    global     1 bar   <NA>  <NA>     x         hwy      match
equiv_plot(p_hist_default, p_hist_binwidth, check = c("layers", "aes"))
#> [PASS mode=strict] 2/2 checks passed
#>   Detail:
#> # A tibble: 6 × 9
#>   check  source layer geom  stat  position aesthetic variable status
#>   <chr>  <chr>  <int> <chr> <chr> <chr>    <chr>     <chr>    <chr> 
#> 1 layers ref        0 <NA>  <NA>  <NA>     <NA>      <NA>     <NA>  
#> 2 layers ref        1 bar   bin   stack    <NA>      <NA>     <NA>  
#> 3 layers obs        0 <NA>  <NA>  <NA>     <NA>      <NA>     <NA>  
#> 4 layers obs        1 bar   bin   stack    <NA>      <NA>     <NA>  
#> 5 aes    global     0 <NA>  <NA>  <NA>     x         hwy      match 
#> 6 aes    global     1 bar   <NA>  <NA>     x         hwy      match

Binning parameters sit in the params list-column of spec_layers(). They can be checked explicitly with equiv_params() when the grader wants to enforce a specific binning choice:

# Reference has no explicit bins; observation has bins = 30 — params differ
equiv_params(p_hist_default, p_hist_bins30, layer = 1L, params = "bins")
#> [FAIL] Layer 1 parameter mismatch: bins.

The pedagogical mode logs bins vs binwidth usage via .rule_histogram_bin_param, without converting between them (the numeric values are not inter-derivable without knowing the data range).
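
Which binning argument a histogram layer received can be read from the layer object itself. bin_params() is a hypothetical helper that assumes ggplot2 stores these arguments in the layer's stat_params; it only reports the values and, like the rule described above, makes no attempt to convert between them.

```r
library(ggplot2)

# Hypothetical helper: binning arguments recorded on layer `i`
bin_params <- function(p, i = 1L) {
  p$layers[[i]]$stat_params[c("bins", "binwidth")]
}

p_hist_default  <- ggplot(mpg, aes(x = hwy)) + geom_histogram()
p_hist_bins30   <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 30)
p_hist_binwidth <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 3)

params_default  <- bin_params(p_hist_default)   # neither argument set
params_bins30   <- bin_params(p_hist_bins30)    # bins set explicitly
params_binwidth <- bin_params(p_hist_binwidth)  # binwidth set explicitly
```

The default plot and the bins = 30 plot differ here even though ggplot2 falls back to 30 bins at build time, which is why an explicit parameter check reports a mismatch.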


5. Pending work

5.1 Cross-geom stat equivalence

The geom_count() / geom_point(aes(size = n)) + count() boundary is an instance of a broader class: a plot that uses a stat-computing geom on raw data vs a plot that pre-computes the same statistic and applies a simpler geom. Current canonicalisation rules operate only on the spec (geom name, aes mappings, coord, scale names). A rule that could bridge these groups would need to:

  1. Detect a stat-computing layer (stat = "sum" for geom_count, stat = "bin" for geom_histogram, etc.).
  2. Recognise that the comparison plot provides pre-computed data whose columns match what the stat would have computed.
  3. Rewrite the spec (and possibly verify the data) to produce a common form.
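
Step 1 is the simplest of the three and can be sketched with ggplot2 objects. layer_stats() and has_computed_stat() are hypothetical helpers that treat any stat other than "identity" as stat-computing; steps 2 and 3 are not attempted here.

```r
library(ggplot2)

# Hypothetical helper: one stat name per layer ("StatSum" -> "sum")
layer_stats <- function(p) {
  vapply(
    p$layers,
    function(l) tolower(sub("^Stat", "", class(l$stat)[1])),
    character(1)
  )
}

# Hypothetical helper: does any layer compute a statistic from raw data?
has_computed_stat <- function(p) any(layer_stats(p) != "identity")

p_count <- ggplot(mpg, aes(x = drv, y = class)) + geom_count()
p_hist  <- ggplot(mpg, aes(x = hwy)) + geom_histogram()
p_point <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

stats_seen <- c(
  count = layer_stats(p_count),
  hist  = layer_stats(p_hist),
  point = layer_stats(p_point)
)
```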

This requires integrating equiv_data() into the canonicalisation pipeline, which is more involved than the existing rules. See next.md.

5.2 Multi-dataset layer verification

spec_layers()$data_source marks which layers carry their own data frame, and layer_data_index() locates a specific data frame within a plot. For the comparison to verify that corresponding layers in two plots draw from semantically equivalent data frames, a matching step is needed. The current equiv_data() function compares by hash (after sorting columns), but it operates on a single layer index, not on matched pairs across plots. See next.md.