Two ggplot2 calls can produce identical visual output while being written in structurally different ways — different placement of aes(), different geoms with built-in stat computation vs pre-computed data, or coord_flip() instead of swapped aesthetics. This vignette catalogues the equivalence patterns supported by ggspec, documents which function or mode is needed for each, and flags patterns not yet covered.
The patterns fall into four tiers:
| Tier | Entry point | Key rules applied |
|---|---|---|
| Direct | equiv_plot() | Resolved aes, geom-name subset |
| Structural | compare_plots(mode = "structural") | geom_col → geom_bar, layer order |
| Visual | compare_plots(mode = "visual") | coord_flip absorbed, scale_*(name=) / guides() / theme(element_blank()) → labs() |
| Conceptual | compare_plots(mode = "conceptual") | Boxplot ~ violin ~ jitter; scale limits ~ coord zoom; etc. |
Scope note: Visual equivalence is purely output-based — it calls ggplot_build() and checks that two plots produce the same rendered data, labels, facets, and coordinate system. It does not verify data provenance: two plots backed by different datasets that happen to produce visually identical output (same bar heights, same x-axis labels) will pass visual equivalence. Use structural equivalence when you need to verify that the same data and the same code idiom were used.
Because ggplot_build() must be called, visual mode also imposes a buildability requirement: both plots must be evaluable with their data accessible in the current R session. Any canonicalisation rule that requires rendering (.norm_coord_flip, .norm_guide_labels, .norm_theme_labels, .sort_layers_by_geom, equiv_rendered) can only appear in visual mode or higher — strict and structural modes are data-free and never call ggplot_build().
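Both points in the scope note can be illustrated with ggplot2 alone. The following is a minimal sketch, not ggspec's implementation; the data frames `raw` and `tallied` and the column `missing_var` are made up for the illustration. Two different datasets build to the same bar heights, and a plot whose data cannot be evaluated only fails once ggplot_build() runs.

```r
library(ggplot2)

# Two different datasets that happen to describe the same bars (assumed data).
raw     <- data.frame(g = c("a", "a", "b"))
tallied <- data.frame(g = c("a", "b"), n = c(2, 1))

p_raw     <- ggplot(raw, aes(x = g)) + geom_bar()
p_tallied <- ggplot(tallied, aes(x = g, y = n)) + geom_col()

# layer_data() builds the plot and returns the rendered data for one layer.
heights_raw     <- layer_data(p_raw, 1)$ymax
heights_tallied <- layer_data(p_tallied, 1)$ymax
same_heights <- isTRUE(all.equal(heights_raw, heights_tallied))

# Constructing a plot does not evaluate the mapping; building it does.
p_broken <- ggplot(raw, aes(x = missing_var)) + geom_bar()
builds <- tryCatch({
  ggplot_build(p_broken)
  TRUE
}, error = function(e) FALSE)
```

Here `same_heights` reflects an output-only comparison that says nothing about where the data came from, and `builds` shows why a rendering-based check needs the data to be accessible in the session.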
library(ggspec)
library(ggplot2)
#>
#> Attaching package: 'ggplot2'
#> The following object is masked from 'package:ggspec':
#>
#> is_ggplot
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union

ggspec resolves aesthetic mappings (via spec_aes(inherit = "resolve")) before any comparison, so different coding styles that yield the same final mapping all pass equiv_plot() directly.
aes()

# Global aes (canonical form)
p_global <- ggplot(airquality, aes(x = Day, y = Wind)) +
  geom_line() + geom_point()
# All aesthetics local to each layer
p_local <- ggplot(airquality) +
  geom_line(aes(x = Day, y = Wind)) +
  geom_point(aes(x = Day, y = Wind))
# Mixed: global x, local y per layer
p_mixed <- ggplot(airquality, aes(x = Day)) +
  geom_line(aes(y = Wind)) +
  geom_point(aes(y = Wind))

equiv_plot(p_global, p_local, check = c("layers", "aes"))
#> [FAIL mode=strict] 1/2 checks passed: Aesthetic mapping issue(s): x->Day (layer 0), y->Wind (layer 0).
#> Detail:
#> # A tibble: 12 × 9
#> check source layer geom stat position aesthetic variable status
#> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 layers ref 1 line identity identity <NA> <NA> <NA>
#> 3 layers ref 2 point identity identity <NA> <NA> <NA>
#> 4 layers obs 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 5 layers obs 1 line identity identity <NA> <NA> <NA>
#> 6 layers obs 2 point identity identity <NA> <NA> <NA>
#> 7 aes global 0 <NA> <NA> <NA> x Day missing
#> 8 aes global 0 <NA> <NA> <NA> y Wind missing
#> 9 aes global 1 line <NA> <NA> x Day match
#> 10 aes global 1 line <NA> <NA> y Wind match
#> 11 aes global 2 point <NA> <NA> x Day match
#> 12 aes global 2 point <NA> <NA> y Wind match
equiv_plot(p_global, p_mixed, check = c("layers", "aes"))
#> [FAIL mode=strict] 1/2 checks passed: Aesthetic mapping issue(s): y->Wind (layer 0).
#> Detail:
#> # A tibble: 12 × 9
#> check source layer geom stat position aesthetic variable status
#> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 layers ref 1 line identity identity <NA> <NA> <NA>
#> 3 layers ref 2 point identity identity <NA> <NA> <NA>
#> 4 layers obs 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 5 layers obs 1 line identity identity <NA> <NA> <NA>
#> 6 layers obs 2 point identity identity <NA> <NA> <NA>
#> 7 aes global 0 <NA> <NA> <NA> y Wind missing
#> 8 aes global 0 <NA> <NA> <NA> x Day match
#> 9 aes global 1 line <NA> <NA> x Day match
#> 10 aes global 1 line <NA> <NA> y Wind match
#> 11 aes global 2 point <NA> <NA> x Day match
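#> 12 aes global 2 point <NA> <NA> y Wind match

All three are intended to pass. The source column in spec_aes() records where each mapping originates ("global", "local", or "resolved"), but the comparison only examines the resolved variable name. In the output recorded above, the per-layer rows (layers 1 and 2) all match; the FAIL status comes from the layer-0 rows, where the plot-level mappings of the reference are reported as missing.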
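To see why the placement of aes() should not matter once mappings are resolved, the sketch below uses ggplot2 only (it is not ggspec's resolver, and `built_xy()` is a hypothetical helper). It builds three variants of the same point layer and compares the x/y values each one ends up with.

```r
library(ggplot2)

p_global <- ggplot(airquality, aes(x = Day, y = Wind)) + geom_point()
p_local  <- ggplot(airquality) + geom_point(aes(x = Day, y = Wind))
p_mixed  <- ggplot(airquality, aes(x = Day)) + geom_point(aes(y = Wind))

# Hypothetical helper: positions the first layer receives after ggplot2
# has merged plot-level and layer-level mappings.
built_xy <- function(p) layer_data(p, 1)[, c("x", "y")]

same_local <- isTRUE(all.equal(built_xy(p_global), built_xy(p_local)))
same_mixed <- isTRUE(all.equal(built_xy(p_global), built_xy(p_mixed)))
```

If both flags are TRUE, the three coding styles differ only in where the mapping was written, which is the distinction the source column records.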
Specifying data globally in ggplot() vs attaching it to individual layers is also transparent:
# Global data
p_data_global <- ggplot(airquality, aes(x = Day, y = Wind)) +
geom_line() + geom_point()
# Per-layer data
p_data_local <- ggplot() +
  geom_line(aes(x = Day, y = Wind), data = airquality) +
  geom_point(aes(x = Day, y = Wind), data = airquality)

equiv_plot(p_data_global, p_data_local, check = c("layers", "aes"))
#> [FAIL mode=strict] 0/2 checks passed: data_id mismatch at layer 0 (ref=1, obs=NA); data_id mismatch at layer 1 (ref=NA, obs=1); data_id mismatch at layer 2 (ref=NA, obs=1); Aesthetic mapping issue(s): x->Day (layer 0), y->Wind (layer 0).
#> Detail:
#> # A tibble: 12 × 9
#> check source layer geom stat position aesthetic variable status
#> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 layers ref 1 line identity identity <NA> <NA> <NA>
#> 3 layers ref 2 point identity identity <NA> <NA> <NA>
#> 4 layers obs 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 5 layers obs 1 line identity identity <NA> <NA> <NA>
#> 6 layers obs 2 point identity identity <NA> <NA> <NA>
#> 7 aes global 0 <NA> <NA> <NA> x Day missing
#> 8 aes global 0 <NA> <NA> <NA> y Wind missing
#> 9 aes global 1 line <NA> <NA> x Day match
#> 10 aes global 1 line <NA> <NA> y Wind match
#> 11 aes global 2 point <NA> <NA> x Day match
#> 12 aes global 2 point <NA> <NA> y Wind match

spec_layers()$data_source is "global" vs "local" in these two plots, but that column is intentionally excluded from all equiv_* comparisons. Where data lives is a stylistic choice that should not penalise a student who achieves the same visual result.
The same holds for a two-dataset plot where the two layers draw from different data frames (e.g. a base map layer and a data overlay). As long as the geom names and aesthetic mappings of each layer match, equiv_layers() and equiv_aes() pass regardless of how the data frames are distributed across ggplot() and each geom_*() call.
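As an illustration of the global/local distinction only, the sketch below inspects ggplot2's layer objects directly. `data_source()` here is a hypothetical helper written for this example, not the ggspec accessor, and it assumes that a layer inheriting the plot data does not hold a data frame of its own.

```r
library(ggplot2)

p_data_global <- ggplot(airquality, aes(x = Day, y = Wind)) + geom_line()
p_data_local  <- ggplot() +
  geom_line(aes(x = Day, y = Wind), data = airquality)

# Hypothetical helper: "local" when the layer holds its own data frame,
# "global" when it falls back to the data given to ggplot().
data_source <- function(p, i) {
  if (is.data.frame(p$layers[[i]]$data)) "local" else "global"
}

src_global <- data_source(p_data_global, 1)
src_local  <- data_source(p_data_local, 1)

# Either way, the built layer sees the same values.
same_built <- isTRUE(all.equal(
  layer_data(p_data_global, 1)[, c("x", "y")],
  layer_data(p_data_local, 1)[, c("x", "y")]
))
```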
equiv_layers(exact = FALSE) (the default) performs a subset check on geom names, so layer order does not matter. compare_plots(mode = "structural") additionally sorts layers into a canonical order before comparison, making exact = TRUE checks order-insensitive too.
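The difference between a subset check, an exact check, and an exact check after canonical sorting can be sketched in a few lines of base R on top of ggplot2. This is an assumption-laden illustration of the idea, not the ggspec code; `geom_names()` is a hypothetical helper that derives lower-case geom names from the ggproto class.

```r
library(ggplot2)

p_point_line <- ggplot(airquality, aes(Day, Wind)) + geom_point() + geom_line()
p_line_point <- ggplot(airquality, aes(Day, Wind)) + geom_line() + geom_point()

# Hypothetical helper: "GeomPoint" -> "point", "GeomLine" -> "line".
geom_names <- function(p) {
  unname(vapply(
    p$layers,
    function(l) tolower(sub("^Geom", "", class(l$geom)[1])),
    character(1)
  ))
}

ref <- geom_names(p_point_line)
obs <- geom_names(p_line_point)

subset_ok <- all(ref %in% obs)                # order-insensitive subset check
exact_ok  <- identical(ref, obs)              # order-sensitive comparison
sorted_ok <- identical(sort(ref), sort(obs))  # exact check after sorting
```

Sorting both sides before an exact comparison is one simple way to make it order-insensitive while still requiring the same number of layers.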
p_point_line <- ggplot(airquality, aes(Day, Wind)) + geom_point() + geom_line()
p_line_point <- ggplot(airquality, aes(Day, Wind)) + geom_line() + geom_point()
# Subset check: passes regardless of order
equiv_layers(p_point_line, p_line_point)
#> [PASS] All expected geoms present.
#> Detail:
#> # A tibble: 6 × 5
#> source layer geom stat position
#> <chr> <int> <chr> <chr> <chr>
#> 1 ref 0 <NA> <NA> <NA>
#> 2 ref 1 point identity identity
#> 3 ref 2 line identity identity
#> 4 obs 0 <NA> <NA> <NA>
#> 5 obs 1 line identity identity
#> 6 obs 2 point identity identity
# Exact check: fails without canonicalisation
equiv_layers(p_point_line, p_line_point, exact = TRUE)
#> [FAIL] Expected 2 layer(s) [point, line]; got 2 [line, point].
#> Detail:
#> # A tibble: 6 × 5
#> source layer geom stat position
#> <chr> <int> <chr> <chr> <chr>
#> 1 ref 0 <NA> <NA> <NA>
#> 2 ref 1 point identity identity
#> 3 ref 2 line identity identity
#> 4 obs 0 <NA> <NA> <NA>
#> 5 obs 1 line identity identity
#> 6 obs 2 point identity identity
# Structural mode sorts layers, so exact check passes
compare_plots(p_point_line, p_line_point,
mode = "structural", check = "layers")
#> [PASS mode=structural] 1/1 checks passed
#> Detail:
#> # A tibble: 6 × 6
#> check source layer geom stat position
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA>
#> 2 layers ref 1 line identity identity
#> 3 layers ref 2 point identity identity
#> 4 layers obs 0 <NA> <NA> <NA>
#> 5 layers obs 1 line identity identity
#> 6 layers obs 2 point identity identity

Global vs local aes() for geom_bar() is the same inheritance story as above:
p_ref <- ggplot(mpg, aes(x = class)) + geom_bar()
p_loc <- ggplot(mpg) + geom_bar(aes(x = class))
equiv_plot(p_ref, p_loc, check = c("layers", "aes"))
#> [FAIL mode=strict] 1/2 checks passed: Aesthetic mapping issue(s): x->class (layer 0).
#> Detail:
#> # A tibble: 6 × 9
#> check source layer geom stat position aesthetic variable status
#> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 layers ref 1 bar count stack <NA> <NA> <NA>
#> 3 layers obs 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 layers obs 1 bar count stack <NA> <NA> <NA>
#> 5 aes global 0 <NA> <NA> <NA> x class missing
#> 6 aes global 1 bar <NA> <NA> x class match

Pre-counted data with geom_bar(stat = "identity") also passes equiv_plot() directly. equiv_layers(exact = FALSE) only tests geom names (“bar” in both); equiv_aes(exact = FALSE) uses subset matching, so the extra y = n mapping in the observation does not fail the reference’s x = class requirement:
counts <- count(mpg, class)
p_identity <- ggplot(counts, aes(x = class, y = n)) +
geom_bar(stat = "identity") + labs(y = "n")
equiv_plot(p_ref, p_identity, check = c("layers", "aes"))
#> [PASS mode=strict] 2/2 checks passed
#> Detail:
#> # A tibble: 8 × 9
#> check source layer geom stat position aesthetic variable status
#> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 layers ref 1 bar count stack <NA> <NA> <NA>
#> 3 layers obs 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 layers obs 1 bar identity stack <NA> <NA> <NA>
#> 5 aes global 0 <NA> <NA> <NA> y n extra
#> 6 aes global 1 bar <NA> <NA> y n extra
#> 7 aes global 0 <NA> <NA> <NA> x class match
#> 8 aes global 1 bar <NA> <NA> x class match

geom_col(): structural mode required

geom_col() is internally GeomCol, whose name is "col", not "bar". equiv_layers() therefore fails when the reference uses geom_bar():
p_col <- ggplot(counts, aes(x = class, y = n)) + geom_col() + labs(y = "n")
# Direct comparison fails on the layer check:
equiv_plot(p_ref, p_col, check = "layers")
#> [FAIL mode=strict] 0/1 checks passed: Missing geom(s): bar.
#> Detail:
#> # A tibble: 4 × 6
#> check source layer geom stat position
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA>
#> 2 layers ref 1 bar count stack
#> 3 layers obs 0 <NA> <NA> <NA>
#> 4 layers obs 1 col identity stack
# Structural mode applies .rule_geom_col_to_bar, normalising "col" → "bar":
compare_plots(p_ref, p_col, mode = "structural", check = "layers")
#> [PASS mode=structural] 1/1 checks passed
#> Detail:
#> # A tibble: 4 × 6
#> check source layer geom stat position
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA>
#> 2 layers ref 1 bar count stack
#> 3 layers obs 0 <NA> <NA> <NA>
#> 4 layers obs 1 bar identity stack

The same applies to any geom_col() combination regardless of whether scale_* or labs() is also present.
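The idea behind normalising "col" to "bar" can be sketched as follows. This is a hypothetical stand-in for the rule, written against ggplot2's layer objects; `geom_name()` and `normalise_geom()` are names invented for the example.

```r
library(ggplot2)

# Hypothetical helpers mirroring the idea of a "col" -> "bar" rule.
geom_name <- function(layer) tolower(sub("^Geom", "", class(layer$geom)[1]))
normalise_geom <- function(name) if (identical(name, "col")) "bar" else name

name_bar <- geom_name(geom_bar())  # derived from class GeomBar
name_col <- geom_name(geom_col())  # derived from class GeomCol

strict_match     <- identical(name_bar, name_col)
structural_match <- identical(normalise_geom(name_bar), normalise_geom(name_col))
```

A rule of this kind only touches the geom name, so it needs no data and no call to ggplot_build().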
coord_flip(): visual mode required

A bar chart written as aes(y = class) + coord_flip() has its x/y aesthetics swapped relative to the canonical aes(x = class) form. equiv_aes() sees "1::x" in the reference but only "1::y" in the observation, so it fails:
p_flip <- ggplot(mpg, aes(y = class)) + geom_bar() + coord_flip()
# Direct aes check fails:
equiv_plot(p_ref, p_flip, check = "aes")
#> [FAIL mode=strict] 0/1 checks passed: Aesthetic mapping issue(s): x->class (layer 0), x->class (layer 1).
#> Detail:
#> # A tibble: 4 × 7
#> check layer geom aesthetic variable source status
#> <chr> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 aes 0 <NA> x class global missing
#> 2 aes 1 bar x class global missing
#> 3 aes 0 <NA> y class global extra
#> 4 aes 1 bar y class global extra
# Visual mode applies .rule_coord_flip, swapping x <-> y and replacing
# coord_flip with coord_cartesian before comparison:
compare_plots(p_ref, p_flip, mode = "visual", check = c("layers", "aes", "coord"))
#> [PASS mode=visual] 1/1 checks passed

This applies equally to geom_col() + coord_flip() and to pre-counted variants. All such plots require mode = "visual" when compared to a non-flipped reference.
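A rough sketch of what absorbing coord_flip() amounts to, under the assumption that a spec can be represented as a named character vector: `swap_xy()` is a hypothetical helper, and the rendered check uses only ggplot2's layer_data().

```r
library(ggplot2)

p_ref  <- ggplot(mpg, aes(x = class)) + geom_bar()
p_flip <- ggplot(mpg, aes(y = class)) + geom_bar() + coord_flip()

# Hypothetical spec-level rule: swap the x and y entries of a named mapping.
swap_xy <- function(mapping) {
  nm <- names(mapping)
  names(mapping) <- ifelse(nm == "x", "y", ifelse(nm == "y", "x", nm))
  mapping
}
flipped_spec <- swap_xy(c(y = "class"))

# Rendered evidence: both plots compute the same count per class.
is_flipped  <- inherits(p_flip$coordinates, "CoordFlip")
same_counts <- isTRUE(all.equal(
  layer_data(p_ref, 1)$count,
  layer_data(p_flip, 1)$count
))
```

After the swap the flipped mapping reads x = class, matching the reference, and the computed counts agree.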
name vs labs(): visual mode required

scale_y_continuous(name = "count") stores the label inside the scale object; labs(y = "count") stores it in p$labels. They appear different to equiv_labels() unless visual mode is active:
p_scale_name <- ggplot(counts, aes(x = class, y = n)) +
geom_bar(stat = "identity") + scale_y_continuous(name = "count")
p_labs <- ggplot(counts, aes(x = class, y = n)) +
geom_bar(stat = "identity") + labs(y = "count")
# Visual mode applies .rule_scale_name_to_labels, promoting the scale name
# into the labels table:
compare_plots(p_labs, p_scale_name, mode = "visual", check = "labels")
#> [PASS mode=visual] 1/1 checks passed
#> Detail:
#> # A tibble: 1 × 5
#> check aesthetic label_ref label_obs match
#> <chr> <chr> <chr> <chr> <lgl>
#> 1 labels y count count TRUE

geom_count() variants: direct equivalents

All coding styles for the same geom_count() plot are equivalent via standard inheritance resolution:
p_count_global <- ggplot(mpg, aes(x = drv, y = class)) + geom_count()
p_count_local <- ggplot(mpg) + geom_count(aes(x = drv, y = class))
p_count_split <- ggplot(mpg, aes(x = drv)) + geom_count(aes(y = class))
equiv_plot(p_count_global, p_count_local, check = c("layers", "aes"))
#> [FAIL mode=strict] 1/2 checks passed: Aesthetic mapping issue(s): x->drv (layer 0), y->class (layer 0).
#> Detail:
#> # A tibble: 8 × 9
#> check source layer geom stat position aesthetic variable status
#> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 layers ref 1 point sum identity <NA> <NA> <NA>
#> 3 layers obs 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 layers obs 1 point sum identity <NA> <NA> <NA>
#> 5 aes global 0 <NA> <NA> <NA> x drv missing
#> 6 aes global 0 <NA> <NA> <NA> y class missing
#> 7 aes global 1 point <NA> <NA> x drv match
#> 8 aes global 1 point <NA> <NA> y class match
equiv_plot(p_count_global, p_count_split, check = c("layers", "aes"))
#> [FAIL mode=strict] 1/2 checks passed: Aesthetic mapping issue(s): y->class (layer 0).
#> Detail:
#> # A tibble: 8 × 9
#> check source layer geom stat position aesthetic variable status
#> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 layers ref 1 point sum identity <NA> <NA> <NA>
#> 3 layers obs 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 layers obs 1 point sum identity <NA> <NA> <NA>
#> 5 aes global 0 <NA> <NA> <NA> y class missing
#> 6 aes global 0 <NA> <NA> <NA> x drv match
#> 7 aes global 1 point <NA> <NA> x drv match
#> 8 aes global 1 point <NA> <NA> y class match

geom_point(aes(size = n)) + pre-counted data: separate equivalence group

geom_count() is geom_point() with stat = "sum", so both approaches report the geom name "point". They differ in stat ("sum" vs "identity"), in the extra size mapping, and in the data they draw from. The geom-name check in equiv_layers() therefore passes when comparing across the two approaches:
mpg_counts <- count(mpg, drv, class)
p_point_sized <- ggplot(mpg_counts, aes(x = drv, y = class, size = n)) +
  geom_point()
# Layer check passes on geom name ("point" in both); the stat differs (sum vs identity)
equiv_plot(p_count_global, p_point_sized, check = "layers")
equiv_plot(p_count_global, p_point_sized, check = "layers")
#> [PASS mode=strict] 1/1 checks passed
#> Detail:
#> # A tibble: 4 × 6
#> check source layer geom stat position
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA>
#> 2 layers ref 1 point sum identity
#> 3 layers obs 0 <NA> <NA> <NA>
#> 4 layers obs 1 point identity identity

These two plots are visually equivalent but not spec-equivalent under the current function set. They form separate equivalence groups:
- geom_count() variants: equivalent within the group via equiv_plot().
- geom_point(aes(size = n)) + count() variants: equivalent within the group via equiv_plot().

Comparing across groups requires a custom canonicalisation rule that rewrites both the geom and the data simultaneously. See Section 5.
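The relationship between the two groups can be checked with ggplot2 and base R alone. The sketch below is illustrative and does not use ggspec. It confirms which geom and stat classes geom_count() uses, then compares the stat's computed n with counts pre-computed via table().

```r
library(ggplot2)

p_count <- ggplot(mpg, aes(x = drv, y = class)) + geom_count()

# Classes of the geom and stat attached to the geom_count() layer.
geom_class <- class(p_count$layers[[1]]$geom)[1]
stat_class <- class(p_count$layers[[1]]$stat)[1]

# Pre-compute the same statistic with base R, dropping empty combinations.
tab <- as.data.frame(table(drv = mpg$drv, class = mpg$class))
tab <- tab[tab$Freq > 0, ]

# The stat's computed `n` should equal the pre-computed counts.
built_n <- layer_data(p_count, 1)$n
same_counts <- isTRUE(all.equal(
  sort(as.numeric(built_n)),
  sort(as.numeric(tab$Freq))
))
```

A bridging rule would have to establish this kind of agreement between computed and pre-computed values, which is why it needs access to the data and not just the spec.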
A bar chart of species counts can be written at least four ways:
# (a) raw data, stat = "count" (default)
penguins |> ggplot() + geom_bar(aes(x = species))
# (b) pre-counted, y column named "n", explicit y-label
penguins |> count(species) |>
ggplot() + geom_bar(aes(x = species, y = n), stat = "identity") + labs(y = "count")
# (c) pre-counted, y column named "count" (= default y-label)
penguins |> count(species, name = "count") |>
ggplot() + geom_bar(aes(x = species, y = count), stat = "identity")
# (d) pre-counted, y column named "asdf", explicit y-label
penguins |> count(species, name = "asdf") |>
  ggplot() + geom_bar(aes(x = species, y = asdf), stat = "identity") + labs(y = "count")

Three independent checks determine visual equivalence:
1. Rendered bar heights: equiv_rendered() compares the built ymax values. All four forms produce the same bar heights for the same data.
2. Y-axis label: equiv_labels() compares the effective y-axis label. The default label for stat = "count" is "count". For pre-counted forms, the default is the y column name ("n", "count", "asdf").
3. Structural difference: equiv_layers() detects the stat and y aesthetic difference; equiv_aes() flags the missing/extra y mapping.
Equivalence matrix (mode = "visual"):
| | (a) geom_bar | (b) geom_col + labs(y="count") | (c) y col = "count" | (d) geom_col + labs(y="count") |
|---|---|---|---|---|
| (a) | — | yes (heights + label both “count”) | yes (col name “count” = stat default) | yes (explicit label matches) |
| (b) | yes | — | yes | yes |
| (c) | yes | yes | — | yes |
| (d) | yes | yes | yes | — |
The only cases that fail are those where the effective y-axis labels differ (e.g. (a) vs a pre-counted form with y = n and no explicit labs(y = ...)). When a comparison fails only on labels but passes on rendered heights, the result $hint will say: “Rendered output matches but labels differ. Add labs() to align axis/legend titles.”
All histogram calls share geom name "bar" (histogram uses GeomBar + StatBin), so equiv_layers() passes regardless of binning parameters. equiv_aes() compares the x aesthetic, which is the same:
p_hist_default <- ggplot(mpg, aes(x = hwy)) + geom_histogram()
p_hist_bins30 <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 30)
p_hist_binwidth <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 3)
equiv_plot(p_hist_default, p_hist_bins30, check = c("layers", "aes"))
#> [PASS mode=strict] 2/2 checks passed
#> Detail:
#> # A tibble: 6 × 9
#> check source layer geom stat position aesthetic variable status
#> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 layers ref 1 bar bin stack <NA> <NA> <NA>
#> 3 layers obs 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 layers obs 1 bar bin stack <NA> <NA> <NA>
#> 5 aes global 0 <NA> <NA> <NA> x hwy match
#> 6 aes global 1 bar <NA> <NA> x hwy match
equiv_plot(p_hist_default, p_hist_binwidth, check = c("layers", "aes"))
#> [PASS mode=strict] 2/2 checks passed
#> Detail:
#> # A tibble: 6 × 9
#> check source layer geom stat position aesthetic variable status
#> <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 layers ref 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 layers ref 1 bar bin stack <NA> <NA> <NA>
#> 3 layers obs 0 <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 layers obs 1 bar bin stack <NA> <NA> <NA>
#> 5 aes global 0 <NA> <NA> <NA> x hwy match
#> 6 aes global 1 bar <NA> <NA> x hwy match

Binning parameters sit in the params list-column of spec_layers(). They can be checked explicitly with equiv_params() when the grader wants to enforce a specific binning choice:
# Reference has no explicit bins; observation has bins = 30 — params differ
equiv_params(p_hist_default, p_hist_bins30, layer = 1L, params = "bins")
#> [FAIL] Layer 1 parameter mismatch: bins.

The pedagogical mode logs bins vs binwidth usage via .rule_histogram_bin_param, without converting between them (the numeric values are not inter-derivable without knowing the data range).
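The dependence on the data range can be made concrete with a short ggplot2-only sketch (illustrative, not part of ggspec). It builds one histogram with bins = 30 and one with binwidth = 3, then looks at the rendered bars.

```r
library(ggplot2)

p_bins30   <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 30)
p_binwidth <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 3)

built_bins30   <- layer_data(p_bins30, 1)
built_binwidth <- layer_data(p_binwidth, 1)

# With binwidth = 3, every rendered bar is 3 units wide.
widths <- built_binwidth$xmax - built_binwidth$xmin
width_is_3 <- isTRUE(all.equal(widths, rep(3, length(widths))))

# How many bars that yields depends on the spread of the data, which is
# not available from the spec alone.
hwy_range  <- diff(range(mpg$hwy))
n_bins30   <- nrow(built_bins30)
n_binwidth <- nrow(built_binwidth)
```

Translating binwidth = 3 into a bin count requires `hwy_range`, so a data-free rule can record which parameter was used but cannot convert one into the other.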
The geom_count() / geom_point(aes(size = n)) + count() boundary is an instance of a broader class: a plot that uses a stat-computing geom on raw data vs a plot that pre-computes the same statistic and applies a simpler geom. Current canonicalisation rules operate only on the spec (geom name, aes mappings, coord, scale names). A rule that could bridge these groups would need to:
stat = "sum" for geom_count, stat = "bin" for geom_histogram, etc.).

This requires integrating equiv_data() into the canonicalisation pipeline, which is more involved than the existing rules. See next.md.
spec_layers()$data_source marks which layers carry their own data frame, and layer_data_index() locates a specific data frame within a plot. For the comparison to verify that corresponding layers in two plots draw from semantically equivalent data frames, a matching step is needed. The current equiv_data() function compares by hash (after sorting columns), but it operates on a single layer index, not on matched pairs across plots. See next.md.
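One possible shape for such a matching step is sketched below. All helper names are hypothetical, and identical() on column-sorted data frames is used as a stand-in for the hash comparison, so this should be read as an outline of the idea and not as the planned implementation.

```r
library(ggplot2)

# Hypothetical helper: the data frame a layer actually draws from.
layer_df <- function(p, i) {
  d <- p$layers[[i]]$data
  if (is.data.frame(d)) d else p$data  # fall back to the plot-level data
}

# Hypothetical helper: canonical form with columns in sorted order.
canonical <- function(d) {
  d <- as.data.frame(d)
  d[, sort(names(d)), drop = FALSE]
}

# For each reference layer, find the first observed layer with equivalent data.
match_layer_data <- function(ref, obs) {
  vapply(seq_along(ref$layers), function(i) {
    target <- canonical(layer_df(ref, i))
    hits <- which(vapply(seq_along(obs$layers), function(j) {
      identical(target, canonical(layer_df(obs, j)))
    }, logical(1)))
    if (length(hits)) hits[1] else NA_integer_
  }, integer(1))
}

small <- head(mpg, 20)
p_ref <- ggplot(mpg, aes(displ, hwy)) + geom_point() +
  geom_line(data = small)
p_obs <- ggplot() +
  geom_line(aes(displ, hwy), data = small[, rev(names(small))]) +
  geom_point(aes(displ, hwy), data = mpg)

# Element i holds the index of the observed layer matched to reference layer i.
pairs <- match_layer_data(p_ref, p_obs)
```

In this example the layers appear in a different order and one data frame has its columns reversed, yet each reference layer is still paired with the observed layer that draws from the same data. A real implementation would also need to decide how to handle ties and unmatched layers.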