Object-Oriented Programming

Martin Westgate & Dax Kellie

2026-02-11

galah has a number of functions that display object-oriented behaviour, which are used for two purposes:

Below we’ll go through each in turn.

request objects

The default method for building queries in galah is to first use galah_call() to create a request object of class “data_request”. When a piped object is of class data_request, the functions that follow in the pipe use methods written specifically for that class, e.g.

galah_call() |> 
  filter(genus == "Crinia", year == 2020) |>
  group_by(species) |>
  count() |>
  collect()
## # A tibble: 16 × 2
##    species                 count
##    <chr>                   <int>
##  1 Crinia signifera        42477
##  2 Crinia parinsignifera    8363
##  3 Crinia glauerti          3111
##  4 Crinia georgiana         1509
##  5 Crinia remota             717
##  6 Crinia sloanei            682
##  7 Crinia insignifera        530
##  8 Crinia tinnula            316
##  9 Crinia deserticola        254
## 10 Crinia pseudinsignifera   222
## 11 Crinia tasmaniensis       182
## 12 Crinia bilingua            75
## 13 Crinia subinsignifera      46
## 14 Crinia riparia             10
## 15 Crinia flindersensis        3
## 16 Crinia nimba                1

Thanks to object-oriented programming, galah “masks” filter() and group_by() functions to use methods defined for data_request objects instead. The full list of masked functions is:

Note that these functions are all evaluated lazily: they amend the underlying request object, but no data are retrieved or altered until the call is evaluated.
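To make the idea of class-specific methods and lazy evaluation concrete, here is a minimal base R sketch. It assumes S3-style dispatch, and every name in it (toy_request, add_filter(), run()) is hypothetical and chosen for illustration; none of it is galah's own code.

```r
# Toy illustration only: `toy_request`, `add_filter()` and `run()` are
# hypothetical names and are not part of galah.

# Constructor: a request is a list carrying a class attribute
toy_request <- function() {
  structure(list(filters = character(), evaluated = FALSE),
            class = "toy_request")
}

# A generic function: R chooses the method from the class of `x`
add_filter <- function(x, ...) UseMethod("add_filter")

# Method for toy_request objects: record the filter, retrieve nothing
add_filter.toy_request <- function(x, ...) {
  x$filters <- c(x$filters, ...)
  x
}

# Fallback for objects of any other class
add_filter.default <- function(x, ...) {
  stop("add_filter() is only defined for toy_request objects")
}

# Evaluation step: only here would any real work be done
run <- function(x) UseMethod("run")
run.toy_request <- function(x) {
  x$evaluated <- TRUE
  x
}

# Building the request only amends the object
req <- toy_request() |>
  add_filter("genus == Crinia") |>
  add_filter("year == 2020")
```

At this point req holds two stored filters and its evaluated flag is still FALSE; nothing happens to any data until run(req) is called.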

query objects

A request object stores all the information needed to generate a query, but does not build or enact that query. To achieve this, galah has a second object-oriented workflow, consisting of the following stages

We can use these in sequence, or just leap ahead to the stage we want:

x <- request_data() |>
  filter(genus == "Crinia", year == 2020) |>
  group_by(species) |>
  arrange(species) |>
  count()

capture(x)
## Object of class prequery with type data/occurrences-count-groupby
## • url: https://api.ala.org.au/occurrences/occurrences/facets?fq=%28genus%3A%2...
compound(x)
## Object of class query_set containing 3 queries:
## • metadata/fields data: galah:::retrieve_cache("fields")
## • metadata/assertions data: galah:::retrieve_cache("assertions")
## • data/occurrences-count-groupby url: https://api.ala.org.au/occurrences/occurr...
collapse(x)
## Object of class query with type data/occurrences-count-groupby
## • url: https://api.ala.org.au/occurrences/occurrences/facets?fq=%28genus%3A%2...
collect(x) |> head()
## # A tibble: 6 × 2
##   species              count
##   <chr>                <int>
## 1 Crinia bilingua         75
## 2 Crinia deserticola     254
## 3 Crinia flindersensis     3
## 4 Crinia georgiana      1509
## 5 Crinia glauerti       3111
## 6 Crinia insignifera     530
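The "in sequence, or leap ahead" behaviour shown above can be sketched in base R as a chain of stage functions, each of which runs any earlier stage that has not yet been applied. This is only a rough analogy under that assumption: the stage functions and classes below are hypothetical stand-ins, and the example URL is a placeholder rather than a real API endpoint.

```r
# Toy illustration only: these stage functions are hypothetical stand-ins,
# not galah's capture(), collapse() or collect().

# Stage 1: wrap a plain request (a list) as a "toy_prequery"
stage_one <- function(x) {
  if (inherits(x, "toy_prequery")) return(x)
  structure(list(request = x), class = "toy_prequery")
}

# Stage 2: build the final query; runs stage 1 first if needed
stage_two <- function(x) {
  if (inherits(x, "toy_query")) return(x)
  x <- stage_one(x)
  structure(list(url = paste0("https://example.org/?q=", x$request$q)),
            class = "toy_query")
}

# Stage 3: "evaluate" the query; runs the earlier stages first if needed
stage_three <- function(x) {
  x <- stage_two(x)
  nchar(x$url)  # placeholder for actually sending the query
}

req <- list(q = "Crinia")

# Run every stage in sequence...
step_by_step <- req |> stage_one() |> stage_two() |> stage_three()

# ...or jump straight to the last stage
leap_ahead <- stage_three(req)
```

Both routes give the same result, and each intermediate object can be inspected on its own, which is what makes a staged design convenient for debugging.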

The benefit of this workflow is that it is highly modular. This is critical for debugging workflows that might have gone wrong for one reason or another, but it is also useful for handling large data requests in galah. Users can send their query using compute(), then download the data with collect() once the query has finished, rather than waiting for the request to finish within R.

# Create and send query to be calculated server-side
request <- request_data() |>
  identify("perameles") |>
  filter(year > 1900) |>
  compute()
  
# Download data
request |>
  collect()

metadata requests

For the above workflow to be achievable, it is necessary for every API call in galah to be written as a request object. This is because compound() must collect a range of different requests to evaluate a single query. To this end, galah supports metadata requests, in addition to the data requests described above.

request_metadata(type = "fields") |>
  collect()

Or to show values for states and territories:

request_metadata() |>
  filter(field == "cl22") |>
  unnest() |>
  collect()

While request_metadata() is more modular than show_all(), there is little benefit to using it for most applications. However, in some cases, larger databases like GBIF return huge data.frames of metadata when called via show_all(). Using request_metadata() allows users to specify a slice_head() line within their pipe to get around this issue.