galah has a number of functions that display
object-oriented behaviour. These are used for two purposes:
request objects
query objects
Below we’ll go through each in turn.
request objects
The default method for building queries in galah is to
first use galah_call() to create a query object called a
“data_request”. When a piped object is of class
data_request, galah triggers functions to use specific
methods for this object class, e.g.
galah_call() |>
  filter(genus == "Crinia", year == 2020) |>
  group_by(species) |>
  count() |>
  collect()
## # A tibble: 16 × 2
## species count
## <chr> <int>
## 1 Crinia signifera 42477
## 2 Crinia parinsignifera 8363
## 3 Crinia glauerti 3111
## 4 Crinia georgiana 1509
## 5 Crinia remota 717
## 6 Crinia sloanei 682
## 7 Crinia insignifera 530
## 8 Crinia tinnula 316
## 9 Crinia deserticola 254
## 10 Crinia pseudinsignifera 222
## 11 Crinia tasmaniensis 182
## 12 Crinia bilingua 75
## 13 Crinia subinsignifera 46
## 14 Crinia riparia 10
## 15 Crinia flindersensis 3
## 16 Crinia nimba 1
Thanks to object-oriented programming, galah “masks”
filter() and group_by() functions to use
methods defined for data_request objects instead. The full
list of masked functions is:
arrange() ({dplyr})
count() ({dplyr})
glimpse() ({dplyr})
identify() ({graphics})
select() ({dplyr})
group_by() ({dplyr})
slice_head() ({dplyr})
st_crop() ({sf})
Note that these functions are all evaluated lazily; they amend the underlying object, but do not amend the nature of the data until the call is evaluated.
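The masking described above relies on R's S3 method dispatch. The sketch below is a toy illustration in base R, not galah's actual implementation: the generic add_filter(), the constructor new_request() and the helper run_request() are hypothetical names invented for this example. It shows how a method defined for the data_request class can record a filter on the object without touching any data until the request is evaluated.

```r
# Hypothetical constructor: a request is just a list with a class attribute.
new_request <- function() {
  structure(list(filters = list()), class = "data_request")
}

# A generic function; R chooses the method based on the class of `x`.
add_filter <- function(x, ...) UseMethod("add_filter")

# Method for ordinary data frames: evaluated eagerly, rows are dropped now.
add_filter.data.frame <- function(x, keep, ...) {
  x[keep, , drop = FALSE]
}

# Method for data_request objects: evaluated lazily, the condition is only stored.
add_filter.data_request <- function(x, ...) {
  x$filters <- c(x$filters, list(...))
  x
}

# Hypothetical evaluation step: only here are the stored filters applied to data.
run_request <- function(x, data) {
  for (f in x$filters) {
    data <- data[data[[f$field]] == f$value, , drop = FALSE]
  }
  data
}

frogs <- data.frame(
  genus = c("Crinia", "Litoria", "Crinia"),
  year = c(2020, 2020, 2019)
)

# Eager: the data frame method returns filtered rows immediately.
eager <- add_filter(frogs, frogs$genus == "Crinia")

# Lazy: the data_request method only amends the request object.
req <- new_request() |>
  add_filter(list(field = "genus", value = "Crinia")) |>
  add_filter(list(field = "year", value = 2020))

# Nothing is filtered until the request is evaluated against some data.
result <- run_request(req, frogs)
```

The same function name behaves differently depending on the class of its first argument, which is the mechanism that lets a piped data_request object pick up request-specific methods.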
query objects
A request object stores all the information needed to
generate a query, but does not build or enact that query. To achieve
this, galah has a second object-oriented workflow, consisting of the
following stages:
capture() identifies the url needed to execute the
request. For complex requests that require multiple API calls to
evaluate, it returns a prequery object. For simpler
requests it returns a query.
compound() identifies the full set of queries necessary
to properly evaluate the specified request, returning them as a
query_set.
collapse() converts a query_set to a
query. This is the point in the pipeline where the final
url is generated.
compute() is intended to send the query in question to
the requested API for processing. This is particularly important for
occurrences, where it can be useful to submit a query and retrieve it at
a later time. If the compute() stage is not required,
however, compute() simply converts the query
to a new class (computed_query).
collect() retrieves the requested data into your
workspace, returning a tibble.
We can use these in sequence, or just leap ahead to the stage we want:
x <- request_data() |>
  filter(genus == "Crinia", year == 2020) |>
  group_by(species) |>
  arrange(species) |>
  count()
capture(x)
## Object of class prequery with type data/occurrences-count-groupby
## • url: https://api.ala.org.au/occurrences/occurrences/facets?fq=%28genus%3A%2...
compound(x)
## Object of class query_set containing 3 queries:
## • metadata/fields data: galah:::retrieve_cache("fields")
## • metadata/assertions data: galah:::retrieve_cache("assertions")
## • data/occurrences-count-groupby url: https://api.ala.org.au/occurrences/occurr...
collapse(x)
## Object of class query with type data/occurrences-count-groupby
## • url: https://api.ala.org.au/occurrences/occurrences/facets?fq=%28genus%3A%2...
collect(x) |> head()
## # A tibble: 6 × 2
## species count
## <chr> <int>
## 1 Crinia bilingua 75
## 2 Crinia deserticola 254
## 3 Crinia flindersensis 3
## 4 Crinia georgiana 1509
## 5 Crinia glauerti 3111
## 6 Crinia insignifera 530
The benefit of this workflow is that it is highly modular. This is
critical for debugging workflows that might have gone wrong for one
reason or another, but it is also useful for handling large data
requests in galah. Users can send their query using
compute(), then download the data later with collect() once
the query has finished, rather than waiting for the request to
finish within R.
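One way to picture how a staged workflow can support both running each stage in turn and leaping ahead is sketched below in base R. This is a toy model under stated assumptions, not galah's internals: the classes toy_request, toy_query and toy_computed, the generics stage_build(), stage_send() and stage_fetch(), and the example.org url are all hypothetical. Each later-stage generic has a method for earlier-stage objects that simply runs the missing stages first.

```r
# Toy constructor standing in for a request; not a galah object.
toy_request <- function(text) {
  structure(list(text = text), class = "toy_request")
}

# Stage 1: turn a request into a query (here, just build a url string).
stage_build <- function(x) UseMethod("stage_build")
stage_build.toy_request <- function(x) {
  structure(
    list(url = paste0("https://example.org/?q=", x$text)),
    class = "toy_query"
  )
}

# Stage 2: "send" the query; a real API might return a job identifier here.
stage_send <- function(x) UseMethod("stage_send")
stage_send.toy_query <- function(x) {
  structure(list(url = x$url, status = "complete"), class = "toy_computed")
}
# A request is built first, then sent.
stage_send.toy_request <- function(x) stage_send(stage_build(x))

# Stage 3: retrieve the result of a sent query.
stage_fetch <- function(x) UseMethod("stage_fetch")
stage_fetch.toy_computed <- function(x) {
  data.frame(url = x$url, status = x$status)
}
# Earlier-stage objects are passed through the missing stages first.
stage_fetch.toy_query <- function(x) stage_fetch(stage_send(x))
stage_fetch.toy_request <- function(x) stage_fetch(stage_send(x))

req <- toy_request("Crinia")

# Running every stage in sequence ...
step_by_step <- req |> stage_build() |> stage_send() |> stage_fetch()

# ... gives the same result as leaping straight to the final stage.
leap_ahead <- stage_fetch(req)

# Stopping part-way yields an intermediate object that can be inspected
# or stored, then finished later.
sent <- stage_send(req)
later <- stage_fetch(sent)
```

Because every stage returns an object with its own class, an intermediate result can be examined when debugging, or kept and passed to the next stage at a later time.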
For the above workflow to be achievable, it is necessary for every
API call in galah to be written as a request
object. This is because compound() must collect a range of
different requests to evaluate a single query. To this end,
galah supports metadata requests, in addition to the data
requests described above.
Or to show values for states and territories:
While request_metadata() is more modular than
show_all(), there is little benefit to using it for most
applications. However, in some cases, larger databases like GBIF return
huge data.frames of metadata when called via
show_all(). Using request_metadata() allows
users to specify a slice_head() line within their pipe to
get around this issue.