closest {MsCoreUtils} | R Documentation |
These functions offer relaxed matching of one vector in another. In contrast to the similar match() and %in% functions, they accept only numeric arguments but have an additional tolerance argument that allows relaxed matching.
closest(
  x,
  table,
  tolerance = Inf,
  ppm = 0,
  duplicates = c("keep", "closest", "remove"),
  nomatch = NA_integer_
)

common(
  x,
  table,
  tolerance = Inf,
  ppm = 0,
  duplicates = c("keep", "closest", "remove")
)

join(x, y, tolerance = 0, ppm = 0, type = c("outer", "left", "right", "inner"))
x |
|
table |
|
tolerance |
|
ppm |
|
duplicates |
|
nomatch |
|
y |
|
type |
|
For closest/common the tolerance argument can be set to 0 to get the same results as for match()/%in%. If it is set to Inf (default) the index of the closest value is returned without any restriction.
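The effect of the two extreme tolerance settings can be sketched in base R. The helper naive_closest below is hypothetical and written only for illustration; it is not the package's implementation (which, according to this page, relies on findInterval), and it ignores the ppm and duplicates arguments.

```r
## Hypothetical helper: for each element of x, return the index of the
## nearest element of table, or NA if it is further away than tolerance.
naive_closest <- function(x, table, tolerance = Inf) {
  vapply(x, function(xi) {
    d <- abs(table - xi)
    i <- which.min(d)  # first minimum; integer(0) if xi is NA
    if (length(i) == 0L || d[i] > tolerance) NA_integer_ else i
  }, integer(1))
}

x <- c(1, 3, 5)
y <- 1:10

## tolerance = 0 only accepts exact hits, as match() does
exact <- naive_closest(x, y, tolerance = 0)

## shifted values: no exact hit, but an unrestricted nearest neighbour
shifted_strict <- naive_closest(x + 0.1, y, tolerance = 0)
shifted_relaxed <- naive_closest(x + 0.1, y, tolerance = Inf)
```

With tolerance = 0 the shifted values no longer match anything, whereas tolerance = Inf still reports the nearest position in y.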
A one-to-one matching is not guaranteed, neither for the x to table nor for the table to x matching.
If multiple elements in x match a single element in table, all their corresponding indices are returned if duplicates="keep" is set (default). This behaviour is identical to match(). For duplicates="closest" only the closest element in x gets the corresponding index in table, and for duplicates="remove" all elements in x that match the same element in table are set to nomatch.
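The three duplicates modes for this case can be illustrated with a small base R sketch. The function resolve_duplicates is a hypothetical helper, not part of the package; it takes the nearest-match indices as given and only applies the rule described above.

```r
## Hypothetical helper: idx holds, for each element of x, the index of its
## match in table. Resolve cases where several x share the same index.
resolve_duplicates <- function(x, table, idx,
                               duplicates = c("keep", "closest", "remove"),
                               nomatch = NA_integer_) {
  duplicates <- match.arg(duplicates)
  if (duplicates == "keep") return(idx)
  out <- idx
  for (j in unique(idx[!is.na(idx)])) {
    hits <- which(idx == j)
    if (length(hits) > 1L) {
      out[hits] <- nomatch  # "remove": drop every competing match
      if (duplicates == "closest") {
        ## keep the index only for the x element nearest to table[j]
        best <- hits[which.min(abs(x[hits] - table[j]))]
        out[best] <- j
      }
    }
  }
  out
}

x <- c(1.6, 1.75, 1.8)
y <- 1:2
idx <- c(2L, 2L, 2L)  # all three values are nearest to y[2]

kept <- resolve_duplicates(x, y, idx, "keep")
nearest_only <- resolve_duplicates(x, y, idx, "closest")
removed <- resolve_duplicates(x, y, idx, "remove")
```

Here only 1.8 keeps its match under "closest", because it is the element of x nearest to 2.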
If a single element in x matches multiple elements in table, the closest is returned for duplicates="keep" or duplicates="closest" (keeping multiple matches isn't possible in this case because the implementation relies on findInterval). If the differences between x and the corresponding matches in table are identical, the lower index (the smaller element in table) is returned. For duplicates="remove" all multiple matches are returned as nomatch as above.
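The tie rule can be checked by hand with a few lines of base R. This is only an illustration of the documented behaviour, not the package's code; it uses the fact that which.min returns the first minimum.

```r
x <- 1.5
table <- c(1, 2, 4)
tolerance <- 0.6

d <- abs(table - x)                  # 0.5, 0.5, 2.5
candidates <- which(d <= tolerance)  # table[1] and table[2] both qualify
## which.min picks the first minimum, so a tie resolves to the lower index
best <- candidates[which.min(d[candidates])]
```

Both 1 and 2 are equally far from 1.5, so the lower index, 1, is the one reported.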
join: joins two numeric vectors by mapping values in x to values in y and vice versa if they are similar enough (given the specified tolerance and ppm). The function returns a matrix with the indices of mapped values in x and y. The type parameter defines how the vectors are joined:
type = "left": values in x are mapped to values in y; elements in y not matching any value in x are discarded.
type = "right": same as type = "left" but for y.
type = "outer": return matches for all values in x and in y.
type = "inner": report only indices of values that could be mapped.
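The four join types can be sketched in base R for the simplest case of exact matching (tolerance = 0) and vectors without duplicated values. The function naive_join is hypothetical and makes several assumptions of its own: it returns a data frame rather than the structure produced by join, and it orders the outer join by value.

```r
## Hypothetical helper: exact-match join of two numeric vectors without
## duplicates, returning index pairs (NA where there is no partner).
naive_join <- function(x, y, type = c("outer", "left", "right", "inner")) {
  type <- match.arg(type)
  left <- data.frame(x = seq_along(x), y = match(x, y))
  right <- data.frame(x = match(y, x), y = seq_along(y))
  res <- switch(type,
    left = left,
    right = right,
    inner = left[!is.na(left$y), ],
    outer = {
      ## all x rows plus the y rows that found no partner in x
      out <- rbind(left, right[is.na(right$x), ])
      vals <- ifelse(is.na(out$x), y[out$y], x[out$x])
      out[order(vals), ]
    }
  )
  rownames(res) <- NULL
  res
}

x <- c(1, 2, 3, 6)
y <- c(3, 4, 5, 6, 7)

jo <- naive_join(x, y, "outer")
jl <- naive_join(x, y, "left")
jr <- naive_join(x, y, "right")
ji <- naive_join(x, y, "inner")
```

The left join has one row per element of x, the right join one row per element of y, the inner join only the two shared values (3 and 6), and the outer join one row per distinct value.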
closest returns an integer vector of the same length as x giving the closest position in table of the first match, or nomatch if there is no match.

common returns a logical vector of the same length as x that is TRUE if the element in x was found in table. It is similar to %in%.

join returns a matrix with two columns, namely x and y, representing the index of the values in x matching the corresponding value in y (or NA if the value does not match).
closest will replace all NA values in x by nomatch (that is identical to the behaviour of match).

join is based on closest(x, y, tolerance, duplicates = "closest"). That means that for multiple matches only the closest one is reported.
Sebastian Gibb
Other grouping/matching functions: bin()
## Define two vectors to match
x <- c(1, 3, 5)
y <- 1:10

## Compare match and closest
match(x, y)
closest(x, y)

## If there is no exact match
x <- x + 0.1
match(x, y) # no match
closest(x, y)

## Some new values
x <- c(1.11, 45.02, 556.45)
y <- c(3.01, 34.12, 45.021, 46.1, 556.449)

## Using a single tolerance value
closest(x, y, tolerance = 0.01)

## Using a value-specific tolerance accepting differences of 20 ppm
closest(x, y, tolerance = ppm(y, 20))

## Same using 50 ppm
closest(x, y, tolerance = ppm(y, 50))

## Sometimes multiple elements in `x` match to `table`
x <- c(1.6, 1.75, 1.8)
y <- 1:2
closest(x, y, tolerance = 0.5)
closest(x, y, tolerance = 0.5, duplicates = "closest")
closest(x, y, tolerance = 0.5, duplicates = "remove")

## Are there any common values?
x <- c(1.6, 1.75, 1.8)
y <- 1:2
common(x, y, tolerance = 0.5)
common(x, y, tolerance = 0.5, duplicates = "closest")
common(x, y, tolerance = 0.5, duplicates = "remove")

## Join two vectors
x <- c(1, 2, 3, 6)
y <- c(3, 4, 5, 6, 7)

jo <- join(x, y, type = "outer")
jo
x[jo$x]
y[jo$y]

jl <- join(x, y, type = "left")
jl
x[jl$x]
y[jl$y]

jr <- join(x, y, type = "right")
jr
x[jr$x]
y[jr$y]

ji <- join(x, y, type = "inner")
ji
x[ji$x]
y[ji$y]
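
The difference between the 20 ppm and 50 ppm calls in the examples can be worked through by hand. The sketch below assumes that ppm(y, n) evaluates to y * n / 1e6; the helper ppm_tol is a hypothetical stand-in for that calculation, and the nearest-neighbour lookup is plain base R rather than closest itself.

```r
## Hypothetical stand-in, assuming ppm(values, n) means values * n / 1e6
ppm_tol <- function(values, ppm) values * ppm * 1e-6

x <- c(1.11, 45.02, 556.45)
y <- c(3.01, 34.12, 45.021, 46.1, 556.449)

## nearest element of y for each x, and the absolute difference to it
nearest <- vapply(x, function(xi) which.min(abs(y - xi)), integer(1))
diffs <- abs(x - y[nearest])

## a candidate is accepted only if the difference is within the
## value-specific tolerance of the matched y element
within20 <- diffs <= ppm_tol(y[nearest], 20)
within50 <- diffs <= ppm_tol(y[nearest], 50)
```

The pair 45.02 and 45.021 differs by about 0.001, which exceeds 20 ppm of 45.021 (about 0.0009) but lies within 50 ppm (about 0.00225), so under this assumption it is accepted only at the wider setting.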