| Type: | Package |
| Title: | Exact Search and Graph Construction for 'bigmemory' Matrices |
| Version: | 0.3.0 |
| Date: | 2026-03-26 |
| Author: | Frederic Bertrand [aut, cre] |
| Maintainer: | Frederic Bertrand <frederic.bertrand@lecnam.net> |
| Description: | Exact nearest-neighbour and radius-search routines that operate directly on 'bigmemory::big.matrix' objects. The package streams row blocks through 'BLAS' kernels, supports self-search and external-query search, exposes prepared references for repeated queries, and can build exact k-nearest-neighbour, radius, mutual k-nearest-neighbour, and shared-nearest-neighbour graphs. Version 0.3.0 adds execution plans, serializable prepared caches, resumable streamed graph jobs, coercion helpers, exact candidate reranking, and recall summaries for evaluating approximate neighbours. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Depends: | R (≥ 3.5.0) |
| Imports: | Matrix, Rcpp (≥ 1.0.12), methods |
| Suggests: | bigmemory, knitr, rmarkdown, testthat (≥ 3.0.0) |
| LinkingTo: | Rcpp, bigmemory, BH |
| Encoding: | UTF-8 |
| NeedsCompilation: | yes |
| URL: | https://fbertran.github.io/bigKNN/, https://github.com/fbertran/bigKNN |
| BugReports: | https://github.com/fbertran/bigKNN/issues |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| Packaged: | 2026-03-27 01:16:22 UTC; bertran7 |
| Repository: | CRAN |
| Date/Publication: | 2026-04-01 08:00:26 UTC |
bigKNN: Exact Search and Graph Construction for bigmemory Matrices
Description
Exact nearest-neighbour and radius-search routines for
bigmemory::big.matrix references, with in-memory results, streamed
writes into destination big.matrix objects, prepared references for
repeated queries, and exact graph builders.
Details
Package options:
-
bigKNN.block_size: default number of rows processed per query/reference block. Defaults to1024L. -
bigKNN.progress: logical flag controlling simple progress messages while computing neighbours. Defaults toFALSE.
Author(s)
Maintainer: Frederic Bertrand frederic.bertrand@lecnam.net
See Also
Useful links:
Report bugs at https://github.com/fbertran/bigKNN/issues
Coerce bigKNN outputs to edge-list form
Description
Coerce bigKNN outputs to edge-list form
Usage
as_edge_list(x, include_distance = TRUE)
Arguments
x |
A bigKNN result or graph object. |
include_distance |
Logical flag controlling whether distances are kept when coercing raw kNN or radius results. |
Value
A data frame with columns from, to, and either distance or
weight.
Coerce bigKNN outputs to a sparse matrix
Description
Coerce bigKNN outputs to a sparse matrix
Usage
as_sparse_matrix(x, include_distance = TRUE)
Arguments
x |
A bigKNN result or graph object. |
include_distance |
Logical flag controlling whether distances are kept when coercing raw kNN or radius results. |
Value
A Matrix::dgCMatrix.
Coerce bigKNN outputs to sparse-triplet form
Description
Coerce bigKNN outputs to sparse-triplet form
Usage
as_triplet(x, include_distance = TRUE)
Arguments
x |
A bigKNN result or graph object. |
include_distance |
Logical flag controlling whether distances are kept when coercing raw kNN or radius results. |
Value
A triplet list with components i, j, x, and Dim.
Count neighbours within a fixed radius
Description
Count neighbours within a fixed radius
Usage
count_within_radius_bigmatrix(
x,
query = NULL,
radius,
metric = "euclidean",
block_size = knn_default_block_size(),
plan = NULL,
exclude_self = is.null(query)
)
Arguments
x |
A |
query |
Optional query source. Supply |
radius |
Distance threshold for including a neighbour. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
Value
An integer vector with one count per query row.
Exact k-nearest neighbours for bigmemory::big.matrix
Description
Exact k-nearest neighbours for bigmemory::big.matrix
Usage
knn_bigmatrix(
x,
query = NULL,
k = 10L,
metric = "euclidean",
block_size = knn_default_block_size(),
plan = NULL,
exclude_self = is.null(query)
)
Arguments
x |
A |
query |
Optional query source. Supply |
k |
Number of neighbours to return. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
Value
A list with components index, distance, k, metric, n_ref,
n_query, exact, and backend.
Build an exact kNN graph from a bigmemory::big.matrix
Description
Build an exact kNN graph from a bigmemory::big.matrix
Usage
knn_graph_bigmatrix(
x,
k = 10L,
metric = "euclidean",
block_size = knn_default_block_size(),
plan = NULL,
include_distance = TRUE,
format = c("edge_list", "triplet", "dgCMatrix"),
symmetrize = c("none", "union", "mutual"),
exclude_self = TRUE
)
Arguments
x |
A |
k |
Number of neighbours per row. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
include_distance |
Logical flag controlling whether kNN graph edges store distances or unit weights. |
format |
Output format. One of |
symmetrize |
How directed kNN edges should be combined. One of
|
exclude_self |
Logical flag controlling whether self loops are suppressed in the directed kNN graph. |
Value
An edge list, a triplet list, or a Matrix::dgCMatrix, depending on
the requested format.
Stream a directed exact kNN graph into destination big.matrix objects
Description
Stream a directed exact kNN graph into destination big.matrix objects
Usage
knn_graph_stream_bigmatrix(
x,
k,
xpFrom,
xpTo,
xpValue = NULL,
metric = "euclidean",
plan = NULL,
block_size = knn_default_block_size(),
include_distance = TRUE,
checkpoint_path = NULL
)
Arguments
x |
A |
k |
Number of neighbours per row. |
xpFrom |
Writable single-column |
xpTo |
Writable single-column |
xpValue |
Optional writable single-column |
metric |
Distance metric. Supported values are |
plan |
Optional execution plan returned by |
block_size |
Number of rows to process per query and reference block. |
include_distance |
Logical flag controlling whether |
checkpoint_path |
Optional path for a resumable job checkpoint. |
Value
An object of class "bigknn_job".
Load a serialized prepared reference
Description
Load a serialized prepared reference
Usage
knn_load_prepared(cache_path)
Arguments
cache_path |
Path previously written by |
Value
An object of class "bigknn_prepared".
Build an execution plan for exact search
Description
Build an execution plan for exact search
Usage
knn_plan_bigmatrix(
x,
metric = "euclidean",
memory_budget = "2GB",
num_threads = getOption("bigKNN.num_threads", 1L),
progress = getOption("bigKNN.progress", interactive())
)
Arguments
x |
A |
metric |
Distance metric. Supported values are |
memory_budget |
Memory budget expressed in bytes or a compact size string
such as |
num_threads |
Requested thread count forwarded to common BLAS/OpenMP environment variables during execution. |
progress |
Logical flag controlling progress reporting for plan-aware calls. |
Value
An object of class "bigknn_plan".
Prepare a bigmemory::big.matrix reference for repeated exact search
Description
Prepare a bigmemory::big.matrix reference for repeated exact search
Usage
knn_prepare_bigmatrix(
x,
metric = "euclidean",
block_size = knn_default_block_size(),
plan = NULL,
validate = TRUE,
cache_path = NULL
)
Arguments
x |
A |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
validate |
Logical flag controlling whether the preparation pass checks for finite, metric-compatible rows while building the cache. |
cache_path |
Optional path where a serializable prepared-reference cache
should be written with |
Value
An object of class "bigknn_prepared" containing the reference
pointer, metric-specific row cache, and metadata reused by later exact
search calls.
Search a prepared exact reference
Description
Search a prepared exact reference
Usage
knn_search_prepared(
ref,
query = NULL,
k = 10L,
block_size = knn_default_block_size(),
plan = NULL,
exclude_self = is.null(query)
)
Arguments
ref |
A prepared reference returned by |
query |
Optional query source. Supply |
k |
Number of neighbours to return. |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
Value
A list with components index, distance, k, metric, n_ref,
n_query, exact, and backend.
Stream prepared exact search results into destination big.matrix objects
Description
Stream prepared exact search results into destination big.matrix objects
Usage
knn_search_stream_prepared(
ref,
query = NULL,
xpIndex,
xpDistance = NULL,
k = 10L,
block_size = knn_default_block_size(),
plan = NULL,
exclude_self = is.null(query)
)
Arguments
ref |
A prepared reference returned by |
query |
Optional query source. Supply |
xpIndex |
A writable |
xpDistance |
Optional writable |
k |
Number of neighbours to return. |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
Value
A list with components index, distance, k, metric, n_ref,
n_query, exact, and backend. The index and distance entries
reference the supplied destination objects.
Stream exact k-nearest neighbours into destination big.matrix objects
Description
Stream exact k-nearest neighbours into destination big.matrix objects
Usage
knn_stream_bigmatrix(
x,
query = NULL,
xpIndex,
xpDistance = NULL,
k = 10L,
metric = "euclidean",
block_size = knn_default_block_size(),
plan = NULL,
exclude_self = is.null(query)
)
Arguments
x |
A |
query |
Optional query source. Supply |
xpIndex |
A writable |
xpDistance |
Optional writable |
k |
Number of neighbours to return. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
Value
A list with components index, distance, k, metric, n_ref,
n_query, exact, and backend. The index and distance entries
reference the supplied destination objects.
Validate a prepared reference
Description
Validate a prepared reference
Usage
knn_validate_prepared(ref)
Arguments
ref |
A prepared reference returned by |
Value
Invisibly returns TRUE when the prepared reference is valid.
Build an exact mutual kNN graph from a bigmemory::big.matrix
Description
Build an exact mutual kNN graph from a bigmemory::big.matrix
Usage
mutual_knn_graph_bigmatrix(
x,
k = 10L,
metric = "euclidean",
block_size = knn_default_block_size(),
plan = NULL,
include_distance = TRUE,
format = c("edge_list", "triplet", "dgCMatrix")
)
Arguments
x |
A |
k |
Number of neighbours per row. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
include_distance |
Logical flag controlling whether graph edges store distances or unit weights. |
format |
Output format. One of |
Value
An edge list, a triplet list, or a Matrix::dgCMatrix, depending on
the requested format.
Exact radius search for bigmemory::big.matrix
Description
Exact radius search for bigmemory::big.matrix
Usage
radius_bigmatrix(
x,
query = NULL,
radius,
metric = "euclidean",
block_size = knn_default_block_size(),
plan = NULL,
exclude_self = is.null(query),
sort = TRUE
)
Arguments
x |
A |
query |
Optional query source. Supply |
radius |
Distance threshold for including a neighbour. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
sort |
Logical flag controlling whether each query's matches are sorted by distance and then by index. |
Value
A list with components index, distance, offset, n_match,
radius, metric, n_ref, n_query, exact, and backend.
Build an exact radius graph from a bigmemory::big.matrix
Description
Build an exact radius graph from a bigmemory::big.matrix
Usage
radius_graph_bigmatrix(
x,
radius,
metric = "euclidean",
plan = NULL,
block_size = knn_default_block_size(),
include_distance = TRUE,
format = c("edge_list", "triplet", "dgCMatrix"),
symmetrize = c("none", "union", "mutual"),
exclude_self = TRUE,
sort = TRUE
)
Arguments
x |
A |
radius |
Distance threshold for including an edge. |
metric |
Distance metric. Supported values are |
plan |
Optional execution plan returned by |
block_size |
Number of rows to process per query and reference block. |
include_distance |
Logical flag controlling whether graph edges store distances or unit weights. |
format |
Output format. One of |
symmetrize |
How directed radius edges should be combined. One of
|
exclude_self |
Logical flag controlling whether self loops are suppressed. |
sort |
Logical flag controlling whether each query's matches are sorted by distance and then by index. |
Value
A graph representation in the requested format.
Stream exact radius-search results into destination big.matrix objects
Description
Stream exact radius-search results into destination big.matrix objects
Usage
radius_stream_bigmatrix(
x,
query = NULL,
xpIndex,
xpDistance = NULL,
xpOffset,
radius,
metric = "euclidean",
block_size = knn_default_block_size(),
plan = NULL,
exclude_self = is.null(query),
sort = TRUE
)
Arguments
x |
A |
query |
Optional query source. Supply |
xpIndex |
A writable single-column |
xpDistance |
Optional writable single-column |
xpOffset |
A writable single-column |
radius |
Distance threshold for including a neighbour. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
exclude_self |
Logical flag controlling whether a query row may return
itself as a neighbour when |
sort |
Logical flag controlling whether each query's matches are sorted by distance and then by index. |
Value
A list with components index, distance, offset, n_match,
radius, metric, n_ref, n_query, exact, and backend. The
index, distance, and offset entries reference the supplied
destination objects.
Stream exact radius-search results into destination big.matrix objects with checkpoints
Description
Stream exact radius-search results into destination big.matrix objects with checkpoints
Usage
radius_stream_job_bigmatrix(
x,
query = NULL,
xpIndex,
xpDistance = NULL,
xpOffset,
radius,
metric = "euclidean",
plan = NULL,
block_size = knn_default_block_size(),
exclude_self = is.null(query),
sort = TRUE,
checkpoint_path = NULL
)
Arguments
x |
A |
query |
Optional query source. Supply |
xpIndex |
A writable single-column |
xpDistance |
Optional writable single-column |
xpOffset |
A writable single-column |
radius |
Distance threshold for including a neighbour. |
metric |
Distance metric. Supported values are |
plan |
Optional execution plan returned by |
block_size |
Number of query rows to process per block. |
exclude_self |
Logical flag controlling whether self matches are removed
when |
sort |
Logical flag controlling whether each query's matches are sorted by distance and then by index. |
checkpoint_path |
Optional path for a resumable job checkpoint. |
Value
An object of class "bigknn_job".
Compare approximate neighbours to exact truth
Description
Compare approximate neighbours to exact truth
Usage
recall_against_exact(exact, approx_index, k = NULL)
Arguments
exact |
Exact kNN output or index matrix. |
approx_index |
Approximate neighbour index matrix or result object. |
k |
Optional number of neighbours to compare. |
Value
An object of class "bigknn_recall".
Rerank candidate neighbours exactly against a bigmemory::big.matrix
Description
Rerank candidate neighbours exactly against a bigmemory::big.matrix
Usage
rerank_candidates_bigmatrix(
x,
query,
candidate_index,
metric = "euclidean",
top_k = NULL,
plan = NULL,
block_size = knn_default_block_size(),
exclude_self = is.null(query)
)
Arguments
x |
A |
query |
Query source. Supply |
candidate_index |
Candidate neighbour indices supplied as a matrix,
|
metric |
Distance metric. Supported values are |
top_k |
Number of reranked neighbours to return. Defaults to all supplied candidate columns. |
plan |
Optional execution plan returned by |
block_size |
Number of query rows to process at a time. |
exclude_self |
Logical flag controlling whether self ids are removed
when |
Value
An object of class "bigknn_knn_result".
Resume a checkpointed bigKNN job
Description
Resume a checkpointed bigKNN job
Usage
resume_knn_job(checkpoint_path)
Arguments
checkpoint_path |
Path previously created by
|
Value
An object of class "bigknn_job".
Build an exact shared-nearest-neighbour graph from a bigmemory::big.matrix
Description
Build an exact shared-nearest-neighbour graph from a bigmemory::big.matrix
Usage
snn_graph_bigmatrix(
x,
k = 10L,
metric = "euclidean",
block_size = knn_default_block_size(),
plan = NULL,
weight = c("count", "jaccard"),
format = c("edge_list", "triplet", "dgCMatrix")
)
Arguments
x |
A |
k |
Number of neighbours per row in the underlying exact kNN search. |
metric |
Distance metric. Supported values are |
block_size |
Number of rows to process per query and reference block. |
plan |
Optional execution plan returned by |
weight |
Shared-nearest-neighbour weight definition. One of |
format |
Output format. One of |
Value
An edge list, a triplet list, or a Matrix::dgCMatrix, depending on
the requested format.