Working with big.matrix Objects

Frédéric Bertrand

2025-10-05

Overview

bigalgebra is designed to interoperate with the bigmemory ecosystem. This vignette demonstrates how to create in-memory and file-backed big.matrix objects, interact with them via the package’s wrappers, and manage the underlying resources safely.

Creating in-memory big.matrix objects

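In-memory matrices behave much like ordinary R matrices. By default (shared = TRUE) their data live in shared memory outside R's heap, so other R processes on the same machine can attach to them through describe() and attach.big.matrix().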

library(bigmemory)
library(bigalgebra)
X <- big.matrix(3, 3, type = "double", init = 0)
X[,] <- matrix(1:9, nrow = 3)
X[]
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9
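The sharing described above can be checked within a single session. The sketch below uses only bigmemory's describe() and attach.big.matrix(). The object names S, S_desc and S_view are illustrative. Attaching from a second R process would work the same way, provided the descriptor is passed to that process.

```r
library(bigmemory)

# A shared (the default) in-memory big.matrix
S <- big.matrix(2, 2, type = "double", init = 0, shared = TRUE)
S[, ] <- matrix(c(1, 2, 3, 4), nrow = 2)

# describe() captures the shared-memory descriptor; another R process on the
# same machine could pass this object to attach.big.matrix()
S_desc <- describe(S)
S_view <- attach.big.matrix(S_desc)

# A write through one handle is visible through the other
S_view[1, 1] <- 100
S[1, 1]
```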

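Once created, the objects can be passed directly to bigalgebra's Level 1 helpers. dvcal() computes Y <- ALPHA * X + BETA * Y in place. X serves as both operands here, so the result is 2X - X = X and the matrix is unchanged: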

dvcal(ALPHA = 2, X = X, BETA = -1, Y = X)
X[]
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9
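A non-trivial check makes dvcal() easier to follow. The sketch below compares its result against plain R arithmetic for distinct X and Y. It assumes only the named arguments used above (ALPHA, X, BETA, Y). The variables x, y, Xb and Yb are illustrative.

```r
library(bigmemory)
library(bigalgebra)

x <- matrix(1:6, nrow = 2)
y <- matrix(6:1, nrow = 2)
Xb <- as.big.matrix(x, type = "double")
Yb <- as.big.matrix(y, type = "double")

# Reference result computed with base R
expected <- 2 * x + 3 * y

# dvcal() overwrites Yb with 2 * Xb + 3 * Yb
dvcal(ALPHA = 2, X = Xb, BETA = 3, Y = Yb)
all.equal(Yb[, ], expected)
```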

Working with file-backed matrices

File-backed matrices persist their contents on disk, making them suitable for data sets that exceed available RAM.

dir.create(tmp_fb <- tempfile())
Y <- filebacked.big.matrix(4, 2, type = "double",
                           backingpath = tmp_fb,
                           backingfile = "fb.bin",
                           descriptorfile = "fb.desc",
                           init = 0)
Y[,] <- matrix(runif(8), nrow = 4)
Y[]
#>            [,1]       [,2]
#> [1,] 0.87248600 0.80444244
#> [2,] 0.22247946 0.42258322
#> [3,] 0.02441364 0.08168393
#> [4,] 0.14834608 0.96116514
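Creating a file-backed matrix writes two files into backingpath: the binary backing file and the descriptor file. A short sketch confirms this with bigmemory's is.filebacked() and base R's list.files(). The directory fb_dir and the file names m.bin and m.desc are illustrative. Because the directory comes from tempfile(), it is removed when the R session ends.

```r
library(bigmemory)

dir.create(fb_dir <- tempfile())
M <- filebacked.big.matrix(3, 2, type = "double",
                           backingpath = fb_dir,
                           backingfile = "m.bin",
                           descriptorfile = "m.desc",
                           init = 0)

is.filebacked(M)        # TRUE for file-backed matrices
sort(list.files(fb_dir)) # the backing and descriptor files
```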

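File-backed matrices can be passed to the same wrappers. The computation works on the memory-mapped data, so the matrix does not first have to be copied into an ordinary R matrix. Here dvcal() writes 1.5 * Y into Z: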

Z <- filebacked.big.matrix(4, 2, type = "double",
                           backingpath = tmp_fb,
                           backingfile = "res.bin",
                           descriptorfile = "res.desc",
                           init = 0)
dvcal(ALPHA = 1.5, X = Y, BETA = 0, Y = Z)
Z[]
#>            [,1]      [,2]
#> [1,] 1.30872901 1.2066637
#> [2,] 0.33371919 0.6338748
#> [3,] 0.03662046 0.1225259
#> [4,] 0.22251912 1.4417477
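The output above pulls the whole of Z into R with Z[]. That is fine for a 4 x 2 example, but for data that exceed RAM you would normally extract only a piece at a time. The sketch below works one column at a time with plain bigmemory indexing. The matrix W, its size and the directory name big_dir are illustrative.

```r
library(bigmemory)

dir.create(big_dir <- tempfile())
W <- filebacked.big.matrix(1000, 3, type = "double",
                           backingpath = big_dir,
                           backingfile = "w.bin",
                           descriptorfile = "w.desc",
                           init = 1)

# Only one column at a time is copied into an ordinary R vector
col_totals <- vapply(seq_len(ncol(W)), function(j) sum(W[, j]), numeric(1))
col_totals
```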

Sharing matrices between sessions

The descriptor file records the metadata needed to reopen a file-backed matrix, for example in a new R session. Read the descriptor back with dget(), then pass it to bigmemory's attach.big.matrix() to reconstruct the object:

Y_desc <- dget(file.path(tmp_fb, "fb.desc"))
Y_again <- attach.big.matrix(Y_desc)
identical(Y[,], Y_again[,])
#> [1] TRUE
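A sketch can approximate "a new R session" without starting one. It drops the only R reference to a file-backed matrix, forces garbage collection, and then re-attaches from the descriptor file exactly as above. The names P, P2, share_dir, p.bin and p.desc are illustrative. The check relies on the data persisting in the backing file after the original object is gone.

```r
library(bigmemory)

dir.create(share_dir <- tempfile())
P <- filebacked.big.matrix(2, 2, type = "double",
                           backingpath = share_dir,
                           backingfile = "p.bin",
                           descriptorfile = "p.desc",
                           init = 0)
P[, ] <- matrix(c(1, 2, 3, 4), nrow = 2)

# Drop the only R reference; the data remain in p.bin on disk
rm(P)
invisible(gc())

# Re-open from the descriptor file, as a fresh R session would
P_desc <- dget(file.path(share_dir, "p.desc"))
P2 <- attach.big.matrix(P_desc)
P2[, ]
```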

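Operations performed via bigalgebra modify the backing file in place, so every reference attached to it observes the change. Here dsub() overwrites Y_again with Y_again - Z. Because Y_again and Y map the same file (fb.bin), Y now holds the same values: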

dsub(X = Z, Y = Y_again)
Y_again[]
#>             [,1]        [,2]
#> [1,] -0.43624300 -0.40222122
#> [2,] -0.11123973 -0.21129161
#> [3,] -0.01220682 -0.04084196
#> [4,] -0.07417304 -0.48058257
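To see the propagation from the other side, the sketch below attaches a second handle to a file-backed matrix and updates it with dsub(). It then reads the result back through the original handle. The sketch assumes dsub(X, Y) stores Y - X in Y, which is inferred from the output above rather than from the documentation. The names target, target_view, offset and sub_dir are illustrative.

```r
library(bigmemory)
library(bigalgebra)

dir.create(sub_dir <- tempfile())
target <- filebacked.big.matrix(2, 2, type = "double",
                                backingpath = sub_dir,
                                backingfile = "a.bin",
                                descriptorfile = "a.desc",
                                init = 10)
offset <- as.big.matrix(matrix(1:4, nrow = 2), type = "double")

# A second handle on the same backing file
target_view <- attach.big.matrix(dget(file.path(sub_dir, "a.desc")))

# Assumed semantics (from the output above): Y <- Y - X
dsub(X = offset, Y = target_view)

# The original handle observes the update
target[, ]
```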

Cleaning up backing files

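File-backed matrices occupy disk space until their files are deleted. Once they are no longer needed, remove the R objects that reference them and run gc() so the file mappings are released. On some platforms, notably Windows, a file that is still mapped cannot be deleted. Then remove the temporary directory together with the backing and descriptor files it contains.

rm(Y, Y_again, Z)                  # drop the references to the mapped files
invisible(gc())                    # let the finalizers release the mappings
unlink(tmp_fb, recursive = TRUE)   # deletes fb.bin, fb.desc, res.bin, res.desc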
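When a file-backed matrix is only needed temporarily, the cleanup steps can be tied to a function's exit with on.exit(). This way they run even if the computation fails. The helper with_temp_filebacked() below is hypothetical and not part of bigalgebra or bigmemory. It is a minimal sketch of the pattern under that assumption.

```r
library(bigmemory)

# Hypothetical helper: run f() on a temporary file-backed matrix and always
# release the mapping and delete the backing directory afterwards
with_temp_filebacked <- function(nrow, ncol, f) {
  dir <- tempfile()
  dir.create(dir)
  on.exit({
    if (exists("M", inherits = FALSE)) rm(M)  # drop the reference, if created
    invisible(gc())                           # release the file mapping
    unlink(dir, recursive = TRUE)             # delete backing + descriptor
  }, add = TRUE)
  M <- filebacked.big.matrix(nrow, ncol, type = "double",
                             backingpath = dir,
                             backingfile = "tmp.bin",
                             descriptorfile = "tmp.desc",
                             init = 0)
  f(M)
}

total <- with_temp_filebacked(5, 2, function(M) {
  M[, 1] <- 1:5   # write into the first column
  sum(M[, ])      # the second column is still 0
})
total
```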