TileDBArray 1.6.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.69441413 -0.89356716 0.70794709 . 1.7329964 -0.2209475
## [2,] -0.39913616 0.20284771 1.47027169 . 0.8241806 -1.0015908
## [3,] 0.75683486 0.70993542 -0.27178810 . -0.8017653 0.5130396
## [4,] 0.41660110 -0.03034022 -1.18315944 . -2.0070948 -0.7106338
## [5,] -1.14173506 -0.54282714 -1.27580373 . 0.3859798 -0.1046265
## ... . . . . . .
## [96,] -0.28238262 -0.39614394 0.01779911 . 0.5433323 -0.7500110
## [97,] -0.63834074 1.29304773 -1.05042497 . 1.1398837 -0.4565904
## [98,] -0.32962015 -0.15908104 0.29793646 . -0.6624063 -1.2917231
## [99,] 1.13642029 0.28283655 -0.98980308 . 0.1868967 -0.2593291
## [100,] -1.82924487 -0.40146151 -1.51977565 . 1.4230089 -1.1479122
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.69441413 -0.89356716 0.70794709 . 1.7329964 -0.2209475
## [2,] -0.39913616 0.20284771 1.47027169 . 0.8241806 -1.0015908
## [3,] 0.75683486 0.70993542 -0.27178810 . -0.8017653 0.5130396
## [4,] 0.41660110 -0.03034022 -1.18315944 . -2.0070948 -0.7106338
## [5,] -1.14173506 -0.54282714 -1.27580373 . 0.3859798 -0.1046265
## ... . . . . . .
## [96,] -0.28238262 -0.39614394 0.01779911 . 0.5433323 -0.7500110
## [97,] -0.63834074 1.29304773 -1.05042497 . 1.1398837 -0.4565904
## [98,] -0.32962015 -0.15908104 0.29793646 . -0.6624063 -1.2917231
## [99,] 1.13642029 0.28283655 -0.98980308 . 0.1868967 -0.2593291
## [100,] -1.82924487 -0.40146151 -1.51977565 . 1.4230089 -1.1479122
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0.0 0.0 0.0 . 0.000 0.000
## [2,] 0.0 0.0 0.0 . 0.000 0.000
## [3,] 0.0 0.0 0.0 . 0.000 -0.072
## [4,] -1.2 0.0 0.0 . 0.000 0.000
## [5,] 0.0 0.0 0.0 . 0.000 0.000
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.69441413 -0.89356716 0.70794709 . 1.7329964 -0.2209475
## GENE_2 -0.39913616 0.20284771 1.47027169 . 0.8241806 -1.0015908
## GENE_3 0.75683486 0.70993542 -0.27178810 . -0.8017653 0.5130396
## GENE_4 0.41660110 -0.03034022 -1.18315944 . -2.0070948 -0.7106338
## GENE_5 -1.14173506 -0.54282714 -1.27580373 . 0.3859798 -0.1046265
## ... . . . . . .
## GENE_96 -0.28238262 -0.39614394 0.01779911 . 0.5433323 -0.7500110
## GENE_97 -0.63834074 1.29304773 -1.05042497 . 1.1398837 -0.4565904
## GENE_98 -0.32962015 -0.15908104 0.29793646 . -0.6624063 -1.2917231
## GENE_99 1.13642029 0.28283655 -0.98980308 . 0.1868967 -0.2593291
## GENE_100 -1.82924487 -0.40146151 -1.51977565 . 1.4230089 -1.1479122
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.6944141 -0.3991362 0.7568349 0.4166011 -1.1417351 0.1728057
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.69441413 -0.89356716 0.70794709 -1.54516304 -0.95617442
## GENE_2 -0.39913616 0.20284771 1.47027169 1.25318455 2.38365555
## GENE_3 0.75683486 0.70993542 -0.27178810 -0.08447511 3.13491255
## GENE_4 0.41660110 -0.03034022 -1.18315944 0.31650540 -0.11549368
## GENE_5 -1.14173506 -0.54282714 -1.27580373 0.16729225 -1.11244687
out * 2
## <100 x 10> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -1.38882826 -1.78713432 1.41589418 . 3.4659928 -0.4418951
## GENE_2 -0.79827231 0.40569541 2.94054338 . 1.6483613 -2.0031817
## GENE_3 1.51366971 1.41987084 -0.54357619 . -1.6035306 1.0260792
## GENE_4 0.83320221 -0.06068043 -2.36631889 . -4.0141895 -1.4212676
## GENE_5 -2.28347012 -1.08565427 -2.55160747 . 0.7719597 -0.2092529
## ... . . . . . .
## GENE_96 -0.56476524 -0.79228787 0.03559823 . 1.0866646 -1.5000220
## GENE_97 -1.27668148 2.58609547 -2.10084994 . 2.2797674 -0.9131808
## GENE_98 -0.65924029 -0.31816208 0.59587293 . -1.3248126 -2.5834461
## GENE_99 2.27284057 0.56567310 -1.97960617 . 0.3737933 -0.5186582
## GENE_100 -3.65848974 -0.80292302 -3.03955129 . 2.8460178 -2.2958243
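To make the delayed behaviour concrete, the sketch below chains a subset and a multiplication and only then realizes the result in memory. It assumes a fresh session and a new matrix X; the names delayed and realized are illustrative and do not come from the package. It relies on as.matrix() being the point at which values are actually read from the backend, which is the standard DelayedArray behaviour.

```r
library(TileDBArray)

# Write a small dense matrix to a temporary TileDB backend.
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Both operations are recorded but not executed;
# the result is a DelayedMatrix that still points at the backend.
delayed <- out[1:5, 1:5] * 2
class(delayed)

# Realization: values are read from the backend and doubled at this point.
realized <- as.matrix(delayed)

# The realized values should match the same operations on the in-memory matrix.
all.equal(realized, X[1:5, 1:5] * 2)
```

Realizing only the subset that is needed keeps memory usage proportional to the requested block rather than to the full array.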
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## -13.3259862 -9.5000462 0.5751075 -25.1201081 4.6038912 -7.5924370
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## 1.5388792 -10.8546571 -3.9219101 13.0277329
out %*% runif(ncol(out))
## <100 x 1> matrix of class DelayedMatrix and type "double":
## y
## GENE_1 -3.0488529
## GENE_2 0.7827595
## GENE_3 2.3204648
## GENE_4 0.7884562
## GENE_5 -2.0746746
## ... .
## GENE_96 1.9874895
## GENE_97 0.7447004
## GENE_98 -1.5807025
## GENE_99 1.3187040
## GENE_100 -6.0235163
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.573059735 -1.290823157 -0.083815326 . -0.47716608 -0.50636376
## [2,] -0.444925081 0.436438955 -1.018585813 . -0.28176479 -0.08034342
## [3,] 1.230521999 0.592342221 0.308098443 . 0.13533593 -0.50975817
## [4,] 0.008779687 2.269820629 -0.872745586 . 0.39464256 0.11344400
## [5,] 1.118893459 -0.720601234 1.453301664 . -1.48108994 -0.15767224
## ... . . . . . .
## [96,] 0.5512474 0.2106316 0.6796081 . -0.32988388 -0.26636884
## [97,] -1.0726616 0.3732007 -1.4569394 . -0.26663841 -1.64942701
## [98,] -0.3206768 1.0582249 -0.9604110 . -0.83784187 -0.31404047
## [99,] -0.8241132 -0.1606442 0.8477350 . 0.07948245 -1.22936413
## [100,] -2.4319622 -0.2759663 -0.8430878 . 0.23956352 0.48762661
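Choosing the path and attribute name is most useful when the backend is reopened later. The sketch below assumes that the TileDBArray() constructor accepts a path to an existing backend together with an attr= argument naming the attribute to read; the variable name reopened is illustrative.

```r
library(TileDBArray)

# Write a matrix to a known location under a custom attribute name.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the existing backend without rewriting any data.
# Assumption: the constructor takes the path and the attribute to read.
reopened <- TileDBArray(path, attr="WHEE")
dim(reopened)

# The values read back should match the original matrix.
all.equal(as.matrix(reopened), X)
```

With a persistent path in place of tempfile(), the same call would let a later session pick up the array where this one left it.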
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.573059735 -1.290823157 -0.083815326 . -0.47716608 -0.50636376
## [2,] -0.444925081 0.436438955 -1.018585813 . -0.28176479 -0.08034342
## [3,] 1.230521999 0.592342221 0.308098443 . 0.13533593 -0.50975817
## [4,] 0.008779687 2.269820629 -0.872745586 . 0.39464256 0.11344400
## [5,] 1.118893459 -0.720601234 1.453301664 . -1.48108994 -0.15767224
## ... . . . . . .
## [96,] 0.5512474 0.2106316 0.6796081 . -0.32988388 -0.26636884
## [97,] -1.0726616 0.3732007 -1.4569394 . -0.26663841 -1.64942701
## [98,] -0.3206768 1.0582249 -0.9604110 . -0.83784187 -0.31404047
## [99,] -0.8241132 -0.1606442 0.8477350 . 0.07948245 -1.22936413
## [100,] -2.4319622 -0.2759663 -0.8430878 . 0.23956352 0.48762661
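The set/unset cycle for these globals might look like the sketch below. It assumes the getter getTileDBPath() and the attribute setter setTileDBAttr() from the package's global-settings interface, and that passing NULL to a setter restores the default; check the package documentation before relying on either detail.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

# Set the globals consulted during coercion.
path3 <- tempfile()
setTileDBPath(path3)
setTileDBAttr("WHEE")   # assumed counterpart of the attr= argument
getTileDBPath()

# Coercion now writes to path3 instead of a fresh temporary location.
out <- as(X, "TileDBArray")
file.exists(path3)

# Unset the globals so that later coercions do not reuse the same location.
setTileDBPath(NULL)
setTileDBAttr(NULL)
```

Unsetting afterwards matters because the settings persist for the rest of the session, so a second coercion would otherwise target a path that already holds an array.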
sessionInfo()
## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] TileDBArray_1.6.0 DelayedArray_0.22.0 IRanges_2.30.0
## [4] S4Vectors_0.34.0 MatrixGenerics_1.8.0 matrixStats_0.62.0
## [7] BiocGenerics_0.42.0 Matrix_1.4-1 BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.8.3 bslib_0.3.1 compiler_4.2.0
## [4] BiocManager_1.30.17 jquerylib_0.1.4 tools_4.2.0
## [7] digest_0.6.29 bit_4.0.4 jsonlite_1.8.0
## [10] evaluate_0.15 lattice_0.20-45 nanotime_0.3.6
## [13] rlang_1.0.2 cli_3.3.0 RcppCCTZ_0.2.10
## [16] yaml_2.3.5 xfun_0.30 fastmap_1.1.0
## [19] stringr_1.4.0 knitr_1.38 sass_0.4.1
## [22] bit64_4.0.5 grid_4.2.0 data.table_1.14.2
## [25] R6_2.5.1 rmarkdown_2.14 bookdown_0.26
## [28] tiledb_0.12.0 magrittr_2.0.3 htmltools_0.5.2
## [31] stringi_1.7.6 zoo_1.8-10