TileDBArray 1.3.1
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray
backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
TileDBArray
Creating a TileDBArray
is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -2.090418119 0.234348929 -0.580814429 . 0.1483310 1.1496815
## [2,] 1.548579468 -0.472228287 -0.657196768 . 0.2595357 -0.5763910
## [3,] 1.780122789 0.754787551 0.005715742 . 0.3883064 -0.5232770
## [4,] -0.590968905 -0.327780057 0.039647740 . 1.1296582 -1.4300201
## [5,] 0.433366937 -0.290613825 0.893050535 . 0.1824566 0.3455577
## ... . . . . . .
## [96,] -0.23311444 -0.04386422 0.63787198 . 0.3754268 -0.9607307
## [97,] -0.82067866 0.47420710 -0.17152746 . 1.1574546 0.5542008
## [98,] -0.54380085 0.60042304 0.90257128 . 0.6556889 -0.9370780
## [99,] 2.01265950 -0.38415487 0.04795385 . -0.3496137 0.7939744
## [100,] -1.00470036 1.72331360 0.20149014 . -0.3991838 0.1248207
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -2.090418119 0.234348929 -0.580814429 . 0.1483310 1.1496815
## [2,] 1.548579468 -0.472228287 -0.657196768 . 0.2595357 -0.5763910
## [3,] 1.780122789 0.754787551 0.005715742 . 0.3883064 -0.5232770
## [4,] -0.590968905 -0.327780057 0.039647740 . 1.1296582 -1.4300201
## [5,] 0.433366937 -0.290613825 0.893050535 . 0.1824566 0.3455577
## ... . . . . . .
## [96,] -0.23311444 -0.04386422 0.63787198 . 0.3754268 -0.9607307
## [97,] -0.82067866 0.47420710 -0.17152746 . 1.1574546 0.5542008
## [98,] -0.54380085 0.60042304 0.90257128 . 0.6556889 -0.9370780
## [99,] 2.01265950 -0.38415487 0.04795385 . -0.3496137 0.7939744
## [100,] -1.00470036 1.72331360 0.20149014 . -0.3991838 0.1248207
This process works also for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0.0 0.0 0.0 . 0 0
## [2,] 0.0 0.0 0.0 . 0 0
## [3,] 0.0 0.0 0.0 . 0 0
## [4,] 0.0 0.0 0.0 . 0 0
## [5,] 0.0 0.0 -1.1 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -2.090418119 0.234348929 -0.580814429 . 0.1483310 1.1496815
## GENE_2 1.548579468 -0.472228287 -0.657196768 . 0.2595357 -0.5763910
## GENE_3 1.780122789 0.754787551 0.005715742 . 0.3883064 -0.5232770
## GENE_4 -0.590968905 -0.327780057 0.039647740 . 1.1296582 -1.4300201
## GENE_5 0.433366937 -0.290613825 0.893050535 . 0.1824566 0.3455577
## ... . . . . . .
## GENE_96 -0.23311444 -0.04386422 0.63787198 . 0.3754268 -0.9607307
## GENE_97 -0.82067866 0.47420710 -0.17152746 . 1.1574546 0.5542008
## GENE_98 -0.54380085 0.60042304 0.90257128 . 0.6556889 -0.9370780
## GENE_99 2.01265950 -0.38415487 0.04795385 . -0.3496137 0.7939744
## GENE_100 -1.00470036 1.72331360 0.20149014 . -0.3991838 0.1248207
TileDBArray
sTileDBArray
s are simply DelayedArray
objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -2.0904181 1.5485795 1.7801228 -0.5909689 0.4333669 -1.3054008
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix
object.
out[1:5,1:5]
## <5 x 5> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -2.090418119 0.234348929 -0.580814429 -1.568471487 0.579623810
## GENE_2 1.548579468 -0.472228287 -0.657196768 -0.264028703 -1.245299609
## GENE_3 1.780122789 0.754787551 0.005715742 0.431693675 -2.561762577
## GENE_4 -0.590968905 -0.327780057 0.039647740 0.643374646 -0.173200608
## GENE_5 0.433366937 -0.290613825 0.893050535 1.720347975 1.101935372
out * 2
## <100 x 10> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -4.18083624 0.46869786 -1.16162886 . 0.2966621 2.2993630
## GENE_2 3.09715894 -0.94445657 -1.31439354 . 0.5190714 -1.1527819
## GENE_3 3.56024558 1.50957510 0.01143148 . 0.7766128 -1.0465540
## GENE_4 -1.18193781 -0.65556011 0.07929548 . 2.2593163 -2.8600403
## GENE_5 0.86673387 -0.58122765 1.78610107 . 0.3649133 0.6911155
## ... . . . . . .
## GENE_96 -0.46622888 -0.08772844 1.27574396 . 0.7508535 -1.9214614
## GENE_97 -1.64135732 0.94841421 -0.34305492 . 2.3149093 1.1084016
## GENE_98 -1.08760170 1.20084608 1.80514257 . 1.3113778 -1.8741561
## GENE_99 4.02531901 -0.76830975 0.09590769 . -0.6992274 1.5879488
## GENE_100 -2.00940071 3.44662720 0.40298029 . -0.7983676 0.2496415
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 5.63558265 3.69370029 3.28107362 7.17926524 -1.60597142 -18.96231968
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## 19.40612142 7.72789208 16.31541602 -0.05279476
out %*% runif(ncol(out))
## <100 x 1> matrix of class DelayedMatrix and type "double":
## y
## GENE_1 -4.01745163
## GENE_2 -0.04041403
## GENE_3 1.85886144
## GENE_4 0.14931919
## GENE_5 3.08457195
## ... .
## GENE_96 -1.55728587
## GENE_97 -0.09446832
## GENE_98 -0.39698727
## GENE_99 1.17839821
## GENE_100 -1.20973608
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray()
.
For example, the example below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.4221082 -0.3834877 -0.1779621 . 0.02886053 0.47873942
## [2,] 1.1784478 0.6837324 -1.6826929 . 0.78549959 -0.11870405
## [3,] 1.6857803 -0.8067439 -0.9190278 . -1.50405062 -0.78084419
## [4,] -0.5439206 -0.6648650 1.2053137 . -0.96329600 1.38948523
## [5,] 0.1447337 -0.1894624 0.5201243 . -0.77705921 -0.49365267
## ... . . . . . .
## [96,] 0.74709299 0.82894321 0.25044536 . 1.33118055 1.73336790
## [97,] 0.92357376 -0.15989605 -0.98441570 . -0.20360497 0.50657382
## [98,] 0.57206792 1.00559772 -0.03546482 . 0.07469935 1.66081945
## [99,] 0.30252978 -0.80167189 -0.37849397 . -0.17658127 -1.66322973
## [100,] 1.32867265 -0.53701988 0.16438683 . 0.20716926 -0.69480133
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.4221082 -0.3834877 -0.1779621 . 0.02886053 0.47873942
## [2,] 1.1784478 0.6837324 -1.6826929 . 0.78549959 -0.11870405
## [3,] 1.6857803 -0.8067439 -0.9190278 . -1.50405062 -0.78084419
## [4,] -0.5439206 -0.6648650 1.2053137 . -0.96329600 1.38948523
## [5,] 0.1447337 -0.1894624 0.5201243 . -0.77705921 -0.49365267
## ... . . . . . .
## [96,] 0.74709299 0.82894321 0.25044536 . 1.33118055 1.73336790
## [97,] 0.92357376 -0.15989605 -0.98441570 . -0.20360497 0.50657382
## [98,] 0.57206792 1.00559772 -0.03546482 . 0.07469935 1.66081945
## [99,] 0.30252978 -0.80167189 -0.37849397 . -0.17658127 -1.66322973
## [100,] 1.32867265 -0.53701988 0.16438683 . 0.20716926 -0.69480133
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows Server x64 (build 17763)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] TileDBArray_1.3.1 DelayedArray_0.19.0 IRanges_2.27.0
## [4] S4Vectors_0.31.0 MatrixGenerics_1.5.0 matrixStats_0.58.0
## [7] BiocGenerics_0.39.0 Matrix_1.3-3 BiocStyle_2.21.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 bslib_0.2.5.1 compiler_4.1.0
## [4] BiocManager_1.30.15 jquerylib_0.1.4 tools_4.1.0
## [7] digest_0.6.27 bit_4.0.4 jsonlite_1.7.2
## [10] evaluate_0.14 lattice_0.20-44 nanotime_0.3.2
## [13] rlang_0.4.11 RcppCCTZ_0.2.9 yaml_2.2.1
## [16] xfun_0.23 stringr_1.4.0 knitr_1.33
## [19] sass_0.4.0 bit64_4.0.5 grid_4.1.0
## [22] R6_2.5.0 rmarkdown_2.8 bookdown_0.22
## [25] tiledb_0.9.2 magrittr_2.0.1 htmltools_0.5.1.1
## [28] stringi_1.6.2 zoo_1.8-9