TileDBArray 1.3.1
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.0360676 0.3744472 -1.0564791 . -0.41842618 0.95408321
## [2,] -1.0662365 -0.5474545 1.0532145 . -0.32321808 0.10918604
## [3,] 0.8434454 -1.8229962 1.3212584 . -0.71556836 -0.41653630
## [4,] -0.8031428 1.2770974 1.6240101 . 0.31659514 -0.87901468
## [5,] -0.1161937 0.6504181 0.1868216 . -0.22750201 0.08961141
## ... . . . . . .
## [96,] 0.21262116 -1.29771368 -0.55246332 . -1.59864668 0.08823332
## [97,] 0.91433098 -1.41248728 0.09456357 . 1.33631515 0.84904689
## [98,] -0.47218324 0.49009490 -0.37229511 . -0.28267237 0.35219485
## [99,] -0.92775023 1.05477412 -2.11884464 . -0.33824640 -0.75901909
## [100,] 0.36600355 1.28971615 1.11731758 . -0.37917135 -0.56616184
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.0360676 0.3744472 -1.0564791 . -0.41842618 0.95408321
## [2,] -1.0662365 -0.5474545 1.0532145 . -0.32321808 0.10918604
## [3,] 0.8434454 -1.8229962 1.3212584 . -0.71556836 -0.41653630
## [4,] -0.8031428 1.2770974 1.6240101 . 0.31659514 -0.87901468
## [5,] -0.1161937 0.6504181 0.1868216 . -0.22750201 0.08961141
## ... . . . . . .
## [96,] 0.21262116 -1.29771368 -0.55246332 . -1.59864668 0.08823332
## [97,] 0.91433098 -1.41248728 0.09456357 . 1.33631515 0.84904689
## [98,] -0.47218324 0.49009490 -0.37229511 . -0.28267237 0.35219485
## [99,] -0.92775023 1.05477412 -2.11884464 . -0.33824640 -0.75901909
## [100,] 0.36600355 1.28971615 1.11731758 . -0.37917135 -0.56616184
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0.00 0.00 0.00 . 0 0
## [2,] 0.00 0.00 0.00 . 0 0
## [3,] 0.00 0.00 0.00 . 0 0
## [4,] 0.00 0.00 0.00 . 0 0
## [5,] 0.91 0.00 0.00 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
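To check that the sparse representation is carried through to the backend, one option is to query the result with is_sparse(). This is a minimal sketch, assuming that is_sparse() is available from the installed DelayedArray stack; the variable name sp is introduced here purely for illustration.

```r
library(TileDBArray)

# Same kind of sparse input as in the example above.
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
sp <- writeTileDBArray(Y)

# Reports whether the array is treated as sparse by DelayedArray.
is_sparse(sp)

# The dimensions are unchanged by the write.
dim(sp)
```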
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] TRUE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
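The integer case is not shown above, so here is a small sketch of what it might look like. The matrix Z and the name int_out are hypothetical and introduced only for this illustration; type() is the DelayedArray generic for querying the element type.

```r
library(TileDBArray)

# rpois() returns integer values, so Z is an integer matrix.
Z <- matrix(rpois(1000, lambda=5), ncol=10)
int_out <- writeTileDBArray(Z)

# Query the element type of the stored array.
type(int_out)
```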
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -1.0360676 0.3744472 -1.0564791 . -0.41842618 0.95408321
## GENE_2 -1.0662365 -0.5474545 1.0532145 . -0.32321808 0.10918604
## GENE_3 0.8434454 -1.8229962 1.3212584 . -0.71556836 -0.41653630
## GENE_4 -0.8031428 1.2770974 1.6240101 . 0.31659514 -0.87901468
## GENE_5 -0.1161937 0.6504181 0.1868216 . -0.22750201 0.08961141
## ... . . . . . .
## GENE_96 0.21262116 -1.29771368 -0.55246332 . -1.59864668 0.08823332
## GENE_97 0.91433098 -1.41248728 0.09456357 . 1.33631515 0.84904689
## GENE_98 -0.47218324 0.49009490 -0.37229511 . -0.28267237 0.35219485
## GENE_99 -0.92775023 1.05477412 -2.11884464 . -0.33824640 -0.75901909
## GENE_100 0.36600355 1.28971615 1.11731758 . -0.37917135 -0.56616184
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -1.0360676 -1.0662365 0.8434454 -0.8031428 -0.1161937 0.9416509
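When an ordinary in-memory matrix is needed, for example to pass to a function that does not understand DelayedArray objects, the whole array can be realized with as.matrix(). The sketch below rebuilds the example data so that it runs on its own; mat is a name introduced for illustration, and the comparison is a sanity check rather than a required step.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Load all values from the TileDB backend into an ordinary matrix.
mat <- as.matrix(out)
class(mat)

# Compare the round-tripped values against the original input.
all.equal(mat, X)
```

This loads every value into memory, so it is only sensible for arrays that fit comfortably in RAM.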
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -1.036067575 0.374447217 -1.056479102 0.828956879 0.008634165
## GENE_2 -1.066236533 -0.547454489 1.053214492 -1.516198506 -0.181377356
## GENE_3 0.843445427 -1.822996163 1.321258434 -0.299866363 0.221681180
## GENE_4 -0.803142753 1.277097367 1.624010097 0.306902205 -0.202705964
## GENE_5 -0.116193727 0.650418073 0.186821553 0.978123909 2.012026054
out * 2
## <100 x 10> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -2.0721351 0.7488944 -2.1129582 . -0.8368524 1.9081664
## GENE_2 -2.1324731 -1.0949090 2.1064290 . -0.6464362 0.2183721
## GENE_3 1.6868909 -3.6459923 2.6425169 . -1.4311367 -0.8330726
## GENE_4 -1.6062855 2.5541947 3.2480202 . 0.6331903 -1.7580294
## GENE_5 -0.2323875 1.3008361 0.3736431 . -0.4550040 0.1792228
## ... . . . . . .
## GENE_96 0.4252423 -2.5954274 -1.1049266 . -3.1972934 0.1764666
## GENE_97 1.8286620 -2.8249746 0.1891271 . 2.6726303 1.6980938
## GENE_98 -0.9443665 0.9801898 -0.7445902 . -0.5653447 0.7043897
## GENE_99 -1.8555005 2.1095482 -4.2376893 . -0.6764928 -1.5180382
## GENE_100 0.7320071 2.5794323 2.2346352 . -0.7583427 -1.1323237
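To make the delayed behaviour more concrete, the sketch below applies an operation, extracts a small realized piece, and then saves the delayed result as a new TileDB array. It assumes that writeTileDBArray() accepts a DelayedMatrix as input, which seems likely given the DelayedArray framework but is not shown in the text above; doubled and saved are illustrative names.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# No values are computed here; the multiplication is only recorded.
doubled <- out * 2
class(doubled)

# Values are computed when they are explicitly requested.
as.matrix(doubled[1:3, 1:3])

# Assumption: a delayed result can be written out as a new TileDB array.
saved <- writeTileDBArray(doubled)
class(saved)
```

Writing the result back out may be worthwhile when the same delayed operations would otherwise be recomputed on every access.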
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## 3.315147 -11.833411 -14.881239 2.105538 -2.704835 -4.662177 1.086331
## SAMP_8 SAMP_9 SAMP_10
## -9.872623 2.636777 6.055228
out %*% runif(ncol(out))
## <100 x 1> matrix of class DelayedMatrix and type "double":
## y
## GENE_1 0.8529062
## GENE_2 -1.6958870
## GENE_3 -1.6009491
## GENE_4 1.0340724
## GENE_5 4.4175445
## ... .
## GENE_96 0.4956284
## GENE_97 -2.9212794
## GENE_98 -0.5382031
## GENE_99 -0.7048286
## GENE_100 0.9059813
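As a quick sanity check, the summaries computed on the TileDB-backed array can be compared with those from the in-memory matrix. This sketch recreates the data so that it is self-contained; the names backed and in_memory are introduced for illustration.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

# Column sums computed from the TileDB backend and from memory.
backed <- colSums(out)
in_memory <- colSums(X)

# The two should agree up to numerical tolerance.
all.equal(backed, in_memory)
```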
We can adjust some parameters for creating the backend by passing the appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.798355115 0.857404837 -1.448103570 . 0.981489327 -0.627100184
## [2,] -0.378962460 -0.551250454 -0.008513743 . -0.490515481 0.006253462
## [3,] 1.770400951 -1.285586894 -1.413873559 . 0.407585076 0.527649165
## [4,] -0.589304725 0.632822267 0.548222736 . 0.223924278 0.413615971
## [5,] -1.131686891 0.681712937 0.957253547 . -0.272160257 -0.193099097
## ... . . . . . .
## [96,] 1.05497897 -0.93771127 -0.67206645 . 1.48945987 0.08722479
## [97,] 0.11076962 1.15679226 0.13046391 . 1.03211921 0.04615423
## [98,] 1.32610696 -0.58197460 -1.79939559 . 0.32870783 1.07064449
## [99,] 0.03906207 0.46179307 0.91934862 . 1.06859486 0.27934284
## [100,] -0.92575968 0.12530082 1.17612493 . 0.09061471 -0.09535175
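One reason to choose the path explicitly is so that the array can be reopened later. The sketch below assumes that the TileDBArray() constructor accepts a path together with an attr argument naming the attribute to read; reopened is an illustrative name.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# Assumption: TileDBArray() opens an existing backend at 'path',
# reading values from the attribute given by 'attr'.
reopened <- TileDBArray(path, attr="WHEE")
dim(reopened)

# The reopened array should hold the values that were written.
all.equal(as.matrix(reopened), X)
```

In a real workflow the path would point to a persistent location rather than a temporary file.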
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.798355115 0.857404837 -1.448103570 . 0.981489327 -0.627100184
## [2,] -0.378962460 -0.551250454 -0.008513743 . -0.490515481 0.006253462
## [3,] 1.770400951 -1.285586894 -1.413873559 . 0.407585076 0.527649165
## [4,] -0.589304725 0.632822267 0.548222736 . 0.223924278 0.413615971
## [5,] -1.131686891 0.681712937 0.957253547 . -0.272160257 -0.193099097
## ... . . . . . .
## [96,] 1.05497897 -0.93771127 -0.67206645 . 1.48945987 0.08722479
## [97,] 0.11076962 1.15679226 0.13046391 . 1.03211921 0.04615423
## [98,] 1.32610696 -0.58197460 -1.79939559 . 0.32870783 1.07064449
## [99,] 0.03906207 0.46179307 0.91934862 . 1.06859486 0.27934284
## [100,] -0.92575968 0.12530082 1.17612493 . 0.09061471 -0.09535175
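A global path applies to every subsequent write, so it is usually worth unsetting it once the coercion is done. The sketch below assumes that getTileDBPath() returns the current setting, that setTileDBPath(NULL) restores the default, and that setTileDBAttr() is the analogous setter for the attribute name; path3, current and out2 are illustrative names.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

path3 <- tempfile()
setTileDBPath(path3)
setTileDBAttr("WHEE")  # assumed counterpart of setTileDBPath() for the attribute

# Inspect the global setting before it is used.
current <- getTileDBPath()

out2 <- as(X, "TileDBArray")  # stored at path3

# Unset the globals so that later writes fall back to the defaults.
setTileDBPath(NULL)
setTileDBAttr(NULL)
```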
sessionInfo()
## R version 4.1.0 beta (2021-05-03 r80259)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] TileDBArray_1.3.1 DelayedArray_0.19.0 IRanges_2.27.0
## [4] S4Vectors_0.31.0 MatrixGenerics_1.5.0 matrixStats_0.58.0
## [7] BiocGenerics_0.39.0 Matrix_1.3-3 BiocStyle_2.21.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 bslib_0.2.5.1 compiler_4.1.0
## [4] BiocManager_1.30.15 jquerylib_0.1.4 tools_4.1.0
## [7] digest_0.6.27 bit_4.0.4 jsonlite_1.7.2
## [10] evaluate_0.14 lattice_0.20-44 nanotime_0.3.2
## [13] rlang_0.4.11 RcppCCTZ_0.2.9 yaml_2.2.1
## [16] xfun_0.23 stringr_1.4.0 knitr_1.33
## [19] sass_0.4.0 bit64_4.0.5 grid_4.1.0
## [22] R6_2.5.0 rmarkdown_2.8 bookdown_0.22
## [25] tiledb_0.9.2 magrittr_2.0.1 htmltools_0.5.1.1
## [28] stringi_1.6.2 zoo_1.8-9