TileDBArray 1.17.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
TileDBArray
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.53698218 0.28507248 0.08379683 . 1.09610166 0.21951613
## [2,] 0.93016681 -1.42859818 -1.17515321 . 0.33701857 -0.07358515
## [3,] 1.02423806 1.25282916 0.58694304 . 0.16294996 1.12714094
## [4,] 0.39947158 -0.84553191 0.79728406 . -0.48586462 -1.66152894
## [5,] 0.19689411 0.47097997 1.76941267 . -0.55652752 -0.61439958
## ... . . . . . .
## [96,] -1.9253107 -0.9481818 0.4495943 . 1.7227052 -0.7461758
## [97,] -0.6840933 2.0107446 0.7709267 . -0.4630639 0.8870057
## [98,] 0.7008943 1.0702701 0.3419474 . -0.4044978 -1.4017803
## [99,] 0.5753564 -1.2372411 0.4518201 . -1.6579663 -0.9429574
## [100,] 0.2555613 0.6287442 0.5600979 . -1.2250067 0.7502614
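As a quick sanity check, the values can be read back into memory with `as.matrix()` and compared to the original matrix. This sketch uses only `writeTileDBArray()` from above plus base R's `is()` and `all.equal()`. It also confirms that the returned TileDBMatrix is a DelayedArray subclass, which is what lets it be used wherever a DelayedArray is accepted.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
A <- writeTileDBArray(X)

# 'A' is a handle on the on-disk array and is also a DelayedArray.
is(A, "TileDBMatrix")
is(A, "DelayedArray")

# as.matrix() pulls the stored values back into an ordinary in-memory matrix.
roundtrip <- as.matrix(A)
all.equal(roundtrip, X)
```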
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.53698218 0.28507248 0.08379683 . 1.09610166 0.21951613
## [2,] 0.93016681 -1.42859818 -1.17515321 . 0.33701857 -0.07358515
## [3,] 1.02423806 1.25282916 0.58694304 . 0.16294996 1.12714094
## [4,] 0.39947158 -0.84553191 0.79728406 . -0.48586462 -1.66152894
## [5,] 0.19689411 0.47097997 1.76941267 . -0.55652752 -0.61439958
## ... . . . . . .
## [96,] -1.9253107 -0.9481818 0.4495943 . 1.7227052 -0.7461758
## [97,] -0.6840933 2.0107446 0.7709267 . -0.4630639 0.8870057
## [98,] 0.7008943 1.0702701 0.3419474 . -0.4044978 -1.4017803
## [99,] 0.5753564 -1.2372411 0.4518201 . -1.6579663 -0.9429574
## [100,] 0.2555613 0.6287442 0.5600979 . -1.2250067 0.7502614
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
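To confirm that the sparse input was stored as a sparse TileDB array, we can query it with `is_sparse()`, the sparsity generic that becomes available once TileDBArray is loaded. Reading the array back should give the same values as the in-memory `dgCMatrix`. This is a sketch. The comparison densifies a 1000 x 1000 matrix, which is fine here but would be wasteful for large data.

```r
library(TileDBArray)

Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
S <- writeTileDBArray(Y)

# A sparse input is expected to be written as a sparse TileDB array.
is_sparse(S)

# Reading back gives the same values as the original sparse matrix.
all.equal(as.matrix(S), as.matrix(Y))
```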
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
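The example above only shows a logical matrix. An integer matrix can be written in the same way. The sketch below uses `rpois()`, which returns integer values, and checks the reported type with the DelayedArray `type()` accessor.

```r
library(TileDBArray)

Z <- matrix(rpois(1000, lambda=5), ncol=10)  # rpois() returns integers
Zt <- writeTileDBArray(Z)

type(Zt)                   # data type reported for the TileDBMatrix
is.integer(as.matrix(Zt))  # values are read back as integers
```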
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.53698218 0.28507248 0.08379683 . 1.09610166 0.21951613
## GENE_2 0.93016681 -1.42859818 -1.17515321 . 0.33701857 -0.07358515
## GENE_3 1.02423806 1.25282916 0.58694304 . 0.16294996 1.12714094
## GENE_4 0.39947158 -0.84553191 0.79728406 . -0.48586462 -1.66152894
## GENE_5 0.19689411 0.47097997 1.76941267 . -0.55652752 -0.61439958
## ... . . . . . .
## GENE_96 -1.9253107 -0.9481818 0.4495943 . 1.7227052 -0.7461758
## GENE_97 -0.6840933 2.0107446 0.7709267 . -0.4630639 0.8870057
## GENE_98 0.7008943 1.0702701 0.3419474 . -0.4044978 -1.4017803
## GENE_99 0.5753564 -1.2372411 0.4518201 . -1.6579663 -0.9429574
## GENE_100 0.2555613 0.6287442 0.5600979 . -1.2250067 0.7502614
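The dimension names are stored along with the data, so they survive the round trip and can be used for subsetting. This sketch compares them against the original matrix and extracts a single row by name.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
Xt <- writeTileDBArray(X)

head(rownames(Xt))
colnames(Xt)

# Character subsetting works as for an ordinary matrix.
gene3 <- Xt["GENE_3", ]
gene3
```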
TileDBArrays
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.5369822 0.9301668 1.0242381 0.3994716 0.1968941 0.2764521
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.53698218 0.28507248 0.08379683 0.09469355 0.34201312
## GENE_2 0.93016681 -1.42859818 -1.17515321 0.46901941 -0.65425280
## GENE_3 1.02423806 1.25282916 0.58694304 1.23196401 1.09092471
## GENE_4 0.39947158 -0.84553191 0.79728406 1.26096545 -1.12941740
## GENE_5 0.19689411 0.47097997 1.76941267 1.15964458 1.64862246
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 1.0739644 0.5701450 0.1675937 . 2.1922033 0.4390323
## GENE_2 1.8603336 -2.8571964 -2.3503064 . 0.6740371 -0.1471703
## GENE_3 2.0484761 2.5056583 1.1738861 . 0.3258999 2.2542819
## GENE_4 0.7989432 -1.6910638 1.5945681 . -0.9717292 -3.3230579
## GENE_5 0.3937882 0.9419599 3.5388253 . -1.1130550 -1.2287992
## ... . . . . . .
## GENE_96 -3.8506214 -1.8963635 0.8991887 . 3.4454103 -1.4923516
## GENE_97 -1.3681865 4.0214891 1.5418535 . -0.9261278 1.7740115
## GENE_98 1.4017887 2.1405402 0.6838948 . -0.8089957 -2.8035606
## GENE_99 1.1507128 -2.4744822 0.9036401 . -3.3159326 -1.8859148
## GENE_100 0.5111226 1.2574883 1.1201959 . -2.4500133 1.5005228
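One way to see that nothing has been computed yet is DelayedArray's `showtree()`, which prints the stack of pending operations on top of the TileDB seed. The values are only evaluated when they are requested, for example by `as.matrix()`. In this sketch the realized results are compared to the same operations on the in-memory matrix.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
out <- as(X, "TileDBArray")

doubled <- out * 2
showtree(doubled)  # the multiplication is recorded, not yet applied

# Realize the delayed results into ordinary matrices.
doubled_mat <- as.matrix(doubled)
corner_mat <- as.matrix(out[1:5, 1:5])
```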
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 5.5317896 6.9423026 1.8846777 2.9893068 11.4986276 -0.4166244
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -10.4167531 22.5350319 -14.7815845 3.8577699
out %*% runif(ncol(out))
## [,1]
## GENE_1 3.23073349
## GENE_2 -1.82709174
## GENE_3 3.66446431
## GENE_4 0.15788272
## GENE_5 1.01307490
## GENE_6 -0.85218493
## GENE_7 -0.58186165
## GENE_8 -0.35849486
## GENE_9 2.02958419
## GENE_10 -3.64892195
## GENE_11 0.50288198
## GENE_12 -2.91199763
## GENE_13 0.88230636
## GENE_14 1.11538079
## GENE_15 -3.68801898
## GENE_16 0.97049702
## GENE_17 -0.66167826
## GENE_18 -2.78009585
## GENE_19 2.45699686
## GENE_20 -0.34213980
## GENE_21 -0.36638485
## GENE_22 0.64562827
## GENE_23 2.11112753
## GENE_24 -2.14675181
## GENE_25 -1.92635775
## GENE_26 1.23319779
## GENE_27 1.58354405
## GENE_28 -0.14924815
## GENE_29 0.03129381
## GENE_30 -0.01447512
## GENE_31 -0.66701609
## GENE_32 0.23356588
## GENE_33 -1.13445256
## GENE_34 2.06200666
## GENE_35 1.13236353
## GENE_36 0.88652817
## GENE_37 -2.18395579
## GENE_38 0.44276490
## GENE_39 3.07211339
## GENE_40 0.70858303
## GENE_41 -1.29824585
## GENE_42 -0.03679132
## GENE_43 0.56594392
## GENE_44 1.65283079
## GENE_45 2.98054935
## GENE_46 -0.45452977
## GENE_47 -1.77538262
## GENE_48 4.33617753
## GENE_49 0.43052777
## GENE_50 0.05549453
## GENE_51 0.39971744
## GENE_52 -0.56951587
## GENE_53 -2.37599794
## GENE_54 1.06190147
## GENE_55 -3.09168042
## GENE_56 -1.60462838
## GENE_57 0.07644472
## GENE_58 2.45747547
## GENE_59 0.30650337
## GENE_60 -0.97546267
## GENE_61 3.18792862
## GENE_62 -0.91748401
## GENE_63 -1.36656436
## GENE_64 -0.77999361
## GENE_65 -0.74496776
## GENE_66 -0.94463052
## GENE_67 0.91983299
## GENE_68 -1.34499641
## GENE_69 2.20737262
## GENE_70 -0.41861429
## GENE_71 -2.90780281
## GENE_72 1.37469818
## GENE_73 -0.47322392
## GENE_74 1.58517869
## GENE_75 -0.05684846
## GENE_76 1.78424983
## GENE_77 0.05059131
## GENE_78 -1.46975428
## GENE_79 -1.63391861
## GENE_80 -3.28682795
## GENE_81 1.04449163
## GENE_82 1.12725110
## GENE_83 2.23143021
## GENE_84 -0.11487169
## GENE_85 0.70617656
## GENE_86 -3.55226239
## GENE_87 3.98748549
## GENE_88 3.30891126
## GENE_89 1.68499851
## GENE_90 1.57378687
## GENE_91 -1.99451336
## GENE_92 -1.81581454
## GENE_93 -2.53521806
## GENE_94 2.00779400
## GENE_95 -2.50866783
## GENE_96 1.75958502
## GENE_97 0.52280517
## GENE_98 -0.73948266
## GENE_99 -1.34029071
## GENE_100 -0.92888430
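These operations should agree with their in-memory counterparts. The sketch below reuses a single random vector, so the matrix product can be checked against base R. It assumes only the calls already shown above, plus `rowMeans()`, which DelayedArray also supports.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")
v <- runif(ncol(out))

cs <- colSums(out)    # column sums computed from the TileDB backend
rm <- rowMeans(out)   # row means, likewise
xv <- out %*% v       # matrix-vector product, returned as an ordinary matrix
```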
We can adjust some parameters for creating the backend with the appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.12010416 1.00627053 -0.77869592 . -0.1775560 -0.1733012
## [2,] -1.08439859 0.44384765 1.64838400 . -1.7830778 -0.2336533
## [3,] 1.08572954 -0.08575996 0.00768283 . -1.7554279 -2.5803992
## [4,] 0.62497040 -0.95079211 1.88688024 . -0.3242225 0.5259119
## [5,] 2.08983789 0.86808522 -0.59255378 . 1.2540021 -1.2275976
## ... . . . . . .
## [96,] -1.5089170 0.3755473 -2.2360359 . -1.4017352 -0.5791738
## [97,] 1.1929834 -0.4218359 -1.4107421 . 0.4490988 -0.5774638
## [98,] 0.8367407 0.8374742 -0.6956505 . -1.2552804 2.3289235
## [99,] -0.4831312 -1.0095038 0.3757922 . 0.4768857 -0.4167689
## [100,] 1.0102086 0.2935773 0.3156852 . -0.6678463 -0.8147124
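Because the path is known, the array can be reopened later, for example in a new R session. This sketch assumes that the `TileDBArray()` constructor accepts the path together with an `attr` argument naming the attribute to read (see `?TileDBArray`). The attribute name must match the one used when writing.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
written <- writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the existing array from its path (assumed constructor usage).
reloaded <- TileDBArray(path, attr="WHEE")
reloaded_mat <- as.matrix(reloaded)
```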
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.12010416 1.00627053 -0.77869592 . -0.1775560 -0.1733012
## [2,] -1.08439859 0.44384765 1.64838400 . -1.7830778 -0.2336533
## [3,] 1.08572954 -0.08575996 0.00768283 . -1.7554279 -2.5803992
## [4,] 0.62497040 -0.95079211 1.88688024 . -0.3242225 0.5259119
## [5,] 2.08983789 0.86808522 -0.59255378 . 1.2540021 -1.2275976
## ... . . . . . .
## [96,] -1.5089170 0.3755473 -2.2360359 . -1.4017352 -0.5791738
## [97,] 1.1929834 -0.4218359 -1.4107421 . 0.4490988 -0.5774638
## [98,] 0.8367407 0.8374742 -0.6956505 . -1.2552804 2.3289235
## [99,] -0.4831312 -1.0095038 0.3757922 . 0.4768857 -0.4167689
## [100,] 1.0102086 0.2935773 0.3156852 . -0.6678463 -0.8147124
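The attribute name can be controlled for coercion in the same way. In this sketch, `setTileDBAttr()` is assumed to be the attribute counterpart of `setTileDBPath()` (both are documented under `?setTileDBPath`). Passing `NULL` is assumed to unset a global, so that later coercions fall back to the defaults. The reopening step relies on the same assumed `TileDBArray(path, attr=)` usage.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path3 <- tempfile()

# Set the globals consulted by the coercion method (setTileDBAttr() is assumed).
setTileDBPath(path3)
setTileDBAttr("WHEE")
Xt <- as(X, "TileDBArray")

# Unset both globals so that subsequent coercions use the defaults again.
setTileDBPath(NULL)
setTileDBAttr(NULL)

# The data should now live at path3 under the "WHEE" attribute.
reopened <- TileDBArray(path3, attr="WHEE")
```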
sessionInfo()
## R Under development (unstable) (2024-11-20 r87352)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.7.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.19 TileDBArray_1.17.0 DelayedArray_0.33.2
## [4] SparseArray_1.7.2 S4Arrays_1.7.1 IRanges_2.41.1
## [7] abind_1.4-8 S4Vectors_0.45.2 MatrixGenerics_1.19.0
## [10] matrixStats_1.4.1 BiocGenerics_0.53.3 generics_0.1.3
## [13] Matrix_1.7-1 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0 jsonlite_1.8.9 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13-1
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.47.0 tiledb_0.30.2
## [16] knitr_1.49 bookdown_0.41 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.49
## [22] sass_0.4.9 bit64_4.5.2 cli_3.6.3
## [25] zlibbioc_1.53.0 spdl_0.0.5 digest_0.6.37
## [28] grid_4.5.0 lifecycle_1.0.4 data.table_1.16.2
## [31] evaluate_1.0.1 nanotime_0.3.10 zoo_1.8-12
## [34] rmarkdown_2.29 tools_4.5.0 htmltools_0.5.8.1