TileDBArray 1.14.0
TileDB implements a framework for local and remote storage of dense and sparse arrays. We can use this as a DelayedArray backend to provide an array-level abstraction, thus allowing the data to be used in many places where an ordinary array or matrix might be used. The TileDBArray package implements the necessary wrappers around TileDB-R to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.2305772 -0.3230783 -1.4562068 . 0.398774831 -0.270536417
## [2,] 0.8084421 0.4981961 -0.7206921 . 0.116513681 0.570161853
## [3,] 0.8119825 2.4713495 0.2217185 . -1.775481636 0.005997816
## [4,] 0.1790890 -1.3572858 1.2675452 . 1.696324104 0.875489885
## [5,] -0.9601161 0.8556173 -0.1378519 . -0.327492960 -0.730440302
## ... . . . . . .
## [96,] 0.79716889 0.63706221 0.09266665 . -0.02979687 0.60814939
## [97,] -0.66800965 0.16354165 -1.04468339 . -1.00955258 -0.89323768
## [98,] -1.46935762 -0.24167793 -0.62951057 . -0.10774194 -0.56974857
## [99,] 1.28597562 -0.05039254 -1.38589267 . -0.50287017 0.97855660
## [100,] 0.44691326 1.34608195 1.06211163 . 0.12302216 1.83755694
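A quick way to confirm that the write succeeded is to read the values back into memory. The sketch below uses a separate matrix `M` so that it does not disturb `X`. Apart from writeTileDBArray(), it relies only on base R's `as.matrix()`, `dim()` and `is()`, and it assumes the default behaviour of writing to an automatically chosen temporary path.

```r
library(TileDBArray)

# A fresh in-memory matrix, kept separate from X used elsewhere in this vignette.
M <- matrix(rnorm(1000), ncol=10)

# Write it to a new TileDB array at an automatically chosen temporary path.
res <- writeTileDBArray(M)

# The result is a TileDBMatrix with the same dimensions as the input.
print(is(res, "TileDBMatrix"))
print(identical(dim(res), dim(M)))

# Realizing it into memory should give back exactly the original values.
roundtrip <- as.matrix(res)
print(all(roundtrip == M))
```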
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.2305772 -0.3230783 -1.4562068 . 0.398774831 -0.270536417
## [2,] 0.8084421 0.4981961 -0.7206921 . 0.116513681 0.570161853
## [3,] 0.8119825 2.4713495 0.2217185 . -1.775481636 0.005997816
## [4,] 0.1790890 -1.3572858 1.2675452 . 1.696324104 0.875489885
## [5,] -0.9601161 0.8556173 -0.1378519 . -0.327492960 -0.730440302
## ... . . . . . .
## [96,] 0.79716889 0.63706221 0.09266665 . -0.02979687 0.60814939
## [97,] -0.66800965 0.16354165 -1.04468339 . -1.00955258 -0.89323768
## [98,] -1.46935762 -0.24167793 -0.62951057 . -0.10774194 -0.56974857
## [99,] 1.28597562 -0.05039254 -1.38589267 . -0.50287017 0.97855660
## [100,] 0.44691326 1.34608195 1.06211163 . 0.12302216 1.83755694
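The coercion route and writeTileDBArray() should yield equivalent on-disk copies. This sketch checks that both objects realize to the same values. It again uses a separate matrix `M`, and it assumes the default behaviour in which each call writes to its own temporary location.

```r
library(TileDBArray)

M <- matrix(rnorm(200), ncol=4)

# Two ways of creating a TileDB-backed copy of the same in-memory matrix.
via_write <- writeTileDBArray(M)
via_coerce <- as(M, "TileDBArray")

# Both should be TileDBMatrix objects that realize to identical values.
print(class(via_coerce))
print(all(as.matrix(via_write) == as.matrix(via_coerce)))
```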
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0.0 0.0
## [2,] 0 0 0 . 0.0 0.0
## [3,] 0 0 0 . 0.0 0.0
## [4,] 0 0 0 . -2.2 0.0
## [5,] 0 0 0 . 0.0 0.0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
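To check that a sparse input is kept sparse and that no values are lost, one can ask for the storage mode and compare the densified contents. This is a sketch on a smaller random matrix. `is_sparse()` is the S4Arrays/DelayedArray generic, and it is used here on the assumption that it reflects the "sparse" label printed above.

```r
library(TileDBArray)
library(Matrix)

# A small random sparse matrix (dgCMatrix), smaller than Y above to keep this quick.
S <- rsparsematrix(200, 100, density=0.01)

# Writing a sparse input should produce a sparse TileDB array.
S_tiledb <- writeTileDBArray(S)
print(is_sparse(S_tiledb))

# Densifying both versions lets us compare every entry, including the zeros.
print(all(as.matrix(S_tiledb) == as.matrix(S)))
```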
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
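The example above only shows a sparse logical matrix. The sketch below writes a small dense integer matrix and a dense logical matrix derived from it, then checks the reported storage type with the `type()` generic available for DelayedArray objects. It assumes that dense logical input is handled in the same way as the sparse case.

```r
library(TileDBArray)

# A small integer matrix and a logical matrix derived from it.
int_mat <- matrix(1:20, nrow=4)
lgl_mat <- int_mat %% 2 == 0

int_tiledb <- writeTileDBArray(int_mat)
lgl_tiledb <- writeTileDBArray(lgl_mat)

# type() reports the R type exposed by each TileDB-backed matrix.
print(type(int_tiledb))
print(type(lgl_tiledb))

# The values themselves should also survive the round trip.
print(all(as.matrix(int_tiledb) == int_mat))
print(all(as.matrix(lgl_tiledb) == lgl_mat))
```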
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.2305772 -0.3230783 -1.4562068 . 0.398774831 -0.270536417
## GENE_2 0.8084421 0.4981961 -0.7206921 . 0.116513681 0.570161853
## GENE_3 0.8119825 2.4713495 0.2217185 . -1.775481636 0.005997816
## GENE_4 0.1790890 -1.3572858 1.2675452 . 1.696324104 0.875489885
## GENE_5 -0.9601161 0.8556173 -0.1378519 . -0.327492960 -0.730440302
## ... . . . . . .
## GENE_96 0.79716889 0.63706221 0.09266665 . -0.02979687 0.60814939
## GENE_97 -0.66800965 0.16354165 -1.04468339 . -1.00955258 -0.89323768
## GENE_98 -1.46935762 -0.24167793 -0.62951057 . -0.10774194 -0.56974857
## GENE_99 1.28597562 -0.05039254 -1.38589267 . -0.50287017 0.97855660
## GENE_100 0.44691326 1.34608195 1.06211163 . 0.12302216 1.83755694
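Whether the dimension names survive the trip to disk can be checked directly. This is a minimal sketch on a smaller named matrix, built with the same naming scheme as above, which compares `dimnames()` before and after writing.

```r
library(TileDBArray)

named <- matrix(rnorm(50), ncol=5)
rownames(named) <- sprintf("GENE_%i", seq_len(nrow(named)))
colnames(named) <- sprintf("SAMP_%i", seq_len(ncol(named)))

named_tiledb <- writeTileDBArray(named)

# Row and column names should be identical to those of the in-memory input.
print(identical(dimnames(named_tiledb), dimnames(named)))
```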
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.2305772 0.8084421 0.8119825 0.1790890 -0.9601161 1.1659175
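Because a TileDBMatrix extends DelayedArray, code written against the DelayedArray class accepts it unchanged, and single-column extraction returns an ordinary R vector. The sketch below checks both points on a small matrix `M`. `is()` comes from the base methods package.

```r
library(TileDBArray)

M <- matrix(rnorm(100), ncol=5, dimnames=list(NULL, sprintf("SAMP_%i", 1:5)))
tdb <- as(M, "TileDBArray")

# A TileDBMatrix is a DelayedArray, so DelayedArray methods apply to it.
print(is(tdb, "DelayedArray"))

# Extracting one column realizes it into an ordinary numeric vector.
first_col <- tdb[, 1]
print(is.numeric(first_col))
print(all(first_col == M[, 1]))
```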
We can also perform manipulations like subsetting and arithmetic. Note that these operations do not affect the data in the TileDB backend; rather, they are delayed until the values are explicitly required, hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.23057719 -0.32307830 -1.45620684 -1.21560128 -1.24670991
## GENE_2 0.80844207 0.49819605 -0.72069210 -0.33714607 -0.12508937
## GENE_3 0.81198251 2.47134954 0.22171848 -1.11538147 -0.04910972
## GENE_4 0.17908902 -1.35728581 1.26754521 -0.81417007 -0.50068866
## GENE_5 -0.96011612 0.85561726 -0.13785192 0.42028662 -1.11837891
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.4611544 -0.6461566 -2.9124137 . 0.79754966 -0.54107283
## GENE_2 1.6168841 0.9963921 -1.4413842 . 0.23302736 1.14032371
## GENE_3 1.6239650 4.9426991 0.4434370 . -3.55096327 0.01199563
## GENE_4 0.3581780 -2.7145716 2.5350904 . 3.39264821 1.75097977
## GENE_5 -1.9202322 1.7112345 -0.2757038 . -0.65498592 -1.46088060
## ... . . . . . .
## GENE_96 1.5943378 1.2741244 0.1853333 . -0.05959374 1.21629877
## GENE_97 -1.3360193 0.3270833 -2.0893668 . -2.01910516 -1.78647535
## GENE_98 -2.9387152 -0.4833559 -1.2590211 . -0.21548388 -1.13949714
## GENE_99 2.5719512 -0.1007851 -2.7717853 . -1.00574034 1.95711319
## GENE_100 0.8938265 2.6921639 2.1242233 . 0.24604432 3.67511389
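To see that delayed operations leave the backend untouched, we can compare a realized result against the in-memory computation and then reopen the stored array from its path. This is a sketch that assumes the `TileDBArray()` constructor accepts a path together with an `attr=` argument naming the attribute. The attribute name `"vals"` is purely illustrative.

```r
library(TileDBArray)

M <- matrix(rnorm(100), ncol=5)
p <- tempfile()
tdb <- writeTileDBArray(M, path=p, attr="vals")  # "vals" is an arbitrary attribute name

# Subsetting and arithmetic are only recorded; the result is a DelayedMatrix.
doubled <- tdb[1:10, ] * 2
print(class(doubled))

# Realizing it applies the recorded operations in memory.
print(all(as.matrix(doubled) == M[1:10, ] * 2))

# Reopening the array from disk shows that the stored values were never modified.
reopened <- TileDBArray(p, attr="vals")
print(all(as.matrix(reopened) == M))
```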
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## 2.827492 -30.327829 14.945147 4.066500 1.042208 -0.977135 15.470945
## SAMP_8 SAMP_9 SAMP_10
## -3.959637 3.611949 19.221367
out %*% runif(ncol(out))
## [,1]
## GENE_1 -2.46030782
## GENE_2 1.92820710
## GENE_3 0.66492179
## GENE_4 2.37950405
## GENE_5 0.23948862
## GENE_6 0.98402176
## GENE_7 -0.52949578
## GENE_8 0.81802732
## GENE_9 -0.86836563
## GENE_10 1.66488290
## GENE_11 0.27953573
## GENE_12 0.49882895
## GENE_13 0.72806360
## GENE_14 -0.72261152
## GENE_15 0.38589601
## GENE_16 -2.50724278
## GENE_17 1.96401215
## GENE_18 -2.00828978
## GENE_19 -0.44420174
## GENE_20 0.43079073
## GENE_21 1.06399999
## GENE_22 -1.08287635
## GENE_23 2.25236892
## GENE_24 -1.37476169
## GENE_25 0.08083789
## GENE_26 0.86091646
## GENE_27 1.36369911
## GENE_28 2.17615270
## GENE_29 0.77229859
## GENE_30 -1.70910252
## GENE_31 1.24282506
## GENE_32 -2.40380117
## GENE_33 0.30076618
## GENE_34 0.25885472
## GENE_35 -2.92047609
## GENE_36 -1.69944288
## GENE_37 -0.14251291
## GENE_38 1.76489078
## GENE_39 -4.42569503
## GENE_40 0.23458697
## GENE_41 2.40732620
## GENE_42 -1.45731509
## GENE_43 0.59848604
## GENE_44 -1.01151545
## GENE_45 -3.48448580
## GENE_46 -1.01141854
## GENE_47 -0.90401848
## GENE_48 0.90688074
## GENE_49 2.01496783
## GENE_50 -0.12932669
## GENE_51 -1.14489713
## GENE_52 -1.37605562
## GENE_53 -2.02403439
## GENE_54 -1.23775712
## GENE_55 1.41413683
## GENE_56 -2.12468070
## GENE_57 -0.73966638
## GENE_58 -0.16872987
## GENE_59 -2.44042764
## GENE_60 1.07979621
## GENE_61 0.92309669
## GENE_62 1.70412700
## GENE_63 0.99062369
## GENE_64 1.61685201
## GENE_65 -0.74414873
## GENE_66 -2.72847576
## GENE_67 2.28361844
## GENE_68 -2.22950314
## GENE_69 -0.55265602
## GENE_70 1.20749884
## GENE_71 -1.02680235
## GENE_72 0.92120187
## GENE_73 1.89971771
## GENE_74 1.22813603
## GENE_75 3.27475113
## GENE_76 0.63402037
## GENE_77 0.98990283
## GENE_78 0.14255481
## GENE_79 -0.93916295
## GENE_80 1.83287968
## GENE_81 -1.13525201
## GENE_82 2.14734969
## GENE_83 -0.69788512
## GENE_84 -1.85219284
## GENE_85 1.56156473
## GENE_86 3.49373237
## GENE_87 1.32590524
## GENE_88 2.87830481
## GENE_89 4.40250303
## GENE_90 -0.24696726
## GENE_91 1.09981219
## GENE_92 -1.34203188
## GENE_93 3.98218371
## GENE_94 1.85267288
## GENE_95 2.44192315
## GENE_96 2.32225552
## GENE_97 -0.06463002
## GENE_98 0.61244683
## GENE_99 -1.22427524
## GENE_100 -1.00636055
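The DelayedArray results can be checked against the same computations on the in-memory matrix. This sketch compares them with `all.equal()`, so small floating-point differences from block-wise processing would still pass.

```r
library(TileDBArray)

M <- matrix(rnorm(1000), ncol=10)
tdb <- as(M, "TileDBArray")
v <- runif(ncol(M))

# Computed block by block from the TileDB-backed matrix.
disk_sums <- colSums(tdb)
disk_prod <- tdb %*% v

# The same quantities computed on the ordinary in-memory matrix.
print(isTRUE(all.equal(disk_sums, colSums(M))))
print(isTRUE(all.equal(as.vector(disk_prod), as.vector(M %*% v))))
```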
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray(). For example, the code below controls the path to the backend as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.16909714 1.60976720 1.23165243 . 0.9578153 0.3997649
## [2,] 1.22076494 -0.02715891 -2.11449443 . -1.2092884 0.1125839
## [3,] -1.63569920 -0.92730449 -1.19010175 . 0.5992663 -0.2475161
## [4,] 0.41514927 -1.12416800 -0.66686854 . 1.1857848 0.5519471
## [5,] 2.25317845 -1.00556479 0.60580664 . -0.2483425 1.9817745
## ... . . . . . .
## [96,] -1.7881726 -0.6132229 1.4023332 . 0.72625562 0.74538349
## [97,] 1.6341367 1.4022396 0.5536950 . 0.42027006 -1.12042072
## [98,] 1.3318883 -0.3130649 0.6418915 . 0.17807094 -0.28411775
## [99,] 1.2509355 -1.0386565 0.2924761 . -1.31626999 -0.76314737
## [100,] -0.1572923 -1.8815010 1.1490373 . 0.02340041 -0.79015986
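Choosing the path explicitly makes it possible to reopen the array later, for example in a new R session. This sketch assumes that a local TileDB array is stored as a directory and that the `TileDBArray()` constructor takes the path plus the attribute name passed as `attr=`.

```r
library(TileDBArray)

M <- matrix(rnorm(1000), ncol=10)
p <- tempfile()
written <- writeTileDBArray(M, path=p, attr="WHEE")

# A local TileDB array is stored as a directory at the chosen path.
print(dir.exists(p))

# Reopen it by path, naming the attribute that holds the data.
reloaded <- TileDBArray(p, attr="WHEE")
print(all(as.matrix(reloaded) == M))
```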
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.16909714 1.60976720 1.23165243 . 0.9578153 0.3997649
## [2,] 1.22076494 -0.02715891 -2.11449443 . -1.2092884 0.1125839
## [3,] -1.63569920 -0.92730449 -1.19010175 . 0.5992663 -0.2475161
## [4,] 0.41514927 -1.12416800 -0.66686854 . 1.1857848 0.5519471
## [5,] 2.25317845 -1.00556479 0.60580664 . -0.2483425 1.9817745
## ... . . . . . .
## [96,] -1.7881726 -0.6132229 1.4023332 . 0.72625562 0.74538349
## [97,] 1.6341367 1.4022396 0.5536950 . 0.42027006 -1.12042072
## [98,] 1.3318883 -0.3130649 0.6418915 . 0.17807094 -0.28411775
## [99,] 1.2509355 -1.0386565 0.2924761 . -1.31626999 -0.76314737
## [100,] -0.1572923 -1.8815010 1.1490373 . 0.02340041 -0.79015986
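Because the global path names a single location, it is usually worth unsetting it after use so that later coercions do not target the same, already-populated path. The sketch below assumes that `setTileDBPath(NULL)` restores the default temporary-path behaviour. It also assumes the written array appears as a directory at the chosen location.

```r
library(TileDBArray)

M <- matrix(rnorm(100), ncol=5)
custom <- tempfile()

# Direct the next coercion to a chosen location.
setTileDBPath(custom)
coerced <- as(M, "TileDBArray")

# Assumed: passing NULL unsets the global path, so later coercions
# go back to using fresh temporary paths.
setTileDBPath(NULL)

print(dir.exists(custom))
print(all(as.matrix(coerced) == M))
```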
sessionInfo()
## R version 4.4.0 beta (2024-04-14 r86421)
## Platform: x86_64-apple-darwin20
## Running under: macOS Monterey 12.7.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.17 TileDBArray_1.14.0 DelayedArray_0.30.0
## [4] SparseArray_1.4.0 S4Arrays_1.4.0 abind_1.4-5
## [7] IRanges_2.38.0 S4Vectors_0.42.0 MatrixGenerics_1.16.0
## [10] matrixStats_1.3.0 BiocGenerics_0.50.0 Matrix_1.7-0
## [13] BiocStyle_2.32.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.8 compiler_4.4.0
## [4] BiocManager_1.30.22 crayon_1.5.2 Rcpp_1.0.12
## [7] nanoarrow_0.4.0.1 jquerylib_0.1.4 yaml_2.3.8
## [10] fastmap_1.1.1 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.44.0 tiledb_0.26.0
## [16] knitr_1.46 bookdown_0.39 bslib_0.7.0
## [19] rlang_1.1.3 cachem_1.0.8 xfun_0.43
## [22] sass_0.4.9 bit64_4.0.5 cli_3.6.2
## [25] zlibbioc_1.50.0 spdl_0.0.5 digest_0.6.35
## [28] grid_4.4.0 lifecycle_1.0.4 data.table_1.15.4
## [31] evaluate_0.23 nanotime_0.3.7 zoo_1.8-12
## [34] rmarkdown_2.26 tools_4.4.0 htmltools_0.5.8.1