1 Introduction

The Matter package provides flexible data structures for out-of-memory computing on dense and sparse arrays, with several features designed specifically for computing on nonuniform signals such as mass spectra and other spectral data.

Matter 2 provides a more robust C++ backend for out-of-memory matter objects, along with a completely new implementation of sparse arrays and new signal processing functions for nonuniform sparse signal data.

Originally designed as a backend for the Cardinal package, the first version of Matter was constantly evolving to handle the ever-increasing demands of larger-than-memory mass spectrometry (MS) imaging experiments. While it was designed to be flexible from a user’s point of view and to handle a wide array of file structures beyond the niche of MS imaging, its codebase was becoming increasingly difficult to maintain and update.

Matter 2 was rewritten from the ground up to simplify some features that were rarely needed in practice and to provide a more robust and future-proof codebase for further improvement.

Specific improvements include:

2 Installation

Matter can be installed via the BiocManager package.

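# Install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("matter")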

The same function can be used to update Matter and other Bioconductor packages.

Once installed, Matter can be loaded with library():

library(matter)

3 Out-of-memory data structures

Matter provides several data structures for out-of-memory computing. They are designed to flexibly support a variety of binary file structures, and they can be used in computations much like native R data structures.

3.1 Atomic data units

The basis of out-of-memory data structures in Matter is a single contiguous chunk of data called an “atom”. An atom is a unit of data that can be pulled into memory in a single atomic read operation.

An “atom” of data typically lives in a local file. It is defined by (1) its source (e.g., a file path), (2) its data type, (3) its offset within the source (in bytes), and (4) its extent (i.e., the number of elements).
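To make these four fields concrete, the following minimal sketch uses only base R (not Matter's own API or implementation) to write a small binary file and read one “atom” back from it. The helper read_atom() and its argument names are hypothetical, chosen to mirror the four fields above. The sketch also assumes integers are stored as 4 bytes each, which is why size = 4L is passed explicitly.

```r
# Write 20 integers to a temporary binary file (4 bytes per element)
path <- tempfile(fileext = ".bin")
writeBin(1:20, path, size = 4L)

# Hypothetical helper: read one contiguous chunk ("atom") from a source.
# source = file path, type = data type, offset = bytes to skip,
# extent = number of elements to read.
read_atom <- function(source, type, offset, extent) {
  con <- file(source, open = "rb")
  on.exit(close(con))
  seek(con, where = offset, origin = "start")  # offset is given in bytes
  readBin(con, what = type, n = extent, size = 4L)
}

# Skip the first 10 integers (10 * 4 = 40 bytes), then read 5 elements
atom <- read_atom(path, type = "integer", offset = 40, extent = 5)
atom
## [1] 11 12 13 14 15
```

Only the five requested elements are loaded into memory; the rest of the file is never read.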

A matter object is composed of any number of atoms, from any number of files, that together make up the elements of the data structure.

x <- matter_vec(1:10)
y <- matter_vec(11:20)
z <- cbind(x, y)
atomdata(z)
## <2 length> atoms :: units of data
##                   source type offset extent group
## 1 file2743a3799c06bf.bin  int      0     10     0
## 2 file2743a37064c4e0.bin  int      0     10     1
## (20 elements | 10 per group | 2 groups)

Above, the two columns of the matrix z are composed of two different “atoms” from two different files.

In this way, a matter object may be composed of data from any number of files, at any location (i.e., byte offset) within those files. This data can then be presented to the user as an array, matrix, vector, or list.
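The idea of assembling one structure from several atoms can be sketched in base R. This is a conceptual illustration only, not how Matter is implemented: the atoms data frame, the read_atom() helper, and the rule that each group becomes one matrix column are assumptions made for the example, loosely modeled on the atomdata() output shown earlier.

```r
# Two temporary files, each holding 10 integers (4 bytes per element)
f1 <- tempfile(fileext = ".bin")
f2 <- tempfile(fileext = ".bin")
writeBin(1:10, f1, size = 4L)
writeBin(11:20, f2, size = 4L)

# Hypothetical table of atoms: one row per contiguous chunk of data
atoms <- data.frame(
  source = c(f1, f2),
  offset = c(0, 0),      # byte offset within each source
  extent = c(10L, 10L),  # number of elements in each atom
  group  = c(0L, 1L),    # assumption: each group forms one column
  stringsAsFactors = FALSE
)

# Hypothetical helper: read a single integer atom into memory
read_atom <- function(source, offset, extent) {
  con <- file(source, open = "rb")
  on.exit(close(con))
  seek(con, where = offset, origin = "start")
  readBin(con, what = "integer", n = extent, size = 4L)
}

# Read the atoms of each group, then bind the groups as matrix columns
cols <- lapply(split(atoms, atoms$group), function(g) {
  unlist(Map(read_atom, g$source, g$offset, g$extent), use.names = FALSE)
})
z <- do.call(cbind, unname(cols))
dim(z)
## [1] 10  2
```

Unlike this sketch, which reads everything eagerly, a matter object keeps the data on disk and reads atoms only when elements are requested.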

3.2 Arrays and matrices

Coming soon…

3.2.1 Deferred arithmetic

Coming soon…

3.3 Lists

Coming soon…

4 Sparse data structures

Coming soon…

4.1 Sparse matrices

Coming soon…

4.1.1 Deferred arithmetic

Coming soon…

4.2 Nonuniform signals

Coming soon…

5 Session information

sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] matter_2.0.1        Matrix_1.5-3        BiocParallel_1.32.1
## [4] BiocStyle_2.26.0   
## 
## loaded via a namespace (and not attached):
##  [1] knitr_1.40          magrittr_2.0.3      BiocGenerics_0.44.0
##  [4] lattice_0.20-45     R6_2.5.1            rlang_1.0.6        
##  [7] fastmap_1.1.0       stringr_1.4.1       tools_4.2.2        
## [10] parallel_4.2.2      grid_4.2.2          biglm_0.9-2.1      
## [13] xfun_0.35           irlba_2.3.5.1       DBI_1.1.3          
## [16] cli_3.4.1           jquerylib_0.1.4     ProtGenerics_1.30.0
## [19] htmltools_0.5.3     yaml_2.3.6          digest_0.6.30      
## [22] bookdown_0.30       BiocManager_1.30.19 sass_0.4.2         
## [25] codetools_0.2-18    cachem_1.0.6        evaluate_0.18      
## [28] rmarkdown_2.18      stringi_1.7.8       compiler_4.2.2     
## [31] bslib_0.4.1         jsonlite_1.8.3