1 Using the DelayedArray infrastructure

A remote dataset is accessed by giving the URL of the server, the path to the HDF5 file and the name of the dataset in the file.

If the server is an h5serv instance, the filepath is ignored and the dataset name is searched for in all files. This is legacy behavior: on the h5s.channingremotedata.org server, each dataset had its own file.

If the server is an hsds instance, the filepath should give the path and name of the HDF5 file.

2 Interface to HSDS (HDF Object Store)

del10x = H5S_Array(URL_hsds(), '/home/reshg/tenx_full.h5', 'newassay001')
del10x
## <27998 x 1306127> H5S_Matrix object of type "double":
##                [,1]       [,2]       [,3] ... [,1306126] [,1306127]
##     [1,]          0          0          0   .          0          0
##     [2,]          0          0          0   .          0          0
##     [3,]          0          0          0   .          0          0
##     [4,]          0          0          0   .          0          0
##     [5,]          0          0          0   .          0          0
##      ...          .          .          .   .          .          .
## [27994,]          0          0          0   .          0          0
## [27995,]          1          0          0   .          0          0
## [27996,]          0          0          0   .          0          0
## [27997,]          0          0          0   .          0          0
## [27998,]          0          0          0   .          0          0

Again, the resulting object supports DelayedArray operations, such as summing the first four columns:

apply(del10x[,1:4],2,sum)
## [1] 4046 2087 4654 3193
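
The same delayed behavior can be tried without a server. The following is a minimal sketch, assuming the DelayedArray package is installed; the small matrix `m` and its values are made up for illustration and merely stand in for the remote dataset.

```r
library(DelayedArray)

# Small in-memory stand-in for a remote matrix (values are made up).
m <- matrix(0:11, nrow = 3)   # columns: (0,1,2), (3,4,5), (6,7,8), (9,10,11)
d <- DelayedArray(m)

# Subsetting is recorded as a delayed operation on the wrapped data.
sub <- d[, 1:2]

# The column sums are computed only when they are requested.
cs <- colSums(sub)
cs
```

With a remote H5S_Matrix the pattern is the same, except that the values are fetched from the server when the sums are computed.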

3 Comment: file names and dataset names

Note that in both cases, a lookup may find multiple datasets with the same name. On h5serv this can happen when multiple HDF5 files contain datasets with the same name. On hsds it can happen when multiple datasets with the same name exist in the file's internal group hierarchy. This condition is expected to be rare in practice.
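
To make the condition concrete, here is a small base R sketch. The `listing` data frame and the `ambiguous_names` helper are hypothetical and are not part of any package; they only illustrate when a dataset name alone fails to identify a single dataset.

```r
# Hypothetical listing of datasets: one row per dataset (not produced by any server).
listing <- data.frame(
  file    = c("a.h5", "b.h5", "c.h5", "c.h5"),
  group   = c("/", "/", "/grp1", "/grp2"),
  dataset = c("counts", "counts", "dsetX", "dsetX"),
  stringsAsFactors = FALSE
)

# Return the dataset names that occur more than once in the listing.
ambiguous_names <- function(listing) {
  unique(listing$dataset[duplicated(listing$dataset)])
}

dups <- ambiguous_names(listing)
dups
```

In this made-up listing, `counts` mirrors the h5serv case (the same name in two files) and `dsetX` mirrors the hsds case (the same name in two groups of one file).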

Example: multiple datasets named dsetX

y = H5S_Array(URL_hsds(), '/home/spollack/testzero.h5', 'dsetX')