# HSDSSource

An object of class HSDSSource represents an HDF Group server running on a machine. The constructor takes the endpoint and, optionally, the server type. At present, the only valid value is hsds (for the HDF Scalable Data Service). If the type is not specified, the server is assumed to be hsds.

src.hsds <- HSDSSource('http://hsdshdflab.hdfgroup.org')

The routine listDomains is provided for inspecting the server hierarchy, which maps approximately to the directory structure of the server file system. Its purpose is to help the user locate HDF5 files.

The user needs to know the root domain of the server. The data set’s maintainer should publish this information along with the server endpoint.

listDomains(src.hsds, '/home/jreadey')
##  [1] "/home/jreadey/4DStem"
## [22] "/home/jreadey/tmp"
listDomains(src.hsds, '/home/jreadey/HDFLabTutorial')
## [1] "/home/jreadey/HDFLabTutorial/03.h5"
## [3] "/home/jreadey/HDFLabTutorial/04a.h5"

# HSDSFile

An object of class HSDSFile represents an HDF5 file. The object is constructed by providing a source and a file domain.

f0 <- HSDSFile(src.hsds, '/home/spollack/testzero.h5')
f1 <- HSDSFile(src.hsds, '/shared/bioconductor/tenx_full.h5')

The function listDatasets lists the datasets in a file.

listDatasets(f0)
## [1] "/grpA/grpAA/dsetAA1" "/grpA/grpAB/dsetX"   "/grpB/grpBA/dsetX"
## [4] "/grpB/grpBB/dsetBB1" "/grpC/dsetCC"
listDatasets(f1)
## [1] "/newassay001"

# HSDSDataset

Construct an HSDSDataset object from an HSDSFile and a dataset path.

d0 <- HSDSDataset(f0, '/grpA/grpAB/dsetX')
d1 <- HSDSDataset(f1, '/newassay001')

## Data Fetch (1)

The low-level data retrieval method is getData. Its index argument is a character vector of slices, one per dimension. Valid slices are ':' (all indices), '1:10' (indices 1 through 10 inclusive), ':10' (same as '1:10'), '5:' (from 5 to the maximum value of the index) and '2:14:4' (from 2 to 14 inclusive in increments of 4).

Note that the slice should be passed in R semantics: 1 signifies the first element, and the last element is included in the slice. (Internally, rhdf5client converts to Python semantics, in which the first index is 0 and the last element is excluded. But here, as everywhere in the package, all Python details should be hidden from the user.)
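The index conversion described above can be sketched in a few lines of plain R. This is only an illustration of the arithmetic, not the package's internal code: the function name `r_slice_to_zero_based` and its `dim_len` argument are hypothetical, and it handles only the slice forms listed earlier.

```r
# Hypothetical helper, not part of rhdf5client: convert one R-style slice
# (1-based, end-inclusive) to zero-based, end-exclusive start/stop/step.
r_slice_to_zero_based <- function(slice, dim_len) {
  # Append a sentinel field so strsplit() keeps a trailing empty field ("5:").
  parts <- strsplit(paste0(slice, ":#"), ":", fixed = TRUE)[[1]]
  parts <- parts[-length(parts)]
  stopifnot(length(parts) %in% 2:3)

  start <- if (parts[1] == "") 1L else as.integer(parts[1])
  stop  <- if (parts[2] == "") as.integer(dim_len) else as.integer(parts[2])
  step  <- if (length(parts) == 3) as.integer(parts[3]) else 1L

  # Shifting the start down by one makes it zero-based; the inclusive R end
  # index is numerically equal to the exclusive zero-based end index.
  c(start = start - 1L, stop = stop, step = step)
}

r_slice_to_zero_based("1:4", 27998)     # start 0, stop 4, step 1
r_slice_to_zero_based("2:14:4", 27998)  # start 1, stop 14, step 4
r_slice_to_zero_based("5:", 27998)      # start 4, stop 27998, step 1
```

For the stepped slice, R indices 2, 6, 10, 14 become zero-based positions 1, 5, 9, 13, which is what a half-open range from 1 to 14 in steps of 4 yields.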

apply(getData(d1, c('1:4', '1:27998'), transfermode='JSON'), 1, sum)
## [1] 4046 2087 4654 3193
apply(getData(d1, c('1:4', '1:27998'), transfermode='binary'), 1, sum)
## [1] 4046 2087 4654 3193

## Data Fetch (2)

getData is generic. It can also be passed a list of vectors for the index argument, one vector in each dimension. At present, it only works if each of the vectors can be expressed as a single slice. Eventually, this functionality will be expanded to the general multi-dimensional case of multiple slices. In the general case, multiple array blocks will be fetched and bound back together into a single array.
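To make the single-slice restriction concrete, the following sketch tests whether an index vector is an evenly spaced increasing sequence and, if so, writes it as a slice string. The helper `vector_to_slice` is hypothetical and is not claimed to be how the package performs this check.

```r
# Hypothetical helper, not part of rhdf5client: return the slice string for
# an index vector, or NULL if the vector cannot be written as one slice.
vector_to_slice <- function(idx) {
  idx <- as.integer(idx)
  if (length(idx) == 1L) return(sprintf("%d:%d", idx, idx))

  # A single slice requires one constant, positive step between indices.
  steps <- diff(idx)
  if (length(unique(steps)) != 1L || steps[1] < 1L) return(NULL)

  last <- idx[length(idx)]
  if (steps[1] == 1L) {
    sprintf("%d:%d", idx[1], last)
  } else {
    sprintf("%d:%d:%d", idx[1], last, steps[1])
  }
}

vector_to_slice(1:4)              # "1:4"
vector_to_slice(c(2, 6, 10, 14))  # "2:14:4"
vector_to_slice(c(1, 2, 5))       # NULL: uneven spacing needs several slices
```

A vector such as `c(1, 2, 5)` is the kind of input that the general multi-slice case would have to split into several blocks.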

apply(getData(d1, list(1:4, 1:27998), transfermode='JSON'), 1, sum)
## [1] 4046 2087 4654 3193
apply(getData(d1, list(1:4, 1:27998), transfermode='binary'), 1, sum)
## [1] 4046 2087 4654 3193

## Data Fetch (3)

The [ operator is provided for the two most typical cases (one-dimensional and two-dimensional numeric data).

apply(d1[1:4, 1:27998], 1, sum)
## [1] 4046 2087 4654 3193