1 Overview

ExperimentHubData provides tools to add or modify resources in Bioconductor’s ExperimentHub. This ‘hub’ houses curated data from courses, publications, or experiments. The resources are generally not raw data files (as can be the case in AnnotationHub) but rather R / Bioconductor objects such as GRanges, SummarizedExperiment, or data.frame. Each resource has associated metadata that can be searched through the ExperimentHub client interface.
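As a quick illustration, resources can be discovered with the ExperimentHub client; the search term below is only an example:

```r
library(ExperimentHub)

## Create a hub instance (downloads and caches the metadata database)
eh <- ExperimentHub()

## Search the resource metadata; "RNA-seq" is an arbitrary example term
rnaseq <- query(eh, "RNA-seq")

## Inspect the titles of matching resources
head(rnaseq$title)
```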

2 New resources

Resources are contributed to ExperimentHub in the form of a package. The package contains the resource metadata, man pages, vignette and any supporting R functions the author wants to provide. This is a similar design to the existing Bioconductor experimental data packages except the data are stored in AWS S3 buckets instead of the data/ directory of the package.

Below are the steps required for adding new resources.

2.1 Notify Bioconductor team member

The man page and vignette examples in the software package will not work until the data are available in ExperimentHub. Adding the data to AWS S3 and the metadata to the production database involves assistance from a Bioconductor team member. If you are interested in submitting a package, please send an email to the Bioconductor team so a team member can work with you through the process.

2.2 Building the software package

When a resource is downloaded from ExperimentHub the associated software package is loaded into the workspace, making the man pages and vignettes readily available. Because documentation plays an important role in understanding these curated resources, please take the time to develop clear man pages and a detailed vignette. These documents provide essential background to the user and guide appropriate use of the resources.

Below is an outline of package organization. The files listed are required unless otherwise stated.
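A typical ExperimentHub package layout looks like the following sketch (MyPackage and the file names under R/ are hypothetical; the metadata and script files follow the hub conventions):

```
MyPackage/
├── DESCRIPTION
├── NAMESPACE
├── R/                         # optional supporting functions
├── man/                       # man pages documenting each resource
├── vignettes/                 # detailed vignette
└── inst/
    ├── extdata/
    │   └── metadata.csv       # resource metadata
    └── scripts/
        ├── make-data.R        # describes how the data were produced
        └── make-metadata.R    # produces metadata.csv
```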

2.3 Data objects

Data are not formally part of the software package; they are stored separately in AWS S3 buckets. The author should make the data available via Dropbox, FTP, or another mutually accessible service, and they will be uploaded to S3 by a member of the Bioconductor team.

Data files should be created with save() and have the .rda extension.
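For example (a minimal sketch using a plain data.frame; real resources are typically richer Bioconductor objects such as a SummarizedExperiment):

```r
## Hypothetical resource: a small data.frame
dat <- data.frame(sample = c("A", "B", "C"),
                  value  = c(1.2, 3.4, 5.6))

## Serialize with save(); the file must carry the .rda extension
save(dat, file = "dat.rda")

## The object is later restored under its original name with load()
load("dat.rda")
```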

2.4 Metadata

When you are satisfied with the representation of your resources in make-metadata.R (which produces metadata.csv) the Bioconductor team member will add the metadata to the production database.
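A make-metadata.R script might look like the following sketch. The field names follow the hub metadata conventions as documented in the ExperimentHubData package; all values below are hypothetical:

```r
## make-metadata.R: a minimal sketch producing metadata.csv
meta <- data.frame(
    Title              = "Example RNA-seq counts",
    Description        = "Hypothetical count matrix for three samples",
    BiocVersion        = "3.19",
    Genome             = "GRCh38",
    SourceType         = "TXT",
    SourceUrl          = "http://example.com/data",
    SourceVersion      = "1.0",
    Species            = "Homo sapiens",
    TaxonomyId         = 9606,
    Coordinate_1_based = TRUE,
    DataProvider       = "Example Lab",
    Maintainer         = "Jane Doe <jane@example.com>",
    RDataClass         = "data.frame",
    DispatchClass      = "Rda",
    RDataPath          = "MyPackage/dat.rda",
    stringsAsFactors   = FALSE
)

write.csv(meta, file = "inst/extdata/metadata.csv", row.names = FALSE)
```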

2.5 Package review

Once the data are in AWS S3 and the metadata have been added to the production database the man pages and vignette can be finalized. When the package passes R CMD build and check it can be submitted to the package tracker for review.
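The build and check steps are run from the command line; MyPackage is a hypothetical package name and version:

```shell
R CMD build MyPackage
R CMD check MyPackage_0.99.0.tar.gz
```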

3 Add additional resources

Multiple versions of the data can be added to the same package as they become available. Be sure the title is descriptive and reflects distinguishing information such as genome build. Adding new resources to an existing package follows steps similar to those described above for a new submission.

4 Bug fixes

A bug fix may involve a change to the metadata, data resource or both.

4.1 Update the resource

4.2 Update the metadata

5 Remove resources

When a resource is removed from ExperimentHub the ‘status’ field in the metadata is modified to explain why it is no longer available. Once this status is changed, the ExperimentHub() constructor will not list the resource among the available ids. An attempt to extract the resource with ‘[[’ and the EH id will return an error along with the status message.
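Extraction by id looks like the following; "EH1" is an arbitrary example id, and a removed resource would error at this step with its status message:

```r
library(ExperimentHub)
eh <- ExperimentHub()

## Extract a resource by EH id ("EH1" is an arbitrary example);
## if the resource has been removed, this errors and reports the
## 'status' message instead of returning the object
res <- eh[["EH1"]]
```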

To remove a resource from ExperimentHub, contact the Bioconductor team.

6 ExperimentHub_docker

The ExperimentHub_docker repository offers an isolated test environment for inserting and extracting metadata records in the ExperimentHub database. The README in the repository explains how to set up the Docker container; records are inserted with ExperimentHubData::addResources().

In general this level of testing should not be necessary when submitting a package with new resources. The best way to validate record metadata is to read inst/extdata/metadata.csv with ExperimentHubData::readMetadataFromCsv(). If that is successful the metadata are ready to go.
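That validation step might look like the following (a sketch, assuming readMetadataFromCsv() takes the path to the package source directory; the path shown is hypothetical):

```r
## Validate inst/extdata/metadata.csv before submission;
## errors here indicate malformed or missing metadata fields
ExperimentHubData::readMetadataFromCsv("path/to/MyPackage")
```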