We welcome contributions or suggestions for new datasets to include in the
HDCytoData package. Suitable datasets are any high-dimensional cytometry datasets that either: (i) are useful for benchmarking purposes, e.g. containing some sort of ground truth or known signal (e.g. a known ground truth in simulated data, or a known biological result in experimental data), and/or (ii) are useful for other activities such as teaching, examples, and tutorials.
Below, we summarize the key steps required to contribute a new dataset to the
HDCytoData package. For more details on how to use Bioconductor’s
ExperimentHub resource and creating
ExperimentHub packages, see the
ExperimentHub vignettes available from Bioconductor.
Open a GitHub issue (or pull request, if you already have all files available) at the HDCytoData GitHub page to contact the maintainers and discuss the suitability of the new dataset.
Using GitHub (instead of email) ensures that there is a public record of the contribution. If you are unfamiliar with GitHub, please contact the maintainers by email for assistance.
If the contribution is approved, the following are required for each dataset:
flowSet objects containing table(s) of expression values and all row, column, and other metadata required for users to fully understand the dataset (e.g. sample IDs, group IDs, patient IDs, cluster labels, spike-in labels, channel names, protein marker names, protein marker classes, etc.) Note that expression values should be be raw values (not transformed).
The objects should be saved as
.rda files with filenames
dataset_name is the name of the new dataset; e.g.
For an example of the object structures, load one of the existing datasets and inspect the various elements of the
library(HDCytoData) # example: SummarizedExperiment d_SE <- Levine_32dim_SE() d_SE length(assays(d_SE)) dim(d_SE) head(assay(d_SE)) rowData(d_SE) colData(d_SE) metadata(d_SE) # example: flowSet d_flowSet <- Levine_32dim_flowSet() d_flowSet length(d_flowSet) fsApply(d_flowSet, dim) head(exprs(d_flowSet[])) parameters(d_flowSet[])@data colnames(d_flowSet) description(d_flowSet[])
A reproducible R script showing how the formatted
flowSet objects were generated from the original raw data files (
.fcs files). The script should be named
make-data-Levine-32dim.R), and will be saved in the
inst/scripts directory in the source code of the
HDCytoData package for reproducibility purposes.
For an example, see the script for one of the existing datasets, e.g. inst/scripts/make-data-Levine-32dim.R (see GitHub for more examples).
Comprehensive documentation describing the dataset, what it can be used for, and the object structure. This should be formatted as an
.Rd file, to be saved in the
man directory in the source code of the
HDCytoData package. (Note that
Roxygen cannot be used for
ExperimentHub packages, so the
.Rd file needs to be written manually.)
For an example, see the help file for one of the existing datasets, e.g.
?Levine_32dim, or source code at man/Levine_32dim_SE.Rd (see GitHub for more examples).
Metadata to describe the dataset in the
ExperimentHub database, which will be saved in the file inst/scripts/make-metadata.R. For examples, see the metadata for the existing datasets in this file.
For more details on how to structure the files for a dataset stored on
ExperimentHub, see the vignette titled “Creating An ExperimentHub Package” on the Bioconductor page.
Once all the parts described above have been generated, create a fork of the
HDCytoData GitHub repository, add the files (except the
.rda files), and send a pull request. Please make sure that the files are in the correct directories, and that the package passes all the usual Bioconductor and CRAN checks.
In consultation with the
HDCytoData maintainers, you will then need to contact the Bioconductor
ExperimentHub maintainers to upload the
.rda files. The
ExperimentHub maintainers will then manually add them to the
ExperimentHub database, as described in the “Creating An ExperimentHub Package” vignette on the Bioconductor page. Note that any future updates (e.g. bug fixes) to the
.rda files will need to be manually added in the same way, so please check the objects carefully.
Names of contributors will be added to the
contributor field in the DESCRIPTION file of the
HDCytoData package in order to recognize the contribution.
This project is released in accordance with the Contributor Covenant Code of Conduct. By contributing to this project, you agree to abide by its terms. Any concerns regarding the code of conduct may be reported by contacting the package maintainers by email.