HDCytoData 1.20.0
We welcome contributions or suggestions for new datasets to include in the HDCytoData
package. Suitable datasets are any high-dimensional cytometry datasets that either: (i) are useful for benchmarking purposes, e.g. containing some sort of ground truth or known signal (e.g. a known ground truth in simulated data, or a known biological result in experimental data), and/or (ii) are useful for other activities such as teaching, examples, and tutorials.
Below, we summarize the key steps required to contribute a new dataset to the HDCytoData
package. For more details on how to use Bioconductor’s ExperimentHub
resource and creating ExperimentHub
packages, see the ExperimentHub
vignettes available from Bioconductor.
Open a GitHub issue (or pull request, if you already have all files available) at the HDCytoData GitHub page to contact the maintainers and discuss the suitability of the new dataset.
Using GitHub (instead of email) ensures that there is a public record of the contribution. If you are unfamiliar with GitHub, please contact the maintainers by email for assistance.
If the contribution is approved, the following are required for each dataset:
Formatted SummarizedExperiment
and flowSet
objects containing table(s) of expression values and all row, column, and other metadata required for users to fully understand the dataset (e.g. sample IDs, group IDs, patient IDs, cluster labels, spike-in labels, channel names, protein marker names, protein marker classes, etc.) Note that expression values should be be raw values (not transformed).
The objects should be saved as .rda
files with filenames dataset_name_SE.rda
and dataset_name_flowSet.rda
(where dataset_name
is the name of the new dataset; e.g. Levine_32dim_SE.rda
and Levine_32dim_flowSet.rda
).
For an example of the object structures, load one of the existing datasets and inspect the various elements of the SummarizedExperiment
and flowSet
objects:
library(HDCytoData)
## Warning: replacing previous import 'utils::findMatches' by
## 'S4Vectors::findMatches' when loading 'AnnotationDbi'
# example: SummarizedExperiment
d_SE <- Levine_32dim_SE()
d_SE
length(assays(d_SE))
dim(d_SE)
head(assay(d_SE))
rowData(d_SE)
colData(d_SE)
metadata(d_SE)
# example: flowSet
d_flowSet <- Levine_32dim_flowSet()
## Warning in updateObjectFromSlots(object, ..., verbose = verbose): dropping
## slot(s) 'colnames' from object = 'flowSet'
d_flowSet
length(d_flowSet)
fsApply(d_flowSet, dim)
head(exprs(d_flowSet[[1]]))
parameters(d_flowSet[[1]])@data
colnames(d_flowSet)
description(d_flowSet[[1]])
## Warning: '.local' is deprecated.
## Use 'keyword' instead.
## See help("Deprecated")
A reproducible R script showing how the formatted SummarizedExperiment
and flowSet
objects were generated from the original raw data files (.fcs
files). The script should be named make-data-dataset-name.R
(e.g. make-data-Levine-32dim.R
), and will be saved in the inst/scripts
directory in the source code of the HDCytoData
package for reproducibility purposes.
For an example, see the script for one of the existing datasets, e.g. inst/scripts/make-data-Levine-32dim.R (see GitHub for more examples).
Comprehensive documentation describing the dataset, what it can be used for, and the object structure. This should be formatted as an .Rd
file, to be saved in the man
directory in the source code of the HDCytoData
package. (Note that Roxygen
cannot be used for ExperimentHub
packages, so the .Rd
file needs to be written manually.)
For an example, see the help file for one of the existing datasets, e.g. ?Levine_32dim
, or source code at man/Levine_32dim_SE.Rd (see GitHub for more examples).
Metadata to describe the dataset in the ExperimentHub
database, which will be saved in the file inst/scripts/make-metadata.R. For examples, see the metadata for the existing datasets in this file.
For more details on how to structure the files for a dataset stored on ExperimentHub
, see the vignette titled “Creating An ExperimentHub Package” on the Bioconductor page.
Once all the parts described above have been generated, create a fork of the HDCytoData
GitHub repository, add the files (except the .rda
files), and send a pull request. Please make sure that the files are in the correct directories, and that the package passes all the usual Bioconductor and CRAN checks.
.rda
filesIn consultation with the HDCytoData
maintainers, you will then need to contact the Bioconductor ExperimentHub
maintainers to upload the .rda
files. The ExperimentHub
maintainers will then manually add them to the ExperimentHub
database, as described in the “Creating An ExperimentHub Package” vignette on the Bioconductor page. Note that any future updates (e.g. bug fixes) to the .rda
files will need to be manually added in the same way, so please check the objects carefully.
Names of contributors will be added to the contributor
field in the DESCRIPTION file of the HDCytoData
package in order to recognize the contribution.
This project is released in accordance with the Contributor Covenant Code of Conduct. By contributing to this project, you agree to abide by its terms. Any concerns regarding the code of conduct may be reported by contacting the package maintainers by email.