HubPub 1.12.0
HubPub provides users with functionality to help with Bioconductor Hub structures. The package can create the skeleton of a Hub-style package, which the user then populates with the necessary information. There are also functions to help add resources to the Hub package metadata files and to publish data to the Bioconductor S3 bucket.
Install the most recent version from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HubPub")
Then load HubPub:
library(HubPub)
The create_pkg() function creates the skeleton of a package that follows the guidelines for a Bioconductor Hub-type package. For more information about the requirements and content of a Hub-style package, see the “Creating A Hub Package” vignette in this package.
create_pkg() requires a path where the package is to be created and the type of package to create (“AnnotationHub” or “ExperimentHub”). The use_git argument indicates whether the package should be set up with git (default is TRUE).
NOTE: This function is intended for developers who have not yet created their package. If the package already exists, this function offers no benefit; however, a couple of other functions in this package that deal with resources may be helpful. These are described later in this vignette.
fl <- tempdir()
create_pkg(file.path(fl, "examplePkg"), "ExperimentHub")
#> ✔ Creating '/tmp/RtmpKLaJiP/examplePkg/'
#> ✔ Setting active project to '/tmp/RtmpKLaJiP/examplePkg'
#> ✔ Creating 'R/'
#> ✔ Writing 'DESCRIPTION'
#> Package: examplePkg
#> Title: What the Package Does (One Line, Title Case)
#> Version: 0.99.0
#> Date: 2024-05-01
#> Authors@R (parsed):
#> * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
#> Description: What the package does (one paragraph).
#> License: Artistic-2.0
#> BugReports: https://support.bioconductor.org/t/examplePkg
#> Imports:
#> ExperimentHub
#> Suggests:
#> ExperimentHubData
#> biocViews: ExperimentHub
#> Encoding: UTF-8
#> Roxygen: list(markdown = TRUE)
#> RoxygenNote: 7.3.1
#> ✔ Writing 'NAMESPACE'
#> ✔ Setting active project to '<no active project>'
#> ✔ Setting active project to '/tmp/RtmpKLaJiP/examplePkg'
#> ✔ Initialising Git repo
#> ✔ Adding '.Rproj.user', '.Rhistory', '.Rdata', '.httr-oauth', '.DS_Store', '.quarto' to '.gitignore'
#> ✔ Writing 'R/examplePkg-package.R'
#> ✔ Writing 'NEWS.md'
#> ✔ Creating 'man/'
#> ✔ Creating 'inst/scripts/'
#> ✔ Writing 'inst/scripts/make-data.R'
#> ✔ Writing 'inst/scripts/make-metadata.R'
#> ✔ Writing 'R/zzz.R'
#> ✔ Creating 'inst/extdata/'
#> ✔ Adding 'testthat' to Suggests field in DESCRIPTION
#> ✔ Adding '3' to Config/testthat/edition
#> ✔ Creating 'tests/testthat/'
#> ✔ Writing 'tests/testthat.R'
#> • Call `use_test()` to initialize a basic test file and open it for editing.
#> ✔ Writing 'tests/testthat/test_metadata.R'
#> [1] "/tmp/RtmpKLaJiP/examplePkg"
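As a quick check, the files reported in the create_pkg() output above can be verified programmatically. The sketch below uses only base R; check_hub_skeleton() is a hypothetical helper rather than part of HubPub, and its list of expected paths is copied from the output above, so it may need adjusting for other HubPub versions.

```r
# Hypothetical helper (not part of HubPub): return the files that the
# create_pkg() output above reports writing but that are missing on disk.
check_hub_skeleton <- function(pkg_path) {
  # Paths copied from the create_pkg() output above (assumed; may vary by version)
  expected <- c(
    "DESCRIPTION",
    "NAMESPACE",
    "NEWS.md",
    paste0("R/", basename(pkg_path), "-package.R"),
    "R/zzz.R",
    "inst/scripts/make-data.R",
    "inst/scripts/make-metadata.R",
    "tests/testthat.R",
    "tests/testthat/test_metadata.R"
  )
  present <- file.exists(file.path(pkg_path, expected))
  # character(0) means every expected file was found
  expected[!present]
}

# Usage, with `fl` as defined in the example above:
# check_hub_skeleton(file.path(fl, "examplePkg"))
```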
Once the package is created, the developer can make any necessary changes to it. For example, the DESCRIPTION file contains only very basic information, so the developer should go back and fill in the ‘Title:’ and ‘Description:’ fields.
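The ‘Title:’ and ‘Description:’ fields can be edited by hand or, as one possible approach, with base R's read.dcf() and write.dcf(). The function set_description_fields() below is hypothetical and not part of HubPub; it assumes both fields already exist, as they do in the generated skeleton. write.dcf() may re-wrap long or multi-line fields such as Authors@R, so review the rewritten file afterwards.

```r
# Hypothetical helper (not part of HubPub): replace the placeholder Title and
# Description fields of a package DESCRIPTION file.
# Assumes both fields already exist, as in the create_pkg() skeleton.
set_description_fields <- function(pkg_path, title, description) {
  desc_file <- file.path(pkg_path, "DESCRIPTION")
  # read.dcf() returns a one-row character matrix with one column per field
  dcf <- read.dcf(desc_file)
  dcf[1, "Title"] <- title
  dcf[1, "Description"] <- description
  # Rewrite the whole file; other fields are written back unchanged in value
  write.dcf(dcf, file = desc_file)
  invisible(desc_file)
}

# Usage, with `fl` as defined in the example above:
# set_description_fields(file.path(fl, "examplePkg"),
#                        title = "Example Hub Data Package",
#                        description = "Provides example data via ExperimentHub.")
```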
Another useful function in HubPub is add_resource(). This function can be useful for developers who are creating a new Hub package or who want to add a new resource to an existing Hub package. Its purpose is to add a Hub resource to the package's metadata.csv file. The function requires the name of the package (or the path to the newly created package) and a named list with the data for the resource. For the elements and content of this list, see ?hub_metadata. More information is also available in the “Creating A Hub Package” vignette from this package.
metadata <- hub_metadata(
Title = "ENCODE",
Description = "a test entry",
BiocVersion = "4.1",
Genome = NA_character_,
SourceType = "JSON",
SourceUrl = "http://www.encodeproject.org",
SourceVersion = "x.y.z",
Species = NA_character_,
TaxonomyId = as.integer(9606),
Coordinate_1_based = NA,
DataProvider = "ENCODE Project",
Maintainer = "tst person <tst@email.com>",
RDataClass = "Rda",
DispatchClass = "Rda",
Location_Prefix = "s3://experimenthub/",
RDataPath = "ENCODExplorerData/encode_df_lite.rda",
Tags = "ENCODE:Homo sapiens"
)
add_resource(file.path(fl, "examplePkg"), metadata)
#> Warning: replacing previous import 'utils::findMatches' by
#> 'S4Vectors::findMatches' when loading 'ExperimentHubData'
#> [1] "/tmp/RtmpKLaJiP/examplePkg/inst/extdata/metadata.csv"
To see what the metadata file looks like, read in the CSV file as follows:
resource <- file.path(fl, "examplePkg", "inst", "extdata", "metadata.csv")
tst <- read.csv(resource)
tst
#> Title Description BiocVersion Genome SourceType
#> 1 ENCODE a test entry 4.1 NA JSON
#> SourceUrl SourceVersion Species TaxonomyId
#> 1 http://www.encodeproject.org x.y.z NA 9606
#> Coordinate_1_based DataProvider Maintainer RDataClass
#> 1 NA ENCODE Project tst person <tst@email.com> Rda
#> DispatchClass Location_Prefix RDataPath
#> 1 Rda s3://experimenthub/ ENCODExplorerData/encode_df_lite.rda
#> Tags
#> 1 ENCODE:Homo sapiens
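Before submitting, it can help to confirm that metadata.csv contains every column shown above. The sketch below is a hypothetical check, not a HubPub function; the column list is taken from the printed file above, and the authoritative set of fields is documented in ?hub_metadata and the “Creating A Hub Package” vignette.

```r
# Column names as printed from metadata.csv above (assumed complete for this
# HubPub version; see ?hub_metadata for the authoritative list).
hub_metadata_columns <- c(
  "Title", "Description", "BiocVersion", "Genome", "SourceType",
  "SourceUrl", "SourceVersion", "Species", "TaxonomyId",
  "Coordinate_1_based", "DataProvider", "Maintainer", "RDataClass",
  "DispatchClass", "Location_Prefix", "RDataPath", "Tags"
)

# Hypothetical helper (not part of HubPub): which expected columns are absent?
missing_metadata_columns <- function(meta) {
  setdiff(hub_metadata_columns, colnames(meta))
}

# Usage with the `tst` data frame read above:
# missing_metadata_columns(tst)
```

An empty result (character(0)) means all of the listed columns are present.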
The final function in HubPub, publish_resource(), helps the developer publish data resources to a Bioconductor AWS S3 bucket. It uses functions from the aws.s3 package to place files or directories on S3. The developer should already have contacted the Bioconductor Hub maintainers to obtain the credentials needed to access the bucket. Once the credentials are received, the developer should declare them as system environment variables before running this function. The function requires the path to the file or directory to be uploaded and the name under which the object should be stored on the bucket. When uploading a directory, make sure it contains only files and no nested directories.
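A small pre-flight check can confirm that the credentials are visible to R before publish_resource() is called. The helper missing_aws_vars() below is hypothetical, and the variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION are an assumption based on the conventions of the aws.s3 package; use the names supplied by the Hub maintainers if they differ.

```r
# Hypothetical helper (not part of HubPub): return the credential environment
# variables that are unset or empty in the current R session.
# Variable names are an assumption based on aws.s3 conventions.
missing_aws_vars <- function(vars = c("AWS_ACCESS_KEY_ID",
                                      "AWS_SECRET_ACCESS_KEY",
                                      "AWS_DEFAULT_REGION")) {
  values <- Sys.getenv(vars, unset = "")
  vars[values == ""]
}

# Credentials can be declared for the current session with Sys.setenv(),
# e.g. (placeholder values; never commit real keys to a package):
# Sys.setenv(AWS_ACCESS_KEY_ID = "<key id>",
#            AWS_SECRET_ACCESS_KEY = "<secret>",
#            AWS_DEFAULT_REGION = "<region>")
```

missing_aws_vars() returns character(0) when all three variables are set to non-empty values.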
The code chunk below demonstrates the function using a dummy dataset. It will only work if the necessary environment variables have been set with the Hub credentials.
## For publishing directories with multiple files
fl <- tempdir()
utils::write.csv(mtcars, file = file.path(fl, "mtcars1.csv"))
utils::write.csv(mtcars, file = file.path(fl, "mtcars2.csv"))
publish_resource(fl, "test_dir")
#> Warning in publish_resource(fl, "test_dir"): Not all system environment
#> variables are set, do so and rerun function.
#> copy '/tmp/RtmpKLaJiP/BiocStyle' to 's3://annotation-contributor/test_dir/BiocStyle'
#> copy '/tmp/RtmpKLaJiP/examplePkg' to 's3://annotation-contributor/test_dir/examplePkg'
#> copy '/tmp/RtmpKLaJiP/mtcars1.csv' to 's3://annotation-contributor/test_dir/mtcars1.csv'
#> copy '/tmp/RtmpKLaJiP/mtcars2.csv' to 's3://annotation-contributor/test_dir/mtcars2.csv'
#> copy '/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fbioc%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fbioc%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fbooks%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fbooks%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fdata%2Fannotation%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fdata%2Fannotation%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fdata%2Fexperiment%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fdata%2Fexperiment%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fworkflows%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fworkflows%2Fsrc%2Fcontrib.rds'
#> copy '/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fcloud.r-project.org%2Fsrc%2Fcontrib.rds' to 's3://annotation-contributor/test_dir/repos_https%3A%2F%2Fcloud.r-project.org%2Fsrc%2Fcontrib.rds'
#> $`/tmp/RtmpKLaJiP/BiocStyle`
#> NULL
#>
#> $`/tmp/RtmpKLaJiP/examplePkg`
#> NULL
#>
#> $`/tmp/RtmpKLaJiP/mtcars1.csv`
#> NULL
#>
#> $`/tmp/RtmpKLaJiP/mtcars2.csv`
#> NULL
#>
#> $`/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fbioc%2Fsrc%2Fcontrib.rds`
#> NULL
#>
#> $`/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fbooks%2Fsrc%2Fcontrib.rds`
#> NULL
#>
#> $`/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fdata%2Fannotation%2Fsrc%2Fcontrib.rds`
#> NULL
#>
#> $`/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fdata%2Fexperiment%2Fsrc%2Fcontrib.rds`
#> NULL
#>
#> $`/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fbioconductor.org%2Fpackages%2F3.19%2Fworkflows%2Fsrc%2Fcontrib.rds`
#> NULL
#>
#> $`/tmp/RtmpKLaJiP/repos_https%3A%2F%2Fcloud.r-project.org%2Fsrc%2Fcontrib.rds`
#> NULL
## For publishing a single file
utils::write.csv(mtcars, file = file.path(fl, "mtcars3.csv"))
publish_resource(file.path(fl, "mtcars3.csv"), "test_dir")
#> Warning in publish_resource(file.path(fl, "mtcars3.csv"), "test_dir"): Not all
#> system environment variables are set, do so and rerun function.
#> copy '/tmp/RtmpKLaJiP/mtcars3.csv' to 's3://annotation-contributor/test_dir/mtcars3.csv'
#> $`/tmp/RtmpKLaJiP/mtcars3.csv`
#> NULL
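Because tempdir() is shared by the whole R session, it can hold other files and directories besides the CSV files written above, and the directory example uploads all of them. One way to honour the files-only rule is to copy just the files to publish into a fresh staging directory first. The sketch below uses base R; stage_files() is a hypothetical helper, not part of HubPub.

```r
# Hypothetical helper (not part of HubPub): copy only the selected files into
# a new, flat directory so unrelated session files are not uploaded.
stage_files <- function(files, staging_dir = tempfile("hub_upload_")) {
  dir.create(staging_dir)
  # file.copy() into an existing directory returns one logical per file
  ok <- file.copy(files, staging_dir)
  if (!all(ok)) stop("could not copy: ", paste(files[!ok], collapse = ", "))
  # Directories passed to publish_resource() should contain files only
  stopifnot(length(list.dirs(staging_dir, recursive = FALSE)) == 0)
  staging_dir
}

# Usage (the upload itself only works with valid Hub credentials set):
# csvs <- file.path(tempdir(), c("mtcars1.csv", "mtcars2.csv"))
# upload_dir <- stage_files(csvs)
# publish_resource(upload_dir, "test_dir")
```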
sessionInfo()
#> R version 4.4.0 beta (2024-04-15 r86425)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] futile.logger_1.4.3 HubPub_1.12.0 BiocStyle_2.32.0
#>
#> loaded via a namespace (and not attached):
#> [1] sys_3.4.2 rstudioapi_0.16.0
#> [3] jsonlite_1.8.8 magrittr_2.0.3
#> [5] GenomicFeatures_1.56.0 rmarkdown_2.26
#> [7] fs_1.6.4 BiocIO_1.14.0
#> [9] zlibbioc_1.50.0 vctrs_0.6.5
#> [11] memoise_2.0.1 Rsamtools_2.20.0
#> [13] RCurl_1.98-1.14 askpass_1.2.0
#> [15] base64enc_0.1-3 BiocBaseUtils_1.6.0
#> [17] htmltools_0.5.8.1 S4Arrays_1.4.0
#> [19] usethis_2.2.3 progress_1.2.3
#> [21] lambda.r_1.2.4 AnnotationHub_3.12.0
#> [23] curl_5.2.1 SparseArray_1.4.0
#> [25] sass_0.4.9 bslib_0.7.0
#> [27] desc_1.4.3 testthat_3.2.1.1
#> [29] httr2_1.0.1 futile.options_1.0.1
#> [31] cachem_1.0.8 available_1.1.0
#> [33] GenomicAlignments_1.40.0 whisker_0.4.1
#> [35] lifecycle_1.0.4 pkgconfig_2.0.3
#> [37] Matrix_1.7-0 R6_2.5.1
#> [39] fastmap_1.1.1 BiocCheck_1.40.0
#> [41] GenomeInfoDbData_1.2.12 MatrixGenerics_1.16.0
#> [43] digest_0.6.35 AnnotationDbi_1.66.0
#> [45] S4Vectors_0.42.0 OrganismDbi_1.46.0
#> [47] rprojroot_2.0.4 ExperimentHub_2.12.0
#> [49] aws.signature_0.6.0 GenomicRanges_1.56.0
#> [51] RSQLite_2.3.6 filelock_1.0.3
#> [53] fansi_1.0.6 httr_1.4.7
#> [55] abind_1.4-5 compiler_4.4.0
#> [57] bit64_4.0.5 withr_3.0.0
#> [59] biocViews_1.72.0 BiocParallel_1.38.0
#> [61] DBI_1.2.2 R.utils_2.12.3
#> [63] biomaRt_2.60.0 openssl_2.1.2
#> [65] rappdirs_0.3.3 DelayedArray_0.30.0
#> [67] rjson_0.2.21 tools_4.4.0
#> [69] R.oo_1.26.0 glue_1.7.0
#> [71] restfulr_0.0.15 R.cache_0.16.0
#> [73] grid_4.4.0 stringdist_0.9.12
#> [75] generics_0.1.3 R.methodsS3_1.8.2
#> [77] hms_1.1.3 xml2_1.3.6
#> [79] utf8_1.2.4 XVector_0.44.0
#> [81] BiocGenerics_0.50.0 BiocVersion_3.19.1
#> [83] pillar_1.9.0 stringr_1.5.1
#> [85] dplyr_1.1.4 BiocFileCache_2.12.0
#> [87] lattice_0.22-6 AnnotationHubData_1.34.0
#> [89] rtracklayer_1.64.0 bit_4.0.5
#> [91] tidyselect_1.2.1 RBGL_1.80.0
#> [93] Biostrings_2.72.0 knitr_1.46
#> [95] biocthis_1.14.0 bookdown_0.39
#> [97] IRanges_2.38.0 SummarizedExperiment_1.34.0
#> [99] stats4_4.4.0 xfun_0.43
#> [101] Biobase_2.64.0 credentials_2.0.1
#> [103] brio_1.1.5 matrixStats_1.3.0
#> [105] stringi_1.8.3 UCSC.utils_1.0.0
#> [107] yaml_2.3.8 evaluate_0.23
#> [109] codetools_0.2-20 tibble_3.2.1
#> [111] BiocManager_1.30.22 graph_1.82.0
#> [113] cli_3.6.2 jquerylib_0.1.4
#> [115] styler_1.10.3 roxygen2_7.3.1
#> [117] GenomeInfoDb_1.40.0 gert_2.0.1
#> [119] dbplyr_2.5.0 png_0.1-8
#> [121] XML_3.99-0.16.1 RUnit_0.4.33
#> [123] parallel_4.4.0 blob_1.2.4
#> [125] prettyunits_1.2.0 aws.s3_0.3.21
#> [127] AnnotationForge_1.46.0 bitops_1.0-7
#> [129] txdbmaker_1.0.0 ExperimentHubData_1.30.0
#> [131] purrr_1.0.2 crayon_1.5.2
#> [133] rlang_1.1.3 formatR_1.14
#> [135] KEGGREST_1.44.0