alabaster.files 1.2.0
The alabaster.files package implements methods to save common bioinformatics file formats within the alabaster framework.
It does not parse the files in any depth (only the cursory checks described below are performed); it just provides very light-weight wrappers for processing via alabaster.base::saveObject().
Check out the alabaster.base package for more details on the motivation and concepts behind alabaster.
We’ll start with an indexed BAM file from the Rsamtools package:
bam.file <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
bam.index <- paste0(bam.file, ".bai")
We can wrap it in a BamFileReference object:
library(alabaster.files)
library(S4Vectors) # provides DataFrame(), used later
wrapped.bam <- BamFileReference(bam.file, index=bam.index)
We can then save it to a directory:
dir <- tempfile()
saveObject(wrapped.bam, dir)
…and read it back in at a later time:
readObject(dir)
## BamFileReference object
## path: /tmp/RtmpHdVKNq/file1ecebf2e3155f8/file.bam
## index: /tmp/RtmpHdVKNq/file1ecebf2e3155f8/file.bam.bai
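To see what was actually written, we can list the contents of the saved directory. This minimal sketch repeats the steps above so that it runs on its own. Based on the paths printed by readObject(), we expect file.bam and file.bam.bai to appear, alongside whatever metadata files alabaster.base writes; we make no assumption about the names of those metadata files.

```r
library(alabaster.files)

# Wrap and save the example BAM file, as above.
bam.file <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
wrapped.bam <- BamFileReference(bam.file, index=paste0(bam.file, ".bai"))
dir <- tempfile()
saveObject(wrapped.bam, dir)

# List everything that saveObject() wrote, relative to the directory.
saved <- list.files(dir, recursive=TRUE)
print(saved)
```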
The example above isn’t very exciting, but it demonstrates how these files can be easily added to an alabaster project.
This allows us to incorporate FileReference objects into other Bioconductor data structures, such as a DataFrame:
df <- DataFrame(Sample=LETTERS[1:4])
# Adding a column of assorted wrapper files:
df$File <- list(
wrapped.bam,
BigWigFileReference(system.file("tests", "test.bw", package = "rtracklayer")),
BigBedFileReference(system.file("tests", "test.bb", package = "rtracklayer")),
BcfFileReference(system.file("extdata", "ex1.bcf.gz", package = "Rsamtools"))
)
# Saving everything to a new directory:
dir <- tempfile()
saveObject(df, dir)
# Now reading it back in:
roundtrip <- readObject(dir)
roundtrip$File
## [[1]]
## BamFileReference object
## path: /tmp/RtmpHdVKNq/file1ecebf10ed236f/other_columns/1/other_contents/0/file.bam
## index: /tmp/RtmpHdVKNq/file1ecebf10ed236f/other_columns/1/other_contents/0/file.bam.bai
##
## [[2]]
## BigWigFileReference object
## path: /tmp/RtmpHdVKNq/file1ecebf10ed236f/other_columns/1/other_contents/1/file.bw
##
## [[3]]
## BigBedFileReference object
## path: /tmp/RtmpHdVKNq/file1ecebf10ed236f/other_columns/1/other_contents/2/file.bb
##
## [[4]]
## BcfFileReference object
## path: /tmp/RtmpHdVKNq/file1ecebf10ed236f/other_columns/1/other_contents/3/file.bcf
## index: NULL
As the paths above show, saveObject() places the wrapped files inside the output directory. If that directory is uploaded to a remote store, the wrapped files are therefore included in the upload automatically, with no need for a separate process to handle them.
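One way to confirm that the wrapped files travel with the directory is to compare checksums. The sketch below is a simplified stand-in for the four-row example above: it stores the same BAM reference twice in a two-row DataFrame, saves it, locates the saved copies with list.files(), and compares their MD5 hashes against the original using tools::md5sum(). It assumes the files are stored byte-for-byte rather than re-encoded.

```r
library(alabaster.files)
library(S4Vectors)

bam.file <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
ref <- BamFileReference(bam.file, index=paste0(bam.file, ".bai"))

# A two-row DataFrame holding the same reference twice.
df <- DataFrame(Sample=c("A", "B"))
df$File <- list(ref, ref)
dir <- tempfile()
saveObject(df, dir)

# Find every saved BAM (the pattern excludes the .bam.bai indices) and hash it.
copies <- list.files(dir, pattern="\\.bam$", recursive=TRUE, full.names=TRUE)
original.md5 <- unname(tools::md5sum(bam.file))
copy.md5 <- unname(tools::md5sum(copies))
print(copy.md5 == original.md5)
```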
alabaster.files will try to perform some cursory validation of the wrapped file to catch errors in user inputs.
The level of validation is format-dependent but should be fast, e.g., BAM file validation is performed by scanning the header.
In all cases, users should not expect an exhaustive check of file validity, as that would take too long and involve more parsing than desired for the scope of alabaster.files.
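To see the cursory checks in action, we can try to save something that only pretends to be a BAM file. The fake file here is hypothetical test input. We would expect the header check to reject it, but the sketch wraps the calls in tryCatch() and simply reports whichever outcome occurs, without assuming whether an error would come from the BamFileReference() constructor or from saveObject().

```r
library(alabaster.files)

# Hypothetical bad input: a plain-text file that merely has a .bam extension.
fake <- tempfile(fileext=".bam")
writeLines("this is not a BAM file", fake)

# Report either a successful save or the error message, whichever happens.
result <- tryCatch({
    saveObject(BamFileReference(fake), tempfile())
    "saved without complaint"
}, error=function(e) paste("rejected:", conditionMessage(e)))
print(result)
```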
If stricter validation is required, applications that use alabaster.files should override the saveObject() methods for the relevant FileReference classes.
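Overriding a method would be done with setMethod(); the sketch below takes a simpler route and only illustrates what a stricter check could look like. saveBamStrictly() is a hypothetical helper, not part of alabaster.files: it reads every alignment with Rsamtools::countBam() before calling saveObject(), trading speed for a more thorough check than a header scan.

```r
library(alabaster.files)
library(Rsamtools)

# Hypothetical helper (not part of alabaster.files): fully scan a BAM file
# before wrapping and saving it.
saveBamStrictly <- function(bam.path, index.path, dir) {
    # countBam() walks through all alignments, so a truncated or corrupt
    # file should raise an error here instead of passing a header-only check.
    counts <- countBam(BamFile(bam.path, index=index.path))
    message("validated ", counts$records, " records")
    saveObject(BamFileReference(bam.path, index=index.path), dir)
    invisible(counts$records)
}

bam.file <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
dir <- tempfile()
n <- saveBamStrictly(bam.file, paste0(bam.file, ".bai"), dir)
```

The design choice here is to keep alabaster.files' own methods untouched and put the expensive check in application code, which is easier to remove than a session-wide method override.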
sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] S4Vectors_0.42.0 BiocGenerics_0.50.0 alabaster.files_1.2.0
## [4] alabaster.base_1.4.0 BiocStyle_2.32.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.8 crayon_1.5.2 compiler_4.4.0
## [4] BiocManager_1.30.22 Rcpp_1.0.12 Rsamtools_2.20.0
## [7] GenomicRanges_1.56.0 rhdf5filters_1.16.0 bitops_1.0-7
## [10] Biostrings_2.72.0 parallel_4.4.0 jquerylib_0.1.4
## [13] IRanges_2.38.0 BiocParallel_1.38.0 yaml_2.3.8
## [16] fastmap_1.1.1 R6_2.5.1 XVector_0.44.0
## [19] GenomeInfoDb_1.40.0 knitr_1.46 bookdown_0.39
## [22] GenomeInfoDbData_1.2.12 bslib_0.7.0 rlang_1.1.3
## [25] cachem_1.0.8 xfun_0.43 sass_0.4.9
## [28] cli_3.6.2 Rhdf5lib_1.26.0 zlibbioc_1.50.0
## [31] digest_0.6.35 alabaster.schemas_1.4.0 rhdf5_2.48.0
## [34] lifecycle_1.0.4 evaluate_0.23 codetools_0.2-20
## [37] rmarkdown_2.26 httr_1.4.7 tools_4.4.0
## [40] htmltools_0.5.8.1 UCSC.utils_1.0.0