The RaggedExperiment package provides a flexible data representation for copy number, mutation and other ragged array schema for genomic location data. It aims to provide a framework for a set of samples that have differing numbers of genomic ranges.
The RaggedExperiment
class derives from a GRangesList
representation and
provides a semblance of a rectangular dataset. The row and column dimensions of
the RaggedExperiment
correspond to the number of ranges in the entire dataset
and the number of samples represented in the data, respectively.
if (!require("BiocManager"))
install.packages("BiocManager")
BiocManager::install("RaggedExperiment")
Loading the package:
library(RaggedExperiment)
library(GenomicRanges)
Your citations are crucial in keeping our software free and open source. To cite our package see the citation (Ramos et al. (2023)) in the Reference section. You may also browse to the publication at the link here.
RaggedExperiment
class overviewA schematic showing the class geometry and supported transformations of the
RaggedExperiment
class is show below. There are three main operations for
transforming the RaggedExperiment
representation:
sparseAssay
compactAssay
qreduceAssay
RaggedExperiment
objectWe start with a couple of GRanges
objects, each representing an individual
sample:
sample1 <- GRanges(
c(A = "chr1:1-10:-", B = "chr1:8-14:+", C = "chr2:15-18:+"),
score = 3:5)
sample2 <- GRanges(
c(D = "chr1:1-10:-", E = "chr2:11-18:+"),
score = 1:2)
Include column data colData
to describe the samples:
colDat <- DataFrame(id = 1:2)
GRanges
objectsragexp <- RaggedExperiment(
sample1 = sample1,
sample2 = sample2,
colData = colDat
)
ragexp
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
GRangesList
instancegrl <- GRangesList(sample1 = sample1, sample2 = sample2)
RaggedExperiment(grl, colData = colDat)
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
list
of GRanges
rangeList <- list(sample1 = sample1, sample2 = sample2)
RaggedExperiment(rangeList, colData = colDat)
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
List
of GRanges
with metadataNote: In cases where a SimpleGenomicRangesList
is provided along with
accompanying metadata (accessed by mcols
), the metadata is used as the
colData
for the RaggedExperiment
.
grList <- List(sample1 = sample1, sample2 = sample2)
mcols(grList) <- colDat
RaggedExperiment(grList)
## class: RaggedExperiment
## dim: 5 2
## assays(1): score
## rownames(5): A B C D E
## colnames(2): sample1 sample2
## colData names(1): id
rowRanges(ragexp)
## GRanges object with 5 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## A chr1 1-10 -
## B chr1 8-14 +
## C chr2 15-18 +
## D chr1 1-10 -
## E chr2 11-18 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
dimnames(ragexp)
## [[1]]
## [1] "A" "B" "C" "D" "E"
##
## [[2]]
## [1] "sample1" "sample2"
colData
colData(ragexp)
## DataFrame with 2 rows and 1 column
## id
## <integer>
## sample1 1
## sample2 2
Subsetting a RaggedExperiment
is akin to subsetting a matrix
object. Rows
correspond to genomic ranges and columns to samples or specimen. It is possible
to subset using integer
, character
, and logical
indices.
The overlapsAny
and subsetByOverlaps
functionalities are available for use
for RaggedExperiment
. Please see the corresponding documentation in
RaggedExperiment
and GenomicRanges
.
RaggedExperiment
package provides several different functions for representing
ranged data in a rectangular matrix via the *Assay
methods.
The most straightforward matrix representation of a RaggedExperiment
will
return a matrix of dimensions equal to the product of the number of ranges and
samples.
dim(ragexp)
## [1] 5 2
Reduce(`*`, dim(ragexp))
## [1] 10
sparseAssay(ragexp)
## sample1 sample2
## A 3 NA
## B 4 NA
## C 5 NA
## D NA 1
## E NA 2
length(sparseAssay(ragexp))
## [1] 10
We provide sparse matrix representations with the help of the Matrix
package.
To obtain a sparse representation, the user can use the sparse = TRUE
argument.
sparseAssay(ragexp, sparse = TRUE)
## 5 x 2 sparse Matrix of class "dgCMatrix"
## sample1 sample2
## A 3 .
## B 4 .
## C 5 .
## D . 1
## E . 2
This representation is of class dgCMatrix
see the Matrix
documentation for
more details.
Samples with identical ranges are placed in the same row. Non-disjoint ranges are not collapsed.
compactAssay(ragexp)
## sample1 sample2
## chr1:8-14:+ 4 NA
## chr1:1-10:- 3 1
## chr2:11-18:+ NA 2
## chr2:15-18:+ 5 NA
Similarly, to sparseAssay
the compactAssay
function provides the option to
obtain a sparse matrix representation with the sparse = TRUE
argument. This
will return a dgCMatrix
class from the Matrix
package.
compactAssay(ragexp, sparse = TRUE)
## 4 x 2 sparse Matrix of class "dgCMatrix"
## sample1 sample2
## chr1:8-14:+ 4 .
## chr1:1-10:- 3 1
## chr2:11-18:+ . 2
## chr2:15-18:+ 5 .
This function returns a matrix of disjoint ranges across all samples. Elements
of the matrix are summarized by applying the simplifyDisjoin
functional
argument to assay values of overlapping ranges.
disjoinAssay(ragexp, simplifyDisjoin = mean)
## sample1 sample2
## chr1:8-14:+ 4 NA
## chr1:1-10:- 3 1
## chr2:11-14:+ NA 2
## chr2:15-18:+ 5 2
The qreduceAssay
function works with a query
parameter that highlights
a window of ranges for the resulting matrix. The returned matrix will have
dimensions length(query)
by ncol(x)
. Elements contain assay values for the
i th query range and the j th sample, summarized according to the
simplifyReduce
functional argument.
For demonstration purposes, we can have a look at the original GRangesList
and the associated scores from which the current ragexp
object is derived:
unlist(grl, use.names = FALSE)
## GRanges object with 5 ranges and 1 metadata column:
## seqnames ranges strand | score
## <Rle> <IRanges> <Rle> | <integer>
## A chr1 1-10 - | 3
## B chr1 8-14 + | 4
## C chr2 15-18 + | 5
## D chr1 1-10 - | 1
## E chr2 11-18 + | 2
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
This data is represented as rowRanges
and assays
in RaggedExperiment
:
rowRanges(ragexp)
## GRanges object with 5 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## A chr1 1-10 -
## B chr1 8-14 +
## C chr2 15-18 +
## D chr1 1-10 -
## E chr2 11-18 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
assay(ragexp, "score")
## sample1 sample2
## A 3 NA
## B 4 NA
## C 5 NA
## D NA 1
## E NA 2
Here we provide the “query” region of interest:
(query <- GRanges(c("chr1:1-14:-", "chr2:11-18:+")))
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 1-14 -
## [2] chr2 11-18 +
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
The simplifyReduce
argument in qreduceAssay
allows the user to summarize
overlapping regions with a custom method for the given “query” region of
interest. We provide one for calculating a weighted average score per
query range, where the weight is proportional to the overlap widths between
overlapping ranges and a query range.
Note that there are three arguments to this function. See the documentation for additional details.
weightedmean <- function(scores, ranges, qranges)
{
isects <- pintersect(ranges, qranges)
sum(scores * width(isects)) / sum(width(isects))
}
A call to qreduceAssay
involves the RaggedExperiment
, the GRanges
query
and the simplifyReduce
functional argument.
qreduceAssay(ragexp, query, simplifyReduce = weightedmean)
## sample1 sample2
## chr1:1-14:- 3 1
## chr2:11-18:+ 5 2
See the schematic for a visual representation.
The RaggedExperiment
provides a family of parallel functions for coercing to
the SummarizedExperiment
class. By selecting a particular assay index (i
),
a parallel assay coercion method can be achieved.
Here is the list of function names:
sparseSummarizedExperiment
compactSummarizedExperiment
disjoinSummarizedExperiment
qreduceSummarizedExperiment
See the documentation for details.
In the special case where the rownames of a sparseMatrix
are coercible to
GRanges
, RaggedExperiment
provides the facility to convert sparse matrices
into RaggedExperiment
. This can be done using the as
coercion method. The
example below first creates an example sparse dgCMatrix
class and then shows
the as
method usage to this end.
library(Matrix)
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:S4Vectors':
##
## expand
sm <- Matrix::sparseMatrix(
i = c(2, 3, 4, 3, 4, 3, 4),
j = c(1, 1, 1, 3, 3, 4, 4),
x = c(2L, 4L, 2L, 2L, 2L, 4L, 2L),
dims = c(4, 4),
dimnames = list(
c("chr2:1-10", "chr2:2-10", "chr2:3-10", "chr2:4-10"),
LETTERS[1:4]
)
)
as(sm, "RaggedExperiment")
## class: RaggedExperiment
## dim: 7 3
## assays(1): counts
## rownames: NULL
## colnames(3): A C D
## colData names(0):
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Matrix_1.7-0 RaggedExperiment_1.28.1 GenomicRanges_1.56.1
## [4] GenomeInfoDb_1.40.1 IRanges_2.38.1 S4Vectors_0.42.1
## [7] BiocGenerics_0.50.0 BiocStyle_2.32.1
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.8 highr_0.11
## [3] compiler_4.4.1 BiocManager_1.30.23
## [5] crayon_1.5.3 BiocBaseUtils_1.6.0
## [7] SummarizedExperiment_1.34.0 Biobase_2.64.0
## [9] jquerylib_0.1.4 yaml_2.3.9
## [11] fastmap_1.2.0 lattice_0.22-6
## [13] R6_2.5.1 XVector_0.44.0
## [15] S4Arrays_1.4.1 knitr_1.48
## [17] DelayedArray_0.30.1 bookdown_0.40
## [19] MatrixGenerics_1.16.0 GenomeInfoDbData_1.2.12
## [21] bslib_0.7.0 rlang_1.1.4
## [23] cachem_1.1.0 xfun_0.45
## [25] sass_0.4.9 SparseArray_1.4.8
## [27] cli_3.6.3 zlibbioc_1.50.0
## [29] digest_0.6.36 grid_4.4.1
## [31] lifecycle_1.0.4 evaluate_0.24.0
## [33] abind_1.4-5 rmarkdown_2.27
## [35] httr_1.4.7 matrixStats_1.3.0
## [37] tools_4.4.1 htmltools_0.5.8.1
## [39] UCSC.utils_1.0.0
Ramos, Marcel, Martin Morgan, Ludwig Geistlinger, Vincent J Carey, and Levi Waldron. 2023. “RaggedExperiment: the missing link between genomic ranges and matrices in Bioconductor.” Bioinformatics 39 (6): btad330. https://doi.org/10.1093/bioinformatics/btad330.