ExperimentSubset 1.0.0
ExperimentSubset
classExperimentSubset
classExperimentSubset
object: A toy exampleExperimentSubset
object: An example with real single cell RNA-seq dataExperimentSubset
if (!requireNamespace("BiocManager", quietly=TRUE)){
install.packages("BiocManager")}
BiocManager::install("ExperimentSubset")
To install the latest version from Github, use the following code:
library(devtools)
install_github("campbio/ExperimentSubset")
Loading the package:
library(ExperimentSubset)
Experiment objects such as the SummarizedExperiment
or SingleCellExperiment
are data containers for one or more matrix-like assays along with the associated
row and column data. Often only a subset of the original data is needed for
down-stream analysis. For example, filtering out poor quality samples will
require excluding some columns before analysis. The ExperimentSubset
object
is a container to efficiently manage different subsets of the same data without
having to make separate objects for each new subset.
ExperimentSubset
package enables users to perform flexible subsetting of
Single-Cell data that comes from the same experiment as well as the consequent
storage of these subsets back into the same object. In general, it offers the
same interface to the users as the SingleCellExperiment
container which is
one the most widely used containers for Single-Cell data. However, in addition
to the features offered by SingleCellExperiment
container, ExperimentSubset
offers subsetting features while hiding the implementation details from the
users. It does so by creating references to the subset rows
and columns
instead of storing a new assay whenever possible. Functions from
SingleCellExperiment
such as assay
, rowData
and colData
can be used for
regular assays as one would normally do, as well as with newly created subsets
of the data. This allows the users to use the ExperimentSubset
container
simply as if they were using SingleCellExperiment
container with no change
required to the existing code.
ExperimentSubset
classThe ExperimentSubset
class contains two slots, the root
slot and the
subsets
slot. The root
slot is always an experiment object inherited from
SummarizedExperiment
class, while the subsets
slot is a list
of subsets
created from the root
object.
Each subset inside the ExperimentSubset
object (more specifically inside the
subsets
slot of the object) is stored as an AssaySubset
instance. This
AssaySubset
instance creates reference to the row and column indices for this
particular subset against a parent (which can either be the root object or
another subset). In case a new assay is to be stored against a subset, it is
stored as a separate experiment object (same class as the root
) inside the
subset.
ExperimentSubset
classWhile all the common methods available with SummarizedExperiment
and
SingleCellExperiment
classes have been overridden to support the
ExperimentSubset
class with additional support for subsets, some core methods
for the creation and manipulation of subsets have been provided with the
ExperimentSubset
class.
ExperimentSubset
constructorThe constructor method allows the creation of an ExperimentSubset
object from
an input experiment object as long as it is inherited from
SummarizedExperiment
class. Additionally, if needed, a subset can be directly
created from within the constructor by providing input a named list to the
subset
parameter.
counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
sce <- SingleCellExperiment(list(counts = counts))
es <- ExperimentSubset(sce)
es
## class: ExperimentSubset
## root class: SingleCellExperiment
## dim: 10 10
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## altExpNames(0):
## subsets(0):
## subsetAssays(0):
createSubset
The createSubset
method as evident from the name, creates a subset from an
already available assay
in the object. The subsetName
(a character
string),
rowIndices
(a numeric
or character
vector
), colIndices
(a numeric
or
character
vector
) and parentAssay
(a character
string) are the standard
parameters of the createSubset
method. If rowIndices
or colIndices
are
missing
or NULL
, all of the rows or columns are selected from the specified
parentAssay
. If parentAssay
is missing
or NULL
, the first available
assay from the root
object is linked as the parent of this subset. The
parentAssay
can be an assay
in the root
object, a subset or an assay
within a subset.
The method accepts an ExperimentSubset
object or any object inherited
from SummarizedExperiment
for immediate conversion and the creation of the
subset through a single function call.
es <- createSubset(es,
subsetName = "subset1",
rows = c(1:2),
cols = c(1:5),
parentAssay = "counts")
es
## class: ExperimentSubset
## root class: SingleCellExperiment
## dim: 10 10
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## altExpNames(0):
## subsets(1): subset1
## subsetAssays(1): subset1
storeSubset
The storeSubset
method should be used when a subset assay
needs to be stored
either in a previously created subset or a new subset. This is specifically
different from the createSubset
method which only creates a subset by
referencing to a defined parentAssay
where the internalAssay
of the subset
has no assays stored. The storeSubset
method however, is used to store an
assay
in this internalAssay
slot of the subset which in fact is a subset
experiment object of the same class as the root
object. Additionally, the
storeSubset
method can be used to directly create a subset and then store an
assay
inside this subset depending upon the parameters with which the method
is called.
subset1Assay <- assay(es, "subset1")
subset1Assay[,] <- subset1Assay[,] + 1
es <- storeSubset(es,
subsetName = "subset1",
inputMatrix = subset1Assay,
subsetAssayName = "subset1Assay")
es
## class: ExperimentSubset
## root class: SingleCellExperiment
## dim: 10 10
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## altExpNames(0):
## subsets(1): subset1
## subsetAssays(2): subset1 subset1Assay
The parameters of interest against this method are subsetName
which specifies
the name of the subset inside which the an input assay should be stored,
inputMatrix
which is a matrix-type object to be stored as an assay inside a
subset specified by the subsetName
parameter and lastly the subsetAssayName
parameter which represents the name of the new assay. If subsetAssayName
is
set to NULL
, a new subset is created and the inputMatrix
is stored inside
the new subset.
subsetSummary
The subsetSummary
method displays an overall summary of the
ExperimentSubset
object including the assays in the root
object, the list
of subsets along with the stored assays, reduced dimensions, alt experiments
and other supplementary information that may help the users understand the
current condition of the object. The most important piece of information
displayed by this method is the hierarchical parent-subset link against each
subset in the object.
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## Name Dim Parent Assays
## 1 subset1 2, 5 counts subset1Assay
Some additional helper methods are available for the users to use during
certain circumstances such as during iteration of all subsets. These methods
include subsetNames
that returns a character
vector
of all available
subsets, subsetAssayNames
that returns a character
vector
of all available
subsets and the assays within these subsets, subsetCount
that returns the
count of the subsets, subsetAssayCount
that returns the total count of the sum
of the subsets and the assays within these subsets, subsetDim
that returns the
dimensions of a subset and lastly the subsetParent
method that returns a
character
list
of complete parent hierarchy of a subset.
ExperimentSubset
classThese are the methods that have been overridden from other classes to support
the subset feature of the ExperimentSubset
objects by introducing an
additional parameter subsetName
to these methods. These methods can simply
be called on any ExperimentSubset
object to get or set from the root
object
or from any subset by passing the optional subsetName
parameter.
The methods include assay
, assay<-
, rowData
, rowData<-
, colData
,
colData<-
, metadata
, metadata<-
, reducedDim
, reducedDim<-
,
reducedDims
, reducedDims<-
, reducedDimNames
, reducedDimNames<-
,
altExp
, altExp<-
, altExps
, altExps<-
, altExpNames
, altExpNames<-
,
rownames
, rownames<-
, colnames
and colnames<-
. All of the methods
can be used with the subsets by providing the optional subsetName
parameter.
ExperimentSubset
object: A toy exampleCreating the ExperimentSubset
object is as simple as passing an experiment
object to the ExperimentSubset
constructor:
counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
sce <- SingleCellExperiment(list(counts = counts))
es <- ExperimentSubset(sce)
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## NULL
Create a subset that includes the first 5 rows and columns only:
es <- createSubset(es,
subsetName = "subset1",
rows = c(1:5),
cols = c(1:5),
parentAssay = "counts")
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## Name Dim Parent
## 1 subset1 5, 5 counts
Create another subset from subset1
by only keeping the first two rows:
es <- createSubset(es,
subsetName = "subset2",
rows = c(1:2),
cols = c(1:5),
parentAssay = "subset1")
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## Name Dim Parent
## 1 subset1 5, 5 counts
## 2 subset2 2, 5 subset1 -> counts
Get assay
from subset2
and update values:
subset2Assay <- assay(es, "subset2")
subset2Assay[,] <- subset2Assay[,] + 1
Store the updated assay
back to subset2
using one of the two approaches:
#approach 1
es <- storeSubset(es,
subsetName = "subset2",
inputMatrix = subset2Assay,
subsetAssayName = "subset2Assay_a1")
#approach 2
assay(es, "subset2", subsetAssayName = "subset2Assay_a2") <- subset2Assay
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## Name Dim Parent Assays
## 1 subset1 5, 5 counts
## 2 subset2 2, 5 subset1 -> counts c("subset2Assay_a1", "subset2Assay_a2")
Store an experiment object in the altExp
slot of subset2
:
altExp(x = es,
e = "subset2_alt1",
subsetName = "subset2") <- SingleCellExperiment(assay = list(
counts = assay(es, "subset2")
))
Show the current condition of ExperimentSubset
object:
subsetSummary(es)
## Main assay(s):
## counts
##
## Subset(s):
## Name Dim Parent Assays
## 1 subset1 5, 5 counts
## 2 subset2 2, 5 subset1 -> counts c("subset2Assay_a1", "subset2Assay_a2")
## AltExperiments
## 1
## 2 subset2_alt1
ExperimentSubset
object: An example with real single cell RNA-seq dataInstalling and loading required packages:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(version = "3.11", ask = FALSE)
BiocManager::install(c("TENxPBMCData", "scater", "scran"))
library(ExperimentSubset)
library(TENxPBMCData)
library(scater)
library(scran)
Load PBMC4K dataset and create ExperimentSubset
object:
tenx_pbmc4k <- TENxPBMCData(dataset = "pbmc4k")
es <- ExperimentSubset(tenx_pbmc4k)
subsetSummary(es)
Compute perCellQCMetrics
on counts
matrix:
perCellQCMetrics <- perCellQCMetrics(assay(es, "counts"))
colData(es) <- cbind(colData(es), perCellQCMetrics)
Filter cells with low column sum and create a new subset called ‘filteredCells’:
filteredCellsIndices <- which(colData(es)$sum > 1500)
es <- createSubset(es, "filteredCells", cols = filteredCellsIndices, parentAssay = "counts")
subsetSummary(es)
Normalize ‘filteredCells’ subset using scater
library and store it back:
assay(es, "filteredCells", subsetAssayName = "filteredCellsNormalized") <- normalizeCounts(assay(es, "filteredCells"))
subsetSummary(es)
Find highly variable genes from the normalized assay in the previous step using scran
library against the ‘filteredCells’ subset only:
topHVG1000 <- getTopHVGs(modelGeneVar(assay(es, "filteredCellsNormalized")), n = 1000)
es <- createSubset(es, "hvg1000", rows = topHVG1000, parentAssay = "filteredCellsNormalized")
subsetSummary(es)
Run ‘PCA’ on the highly variable genes computed in the last step using scater
library against the ‘filteredCells’ subset only:
reducedDim(es, type = "PCA", subsetName = "hvg1000") <- calculatePCA(assay(es, "hvg1000"))
Show the current condition of the ExperimentSubset
object:
subsetSummary(es)
ExperimentSubset
ExperimentSubset
constructorcreateSubset
storeSubset
subsetSummary
subsetParent
subsetCount
subsetAssayCount
subsetNames
subsetAssayNames
subsetDim
subsetRowData
subsetColData
show
assay
assay<-
rowData
rowData<-
colData
colData<-
metadata
metadata<-
reducedDim
reducedDim<-
reducedDims
reducedDims<-
reducedDimNames
reducedDimNames<-
,altExp
altExp<-
altExps
altExps<-
altExpNames
altExpNames<-
rownames
rownames<-
colnames
colnames<-
The internal structure of an ExperimentSubset
class is described
below:
root
slotThe root
slot of an ExperimentSubset
object must be an experiment object
inherited from SummarizedExperiment
or SingleCellExperiment
and acts as the
root or the first immediate parent of any subset that is created initially. The
ExperimentSubset
object can be used in a fashion similar to
SummarizedExperiment
with all the common methods that have been overridden to
support the manipulation of ExperimentSubset
objects (with or without subsets)
including assay
, rowData
and colData
. Even though all of these methods
can be used with either ExperimentSubset
or other experiment objects directly,
the accessible root
slot offers a convenient way to manipulate the original
object if required by the user.
subsets
slotThe subsets
slot of the ExperimentSubset
object is a list
, where each
element in this list is an object of an internal AssaySubset
class. Each
element represents one subset linked to the experiment object in the root
slot. The structure of each subset is described below:
subsetName
A character
string that represents a user-defined name of the subset.
rowIndices
A numeric
vector
that stores the indices of the selected rows in the linked
parent assay within for this subset.
colIndices
A numeric
vector
that stores the indices of the selected columns in the
linked parent assay for this subset.
parentAssay
A character
string that stores the name of the immediate parent to which the
subset is linked. The parentAssay
can be an assay
in the root
slot of the
ExperimentSubset
object or any subset or any internalAssay
of a subset.
internalAssay
The internalAssay
slot stores an experiment object of same type as the root
object but with the dimensions of the subset. The internalAssay
is initially
an empty experiment object with only dimensions set to enable manipulation, but
can be used to store additional data against a subset such as assay
,
rowData
, colData
, reducedDims
, altExps
and metadata
.
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ExperimentSubset_1.0.0 SingleCellExperiment_1.12.0
## [3] SummarizedExperiment_1.20.0 Biobase_2.50.0
## [5] GenomicRanges_1.42.0 GenomeInfoDb_1.26.0
## [7] IRanges_2.24.0 S4Vectors_0.28.0
## [9] BiocGenerics_0.36.0 MatrixGenerics_1.2.0
## [11] matrixStats_0.57.0 BiocStyle_2.18.0
##
## loaded via a namespace (and not attached):
## [1] knitr_1.30 XVector_0.30.0 magrittr_1.5
## [4] zlibbioc_1.36.0 lattice_0.20-41 rlang_0.4.8
## [7] stringr_1.4.0 tools_4.0.3 grid_4.0.3
## [10] xfun_0.18 htmltools_0.5.0 yaml_2.2.1
## [13] digest_0.6.27 bookdown_0.21 Matrix_1.2-18
## [16] GenomeInfoDbData_1.2.4 BiocManager_1.30.10 bitops_1.0-6
## [19] RCurl_1.98-1.2 evaluate_0.14 rmarkdown_2.5
## [22] DelayedArray_0.16.0 stringi_1.5.3 compiler_4.0.3