1 Package Status

Project status: Active. The project has reached a stable, usable state and is being actively developed.

2 Installation

This vignette requires the SingleCellExperiment and the scRNAseq packages. To install them:

BiocInstaller::biocLite("SingleCellExperiment")
BiocInstaller::biocLite("scRNAseq")

To load the packages:

library(SingleCellExperiment)
library(scRNAseq)

3 The SingleCellExperiment class

3.1 Definition

The SingleCellExperiment class is a lightweight container for single-cell genomics data. It extends the RangedSummarizedExperiment class with the following additional slots:

  • int_elementMetadata
  • int_colData
  • int_metadata
  • reducedDims

The int_ prefix describes internal slots that are not meant for direct manipulation by the user or other package developers. Instead, they are set by other user-visible methods, which will be discussed in more detail below.
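
The inheritance relationship described above can be checked directly. The following is a minimal sketch, assuming the SingleCellExperiment package is installed; the toy matrix and the object name toy are illustrative and are not used elsewhere in this vignette.

```r
library(SingleCellExperiment)

# Toy 4 x 3 count matrix (illustrative values only).
toy_counts <- matrix(1:12, nrow = 4, ncol = 3)
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

# A SingleCellExperiment is also a RangedSummarizedExperiment,
# and therefore a SummarizedExperiment.
is_ranged <- is(toy, "RangedSummarizedExperiment")
is_se <- is(toy, "SummarizedExperiment")
```

Because of this, functions written for SummarizedExperiment objects should generally accept a SingleCellExperiment as well.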

3.2 Create instances of SingleCellExperiment

There are two main ways to create instances of SingleCellExperiment. The first is via the constructor:

counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
sce <- SingleCellExperiment(assays = list(counts = counts))
sce
## class: SingleCellExperiment 
## dim: 10 10 
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## spikeNames(0):

The second is via coercion from SummarizedExperiment objects.

se <- SummarizedExperiment(list(counts=counts))
as(se, "SingleCellExperiment")
## class: SingleCellExperiment 
## dim: 10 10 
## metadata(0):
## assays(1): counts
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(0):
## reducedDimNames(0):
## spikeNames(0):
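
The constructor passes further arguments on to SummarizedExperiment, so sample- and gene-level annotation can be supplied at construction time. The sketch below assumes this behaviour; the batch and biotype columns are hypothetical and serve only as an illustration.

```r
library(SingleCellExperiment)

set.seed(1)
toy_counts <- matrix(rpois(20, lambda = 5), nrow = 5, ncol = 4,
                     dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))

# Hypothetical per-cell and per-gene annotation.
cell_info <- DataFrame(batch = c("a", "a", "b", "b"),
                       row.names = colnames(toy_counts))
gene_info <- DataFrame(biotype = rep("protein_coding", 5),
                       row.names = rownames(toy_counts))

toy <- SingleCellExperiment(assays = list(counts = toy_counts),
                            colData = cell_info,
                            rowData = gene_info)
dim(toy)
colData(toy)$batch
```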

3.3 Setting up a simple example

Here we use a subset of the allen data set from the scRNAseq package to demonstrate the use of the class.

data(allen)
allen
## class: SummarizedExperiment 
## dim: 20908 379 
## metadata(2): SuppInfo which_qc
## assays(4): tophat_counts cufflinks_fpkm rsem_counts rsem_tpm
## rownames(20908): 0610007P14Rik 0610009B22Rik ... Zzef1 Zzz3
## rowData names(0):
## colnames(379): SRR2140028 SRR2140022 ... SRR2139341 SRR2139336
## colData names(22): NREADS NALIGNED ... Animal.ID passes_qc_checks_s

These data are stored as a SummarizedExperiment, so we will first coerce it into a SingleCellExperiment.

sce <- as(allen, "SingleCellExperiment")
sce
## class: SingleCellExperiment 
## dim: 20908 379 
## metadata(2): SuppInfo which_qc
## assays(4): tophat_counts cufflinks_fpkm rsem_counts rsem_tpm
## rownames(20908): 0610007P14Rik 0610009B22Rik ... Zzef1 Zzz3
## rowData names(0):
## colnames(379): SRR2140028 SRR2140022 ... SRR2139341 SRR2139336
## colData names(22): NREADS NALIGNED ... Animal.ID passes_qc_checks_s
## reducedDimNames(0):
## spikeNames(0):

4 Available methods for the SingleCellExperiment

4.1 Adding spike-in information

One of the main additions to SummarizedExperiment is the ability for the user to specify the rows corresponding to spike-in transcripts. This is done with the isSpike<- setter, supplying an appropriate name for the spike-in set.

isSpike(sce, "ERCC") <- grepl("^ERCC-", rownames(sce))
sce
## class: SingleCellExperiment 
## dim: 20908 379 
## metadata(2): SuppInfo which_qc
## assays(4): tophat_counts cufflinks_fpkm rsem_counts rsem_tpm
## rownames(20908): 0610007P14Rik 0610009B22Rik ... Zzef1 Zzz3
## rowData names(0):
## colnames(379): SRR2140028 SRR2140022 ... SRR2139341 SRR2139336
## colData names(22): NREADS NALIGNED ... Animal.ID passes_qc_checks_s
## reducedDimNames(0):
## spikeNames(1): ERCC

The identities of the spike-in rows can be easily retrieved using the name of the spike-in set, as shown below. The names of currently available spike-in sets can also be returned with the spikeNames method.

table(isSpike(sce, "ERCC"))
## 
## FALSE  TRUE 
## 20816    92
spikeNames(sce)
## [1] "ERCC"
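
The value assigned to a spike-in set is simply a logical vector with one entry per row. The base R sketch below shows how such a vector arises from a pattern match; the gene names are made up for illustration.

```r
# Hypothetical row names mixing endogenous genes and ERCC spike-ins.
gene_names <- c("Actb", "ERCC-00002", "Gapdh", "ERCC-00130", "Ercc1")

# "^ERCC-" anchors the match at the start of the name and is case-sensitive,
# so the endogenous gene Ercc1 is not flagged as a spike-in.
is_ercc <- grepl("^ERCC-", gene_names)
names(is_ercc) <- gene_names
is_ercc
table(is_ercc)
```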

While most experimental designs use a single set of spike-ins, the class can hold more than one set. Let us pretend that the members of the Adam gene family have been spiked in as external genes in these data.

isSpike(sce, "Adam") <- grepl("^Adam[0-9]", rownames(sce))
sce
## class: SingleCellExperiment 
## dim: 20908 379 
## metadata(2): SuppInfo which_qc
## assays(4): tophat_counts cufflinks_fpkm rsem_counts rsem_tpm
## rownames(20908): 0610007P14Rik 0610009B22Rik ... Zzef1 Zzz3
## rowData names(0):
## colnames(379): SRR2140028 SRR2140022 ... SRR2139341 SRR2139336
## colData names(22): NREADS NALIGNED ... Animal.ID passes_qc_checks_s
## reducedDimNames(0):
## spikeNames(2): ERCC Adam
table(isSpike(sce, "Adam"))
## 
## FALSE  TRUE 
## 20875    33
spikeNames(sce)
## [1] "ERCC" "Adam"

If isSpike is used without specifying any name, it will return the union of all spike-in sets.

table(isSpike(sce))
## 
## FALSE  TRUE 
## 20783   125
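
The union is an element-wise logical OR of the individual sets. In the output above the two sets do not overlap, so the 125 spike-in rows are the 92 ERCC rows plus the 33 Adam rows. A base R sketch with made-up gene names:

```r
# Hypothetical row names; Adamts1 does not match "^Adam[0-9]".
gene_names <- c("Actb", "ERCC-00002", "Adam10", "ERCC-00130", "Adamts1", "Adam9")

is_ercc <- grepl("^ERCC-", gene_names)
is_adam <- grepl("^Adam[0-9]", gene_names)

# A row is a spike-in if it belongs to at least one set.
any_spike <- is_ercc | is_adam
sum(any_spike)
```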

Similarly, assigning NULL with isSpike<- without specifying a name deletes all existing spike-in sets from the SingleCellExperiment. (If the assigned value is not NULL, all existing sets are still removed, and the vector is stored as a spike-in set with an empty name.)

temp <- sce
isSpike(temp) <- NULL
spikeNames(temp)
## character(0)

Note that the isSpike and isSpike<- methods get and set columns in int_elementMetadata and int_metadata. This information is only relevant to package developers and not necessary for routine use of this class.

4.2 Adding size factors

One can also store size factors in the SingleCellExperiment object. For illustration, we simply use the total number of reads per cell in the first assay (tophat_counts) as the size factors. Note that more sophisticated methods for computing size factors are available (see, e.g., scran).

sizeFactors(sce) <- colSums(assay(sce))
head(sizeFactors(sce))
## SRR2140028 SRR2140022 SRR2140055 SRR2140083 SRR2139991 SRR2140067 
##    5173863    6445002    2343379    5438526    4757468    2364851

We can compute multiple size factors and store them in the object, by providing a name to sizeFactors. This does not affect the values of the unnamed size factors.

sizeFactors(sce, "ERCC") <- colSums(assay(sce)[isSpike(sce, "ERCC"),])
head(sizeFactors(sce, "ERCC"))
## SRR2140028 SRR2140022 SRR2140055 SRR2140083 SRR2139991 SRR2140067 
##     224648     186208     162370     512991     278034      64975
head(sizeFactors(sce)) # same as before
## SRR2140028 SRR2140022 SRR2140055 SRR2140083 SRR2139991 SRR2140067 
##    5173863    6445002    2343379    5438526    4757468    2364851
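
The library sizes used above are on the scale of millions of reads. A common convention is to rescale size factors so that they are centred at unity, which keeps the normalized values on the scale of the original counts. The following base R sketch illustrates this on a toy matrix; it is one possible approach and is not what the code above does.

```r
set.seed(42)
toy_counts <- matrix(rpois(30, lambda = 20), nrow = 6, ncol = 5,
                     dimnames = list(paste0("gene", 1:6), paste0("cell", 1:5)))

# Library size per cell, rescaled to have a mean of 1.
lib_sizes <- colSums(toy_counts)
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each column (cell) by its size factor.
norm_counts <- sweep(toy_counts, 2, size_factors, "/")
colSums(norm_counts)
```

After this scaling, every cell has the same total, equal to the mean library size.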

4.3 Retrieve colData and rowData information

The colData and rowData methods can be used to retrieve the stored sample- and gene-level metadata. By default, this will only return the user-visible metadata fields, i.e., not including the fields stored in the int_* slots.

colData(sce)
## DataFrame with 379 rows and 22 columns
##               NREADS  NALIGNED    RALIGN TOTAL_DUP     PRIMER
##            <numeric> <numeric> <numeric> <numeric>  <numeric>
## SRR2140028  13743900  13011100   94.6681   51.1100 0.01481480
## SRR2140022  14078700  13521900   96.0454   55.9157 0.00853083
## SRR2140055   5842930   5135980   87.9008   59.1126 0.03561120
## SRR2140083  16784400  15585800   92.8587   55.3076 0.02096950
## SRR2139991  11558600  10864300   93.9929   50.2258 0.01640800
## ...              ...       ...       ...       ...        ...
## SRR2139325  12875700  11307000   87.8172   70.3564  0.0453119
## SRR2139373   9699400   8964140   92.4196   45.5249  0.0216694
## SRR2139379   6175660   5728080   92.7526   45.2652  0.0217132
## SRR2139341  28038500  26320000   93.8711   65.1959  0.0270482
## SRR2139336   7878700   7467200   94.7772   56.9675  0.0190784
##            PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES
##                      <numeric>        <numeric>     <numeric>
## SRR2140028               1e-06         0.216848      0.265609
## SRR2140022               0e+00         0.263052      0.310332
## SRR2140055               3e-06         0.207086      0.327241
## SRR2140083               1e-06         0.129243      0.253681
## SRR2139991               0e+00         0.257729      0.276831
## ...                        ...              ...           ...
## SRR2139325             1.2e-05         0.211253      0.269041
## SRR2139373             1.0e-06         0.220541      0.254625
## SRR2139379             0.0e+00         0.253996      0.289924
## SRR2139341             0.0e+00         0.297193      0.367713
## SRR2139336             1.0e-06         0.369537      0.407216
##            PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES
##                     <numeric>            <numeric>      <numeric>
## SRR2140028           0.369509             0.148033       0.482457
## SRR2140022           0.290329             0.136288       0.573384
## SRR2140055           0.291128             0.174542       0.534327
## SRR2140083           0.444594             0.172481       0.382924
## SRR2139991           0.323493             0.141946       0.534560
## ...                       ...                  ...            ...
## SRR2139325           0.346360             0.173333       0.480295
## SRR2139373           0.385409             0.139423       0.475167
## SRR2139379           0.325400             0.130680       0.543920
## SRR2139341           0.190673             0.144421       0.664906
## SRR2139336           0.077377             0.145870       0.776752
##            MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS
##                     <numeric>          <numeric>          <numeric>
## SRR2140028           0.507749           0.141810           0.409045
## SRR2140022           0.488182           0.145024           0.419160
## SRR2140055           0.729874           0.069846           0.548560
## SRR2140083           0.781878           0.000000           0.697916
## SRR2139991           0.482920           0.160644           0.413018
## ...                       ...                ...                ...
## SRR2139325           0.754565           0.049809           0.454330
## SRR2139373           0.504545           0.162197           0.518013
## SRR2139379           0.521732           0.136669           0.458149
## SRR2139341           0.491965           0.155719           0.551382
## SRR2139336           0.517592           0.148329           0.445874
##            MEDIAN_5PRIME_TO_3PRIME_BIAS      driver_1_s dissection_s
##                               <numeric>     <character>  <character>
## SRR2140028                     0.425234  Scnn1a-Tg3-Cre           L4
## SRR2140022                     0.419260  Scnn1a-Tg3-Cre           L4
## SRR2140055                     0.257657  Scnn1a-Tg3-Cre          All
## SRR2140083                     0.018250  Scnn1a-Tg3-Cre           L4
## SRR2139991                     0.462171  Scnn1a-Tg3-Cre           L4
## ...                                 ...             ...          ...
## SRR2139325                     0.413165 Ntsr1-Cre_GN220          L6a
## SRR2139373                     0.356451 Ntsr1-Cre_GN220          L6a
## SRR2139379                     0.367889 Ntsr1-Cre_GN220          L6a
## SRR2139341                     0.330835 Ntsr1-Cre_GN220          L6a
## SRR2139336                     0.390592 Ntsr1-Cre_GN220          L6a
##               Core.Type Primary.Type Secondary.Type Animal.ID
##             <character>  <character>    <character> <integer>
## SRR2140028 Intermediate    L4 Scnn1a       L4 Ctxn3    133632
## SRR2140022         Core    L4 Scnn1a                   133632
## SRR2140055 Intermediate  L5a Tcerg1l      L5a Batf3    151560
## SRR2140083           NA           NA             NA        NA
## SRR2139991 Intermediate    L4 Scnn1a       L4 Ctxn3    126846
## ...                 ...          ...            ...       ...
## SRR2139325 Intermediate      L6a Sla        L6a Mgp    175613
## SRR2139373         Core      L6a Sla                   132856
## SRR2139379         Core      L6a Sla                   132856
## SRR2139341         Core      L6a Sla                   132856
## SRR2139336         Core      L6a Sla                   132856
##            passes_qc_checks_s
##                   <character>
## SRR2140028                  Y
## SRR2140022                  Y
## SRR2140055                  Y
## SRR2140083                  N
## SRR2139991                  Y
## ...                       ...
## SRR2139325                  Y
## SRR2139373                  Y
## SRR2139379                  Y
## SRR2139341                  Y
## SRR2139336                  Y
rowData(sce)
## DataFrame with 20908 rows and 0 columns

However, it is sometimes useful to retrieve a DataFrame with the internal fields, i.e., spike-in and size factor information. This can be achieved by specifying internal=TRUE.

colData(sce, internal=TRUE)
## DataFrame with 379 rows and 24 columns
##               NREADS  NALIGNED    RALIGN TOTAL_DUP     PRIMER
##            <numeric> <numeric> <numeric> <numeric>  <numeric>
## SRR2140028  13743900  13011100   94.6681   51.1100 0.01481480
## SRR2140022  14078700  13521900   96.0454   55.9157 0.00853083
## SRR2140055   5842930   5135980   87.9008   59.1126 0.03561120
## SRR2140083  16784400  15585800   92.8587   55.3076 0.02096950
## SRR2139991  11558600  10864300   93.9929   50.2258 0.01640800
## ...              ...       ...       ...       ...        ...
## SRR2139325  12875700  11307000   87.8172   70.3564  0.0453119
## SRR2139373   9699400   8964140   92.4196   45.5249  0.0216694
## SRR2139379   6175660   5728080   92.7526   45.2652  0.0217132
## SRR2139341  28038500  26320000   93.8711   65.1959  0.0270482
## SRR2139336   7878700   7467200   94.7772   56.9675  0.0190784
##            PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES
##                      <numeric>        <numeric>     <numeric>
## SRR2140028               1e-06         0.216848      0.265609
## SRR2140022               0e+00         0.263052      0.310332
## SRR2140055               3e-06         0.207086      0.327241
## SRR2140083               1e-06         0.129243      0.253681
## SRR2139991               0e+00         0.257729      0.276831
## ...                        ...              ...           ...
## SRR2139325             1.2e-05         0.211253      0.269041
## SRR2139373             1.0e-06         0.220541      0.254625
## SRR2139379             0.0e+00         0.253996      0.289924
## SRR2139341             0.0e+00         0.297193      0.367713
## SRR2139336             1.0e-06         0.369537      0.407216
##            PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES
##                     <numeric>            <numeric>      <numeric>
## SRR2140028           0.369509             0.148033       0.482457
## SRR2140022           0.290329             0.136288       0.573384
## SRR2140055           0.291128             0.174542       0.534327
## SRR2140083           0.444594             0.172481       0.382924
## SRR2139991           0.323493             0.141946       0.534560
## ...                       ...                  ...            ...
## SRR2139325           0.346360             0.173333       0.480295
## SRR2139373           0.385409             0.139423       0.475167
## SRR2139379           0.325400             0.130680       0.543920
## SRR2139341           0.190673             0.144421       0.664906
## SRR2139336           0.077377             0.145870       0.776752
##            MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS
##                     <numeric>          <numeric>          <numeric>
## SRR2140028           0.507749           0.141810           0.409045
## SRR2140022           0.488182           0.145024           0.419160
## SRR2140055           0.729874           0.069846           0.548560
## SRR2140083           0.781878           0.000000           0.697916
## SRR2139991           0.482920           0.160644           0.413018
## ...                       ...                ...                ...
## SRR2139325           0.754565           0.049809           0.454330
## SRR2139373           0.504545           0.162197           0.518013
## SRR2139379           0.521732           0.136669           0.458149
## SRR2139341           0.491965           0.155719           0.551382
## SRR2139336           0.517592           0.148329           0.445874
##            MEDIAN_5PRIME_TO_3PRIME_BIAS      driver_1_s dissection_s
##                               <numeric>     <character>  <character>
## SRR2140028                     0.425234  Scnn1a-Tg3-Cre           L4
## SRR2140022                     0.419260  Scnn1a-Tg3-Cre           L4
## SRR2140055                     0.257657  Scnn1a-Tg3-Cre          All
## SRR2140083                     0.018250  Scnn1a-Tg3-Cre           L4
## SRR2139991                     0.462171  Scnn1a-Tg3-Cre           L4
## ...                                 ...             ...          ...
## SRR2139325                     0.413165 Ntsr1-Cre_GN220          L6a
## SRR2139373                     0.356451 Ntsr1-Cre_GN220          L6a
## SRR2139379                     0.367889 Ntsr1-Cre_GN220          L6a
## SRR2139341                     0.330835 Ntsr1-Cre_GN220          L6a
## SRR2139336                     0.390592 Ntsr1-Cre_GN220          L6a
##               Core.Type Primary.Type Secondary.Type Animal.ID
##             <character>  <character>    <character> <integer>
## SRR2140028 Intermediate    L4 Scnn1a       L4 Ctxn3    133632
## SRR2140022         Core    L4 Scnn1a                   133632
## SRR2140055 Intermediate  L5a Tcerg1l      L5a Batf3    151560
## SRR2140083           NA           NA             NA        NA
## SRR2139991 Intermediate    L4 Scnn1a       L4 Ctxn3    126846
## ...                 ...          ...            ...       ...
## SRR2139325 Intermediate      L6a Sla        L6a Mgp    175613
## SRR2139373         Core      L6a Sla                   132856
## SRR2139379         Core      L6a Sla                   132856
## SRR2139341         Core      L6a Sla                   132856
## SRR2139336         Core      L6a Sla                   132856
##            passes_qc_checks_s size_factor size_factor_ERCC
##                   <character>   <numeric>        <numeric>
## SRR2140028                  Y     5173863           224648
## SRR2140022                  Y     6445002           186208
## SRR2140055                  Y     2343379           162370
## SRR2140083                  N     5438526           512991
## SRR2139991                  Y     4757468           278034
## ...                       ...         ...              ...
## SRR2139325                  Y     4377966           331955
## SRR2139373                  Y     3393227            79524
## SRR2139379                  Y     2529501            37021
## SRR2139341                  Y    13972642           439580
## SRR2139336                  Y     4838243           163414
rowData(sce, internal=TRUE)
## DataFrame with 20908 rows and 3 columns
##       is_spike_ERCC  is_spike is_spike_Adam
##           <logical> <logical>     <logical>
## 1             FALSE     FALSE         FALSE
## 2             FALSE     FALSE         FALSE
## 3             FALSE     FALSE         FALSE
## 4             FALSE     FALSE         FALSE
## 5             FALSE     FALSE         FALSE
## ...             ...       ...           ...
## 20904         FALSE     FALSE         FALSE
## 20905         FALSE     FALSE         FALSE
## 20906         FALSE     FALSE         FALSE
## 20907         FALSE     FALSE         FALSE
## 20908         FALSE     FALSE         FALSE

See Section 5.2 for a discussion of why an internal storage mechanism is used here.

4.4 Adding low-dimensional representations

For simplicity and speed, we work on a subset of 100 genes. To avoid ending up with only uninteresting genes, we extract the 100 genes with maximal variance in the log-transformed counts.

library(magrittr)
# Per-gene variance of the log-transformed counts, sorted from high to low.
vars <- assay(sce) %>% log1p %>% rowVars
names(vars) <- rownames(sce)
vars <- sort(vars, decreasing = TRUE)

sce_sub <- sce[names(vars[1:100]),]
sce_sub
## class: SingleCellExperiment 
## dim: 100 379 
## metadata(2): SuppInfo which_qc
## assays(4): tophat_counts cufflinks_fpkm rsem_counts rsem_tpm
## rownames(100): Lamp5 Fam19a1 ... Rnf2 Zfp35
## rowData names(0):
## colnames(379): SRR2140028 SRR2140022 ... SRR2139341 SRR2139336
## colData names(22): NREADS NALIGNED ... Animal.ID passes_qc_checks_s
## reducedDimNames(0):
## spikeNames(2): ERCC Adam
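
The same selection can be written without magrittr or matrixStats. The sketch below uses base R only and a toy matrix, with apply() standing in for rowVars; it selects 5 genes rather than 100 to suit the smaller input.

```r
set.seed(7)
toy_counts <- matrix(rpois(200, lambda = 10), nrow = 20, ncol = 10,
                     dimnames = list(paste0("gene", 1:20), NULL))

# Per-gene variance of the log1p-transformed counts.
vars <- apply(log1p(toy_counts), 1, var)

# Names of the 5 most variable genes, then subset by name.
top_genes <- names(sort(vars, decreasing = TRUE))[1:5]
toy_sub <- toy_counts[top_genes, ]
dim(toy_sub)
```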

We obtain the PCA and t-SNE representations of the data and add them to the object with the reducedDims method.

library(Rtsne)
set.seed(5252)

pca_data <- prcomp(t(log1p(assay(sce_sub))))
tsne_data <- Rtsne(pca_data$x[,1:50], pca = FALSE)

reducedDims(sce_sub) <- SimpleList(PCA=pca_data$x, TSNE=tsne_data$Y)
sce_sub
## class: SingleCellExperiment 
## dim: 100 379 
## metadata(2): SuppInfo which_qc
## assays(4): tophat_counts cufflinks_fpkm rsem_counts rsem_tpm
## rownames(100): Lamp5 Fam19a1 ... Rnf2 Zfp35
## rowData names(0):
## colnames(379): SRR2140028 SRR2140022 ... SRR2139341 SRR2139336
## colData names(22): NREADS NALIGNED ... Animal.ID passes_qc_checks_s
## reducedDimNames(2): PCA TSNE
## spikeNames(2): ERCC Adam

The stored coordinates can be retrieved by name or by numerical index. Each row of the coordinate matrix is assumed to correspond to a cell, while each column represents a dimension.

reducedDims(sce_sub)
## List of length 2
## names(2): PCA TSNE
reducedDimNames(sce_sub)
## [1] "PCA"  "TSNE"
head(reducedDim(sce_sub, "PCA")[,1:2])
##                  PC1        PC2
## SRR2140028 17.557295  -7.717162
## SRR2140022 21.468975  -1.198212
## SRR2140055  4.303756 -11.360330
## SRR2140083 21.440479  -9.435868
## SRR2139991 15.592089 -11.043989
## SRR2140067 16.539336  -9.831779
head(reducedDim(sce_sub, "TSNE")[,1:2])
##                  [,1]     [,2]
## SRR2140028 -7.3054739 16.24637
## SRR2140022 -7.1231429 11.62634
## SRR2140055 -0.8316189 15.57015
## SRR2140083 -5.8758286 13.59575
## SRR2139991 -4.8157687 13.11940
## SRR2140067 -4.9579637 15.31204

Any subsetting by column of sce_sub will also lead to subsetting of the dimensionality reduction results by cell.

dim(reducedDim(sce_sub, "PCA"))
## [1] 379 100
dim(reducedDim(sce_sub[,1:10], "PCA"))
## [1]  10 100
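
The retrieval and subsetting behaviour can be reproduced on a small self-contained object. This is a minimal sketch assuming the SingleCellExperiment package is installed; the toy data and object names are illustrative.

```r
library(SingleCellExperiment)

set.seed(100)
toy_counts <- matrix(rpois(120, lambda = 10), nrow = 12, ncol = 10,
                     dimnames = list(paste0("gene", 1:12), paste0("cell", 1:10)))
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

# prcomp expects observations (here, cells) in rows.
pca <- prcomp(t(log1p(toy_counts)))
reducedDim(toy, "PCA") <- pca$x

# Retrieval by name and by position gives the same matrix.
by_name <- reducedDim(toy, "PCA")
by_index <- reducedDim(toy, 1)

# Column subsetting carries the matching rows of the coordinates along.
sub_pca <- reducedDim(toy[, 1:3], "PCA")
dim(sub_pca)
```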

4.5 Convenient access to named assays

In the SingleCellExperiment, users can assign arbitrary names to entries of assays. To assist interoperability between packages, we provide some suggestions for what the names should be for particular types of data:

  • counts: Raw count data, e.g., number of reads or transcripts for a particular gene.
  • normcounts: Normalized values on the same scale as the original counts. For example, counts divided by cell-specific size factors that are centred at unity.
  • logcounts: Log-transformed counts or count-like values. In most cases, this will be defined as log-transformed normcounts, e.g., using log base 2 and a pseudo-count of 1.
  • cpm: Counts-per-million. This is the read count for each gene in each cell, divided by the library size of each cell in millions.
  • tpm: Transcripts-per-million. This is the number of transcripts for each gene in each cell, divided by the total number of transcripts in that cell (in millions).
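
To make these definitions concrete, the base R sketch below derives normcounts, logcounts and cpm from a toy count matrix. It assumes library-size size factors centred at unity, log base 2 and a pseudo-count of 1, as in the examples given in the list; other choices are equally valid.

```r
# Toy counts: 3 genes x 2 cells, with library sizes of 100 and 200.
toy_counts <- matrix(c(10, 0, 90,
                       60, 40, 100), nrow = 3, ncol = 2,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     c("cell1", "cell2")))

lib_sizes <- colSums(toy_counts)
size_factors <- lib_sizes / mean(lib_sizes)   # centred at unity

# normcounts: counts divided by cell-specific size factors.
toy_normcounts <- sweep(toy_counts, 2, size_factors, "/")

# logcounts: log2 of normcounts with a pseudo-count of 1.
toy_logcounts <- log2(toy_normcounts + 1)

# cpm: counts divided by the library size in millions.
toy_cpm <- sweep(toy_counts, 2, lib_sizes / 1e6, "/")
```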

Each of these suggested names has an appropriate getter/setter method for convenient manipulation of the SingleCellExperiment. For example, we can take the (very specifically named) tophat_counts assay and also store it under the name counts:

counts(sce) <- assay(sce, "tophat_counts")
sce
## class: SingleCellExperiment 
## dim: 20908 379 
## metadata(2): SuppInfo which_qc
## assays(5): tophat_counts cufflinks_fpkm rsem_counts rsem_tpm counts
## rownames(20908): 0610007P14Rik 0610009B22Rik ... Zzef1 Zzz3
## rowData names(0):
## colnames(379): SRR2140028 SRR2140022 ... SRR2139341 SRR2139336
## colData names(22): NREADS NALIGNED ... Animal.ID passes_qc_checks_s
## reducedDimNames(0):
## spikeNames(2): ERCC Adam
dim(counts(sce))
## [1] 20908   379

This means that functions expecting count data can simply call counts() without worrying about package-specific naming conventions.
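
As a sketch of what this looks like from a package developer's side, the hypothetical helper library_sizes() below relies only on counts(). It assumes the SingleCellExperiment package is installed; the helper and the toy data are illustrative and are not part of any package.

```r
library(SingleCellExperiment)

# Hypothetical helper: works on any SingleCellExperiment with a "counts" assay,
# regardless of how the upstream package originally named its data.
library_sizes <- function(x) {
    colSums(counts(x))
}

toy_counts <- matrix(1:6, nrow = 3, ncol = 2,
                     dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))

# The original assay name is deliberately package-specific.
toy <- SingleCellExperiment(assays = list(tophat_counts = toy_counts))
counts(toy) <- assay(toy, "tophat_counts")

toy_lib_sizes <- library_sizes(toy)
assayNames(toy)
```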

5 Design decisions

5.1 Package scope

By design, the scope of this package is limited to defining the SingleCellExperiment class and some minimal getter and setter methods. For this reason, we leave it to developers of specialized packages to provide more advanced methods for the SingleCellExperiment class. For example, scater defines a number of specific methods such as normalize and dplyr-like verbs. We also leave it to the developers of existing packages to provide coercion methods from their classes to SingleCellExperiment.

5.2 Why internal storage?

We use an internal storage mechanism to protect the spike-in and size factor fields from direct manipulation by the user. This ensures that, e.g., only a call to sizeFactors<- can change the size factors. The same effect could be achieved by reserving a subset of columns (or column names) as “private” in colData() and rowData(), though this is not easily implemented.

The internal storage avoids situations where users or functions can silently overwrite these important metadata fields during manipulations of rowData or colData. This can result in bugs that are difficult to track down, particularly in long workflows involving many functions. It also allows us to add new methods and metadata types to SingleCellExperiment without worrying about overwriting user-supplied metadata in existing objects.
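
The kind of silent overwrite we have in mind is easy to reproduce with an ordinary data.frame. In the base R sketch below, the size_factor column and the add_qc() helper are hypothetical; they stand in for user-visible metadata and for an arbitrary downstream function.

```r
# Per-cell metadata with size factors stored as a regular, user-visible column.
cell_info <- data.frame(batch = c("a", "b", "a"),
                        size_factor = c(0.8, 1.1, 1.1),
                        row.names = paste0("cell", 1:3))

# Hypothetical downstream helper that happens to reuse the same column name
# for an unrelated quantity.
add_qc <- function(df) {
    df$size_factor <- seq_len(nrow(df))
    df
}

after <- add_qc(cell_info)

# No error or warning is raised, but the size factors are gone.
unchanged <- identical(after$size_factor, cell_info$size_factor)
unchanged
```

Keeping such fields in internal slots means that only the dedicated setters can modify them.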

5.3 What’s up with reducedDims?

We use a SimpleList as the reducedDims slot to allow for multiple dimensionality reduction results. One can imagine that different dimensionality reduction techniques will be useful for different aspects of the analysis, e.g., t-SNE for visualization, PCA for pseudo-time inference. We see reducedDims as analogous to assays() in that multiple matrices can be stored, though the dimensionality reduction results need not have the same number of dimensions.
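
The sketch below stores two results with different numbers of dimensions in the same object. It assumes the SingleCellExperiment package is installed, and it uses classical multidimensional scaling (cmdscale from base R) in place of t-SNE purely to avoid an extra dependency.

```r
library(SingleCellExperiment)

set.seed(11)
toy_counts <- matrix(rpois(160, lambda = 10), nrow = 16, ncol = 10,
                     dimnames = list(paste0("gene", 1:16), paste0("cell", 1:10)))
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

log_expr <- t(log1p(toy_counts))           # cells in rows
pca <- prcomp(log_expr)$x[, 1:5]           # 10 cells x 5 dimensions
mds <- cmdscale(dist(log_expr), k = 2)     # 10 cells x 2 dimensions

reducedDims(toy) <- SimpleList(PCA = pca, MDS = mds)

# Number of dimensions stored for each result.
n_dims <- vapply(reducedDimNames(toy),
                 function(nm) ncol(reducedDim(toy, nm)),
                 integer(1))
n_dims
```

Both matrices have one row per cell, but they differ in their number of columns.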

5.4 Why derive from a RangedSummarizedExperiment?

We decided to extend RangedSummarizedExperiment rather than SummarizedExperiment because rowRanges() will be essential for certain assays. Even for RNA-seq, it is sometimes useful to have rowRanges() to define the genomic coordinates of each gene, as in the DESeqDataSet class of the DESeq2 package. An alternative would have been to provide two classes, SingleCellExperiment and RangedSingleCellExperiment. However, this seems like unnecessary duplication: a single class whose rowRanges are empty by default is good enough when rowRanges are not needed.
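
The sketch below illustrates the empty default and how coordinates can be added later. It assumes the SingleCellExperiment and GenomicRanges packages are installed; the coordinates are made up for illustration.

```r
library(SingleCellExperiment)
library(GenomicRanges)

toy_counts <- matrix(1:6, nrow = 3, ncol = 2,
                     dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

# Without coordinates, rowRanges() holds one empty set of ranges per row.
n_ranges <- lengths(rowRanges(toy))

# Hypothetical coordinates, named by gene so that the row names are kept.
gr <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
              ranges = IRanges(start = c(100, 500, 200), width = 50),
              strand = c("+", "-", "+"))
names(gr) <- rownames(toy)
rowRanges(toy) <- gr

width(rowRanges(toy))
```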

6 Session Info

sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] Rtsne_0.13                 magrittr_1.5              
##  [3] scRNAseq_1.3.0             SingleCellExperiment_1.0.0
##  [5] SummarizedExperiment_1.8.0 DelayedArray_0.4.0        
##  [7] matrixStats_0.52.2         Biobase_2.38.0            
##  [9] GenomicRanges_1.30.0       GenomeInfoDb_1.14.0       
## [11] IRanges_2.12.0             S4Vectors_0.16.0          
## [13] BiocGenerics_0.24.0        BiocStyle_2.6.0           
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.13            knitr_1.17              XVector_0.18.0         
##  [4] zlibbioc_1.24.0         lattice_0.20-35         stringr_1.2.0          
##  [7] tools_3.4.2             grid_3.4.2              htmltools_0.3.6        
## [10] yaml_2.1.14             rprojroot_1.2           digest_0.6.12          
## [13] bookdown_0.5            Matrix_1.2-11           GenomeInfoDbData_0.99.1
## [16] bitops_1.0-6            RCurl_1.95-4.8          evaluate_0.10.1        
## [19] rmarkdown_1.6           stringi_1.1.5           compiler_3.4.2         
## [22] backports_1.1.1