SingleR 1.0.6
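SingleR is an automatic annotation method for single-cell RNA sequencing (scRNAseq) data (Aran et al. 2019). Given a reference dataset of samples (single-cell or bulk) with known labels, it labels new cells from a test dataset based on similarity to the reference set. Specifically, for each test cell, SingleR computes the Spearman correlation between its expression profile and that of each reference sample, summarizes the correlations for each label into a per-label score, and assigns the label with the highest score, optionally followed by a fine-tuning step that repeats the scoring among the top-scoring labels.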
Automatic annotation provides a convenient way of transferring biological knowledge across datasets. In this manner, the work of manually interpreting clusters and defining marker genes only has to be done once, for the reference dataset, and this knowledge can then be propagated to new datasets automatically.
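The idea of labelling a test cell by its similarity to labelled reference samples can be sketched in base R. This is only a minimal illustration on simulated data, not the SingleR implementation: it skips marker selection and fine-tuning, and all object names (ref, ref.labels, test, score_cell) as well as the choice of the 0.8 quantile as the per-label summary are assumptions made for this example.

```r
# Minimal base-R sketch of correlation-based label transfer.
# All names and simulated values below are hypothetical.
set.seed(100)
n.genes <- 50

# Two reference "cell types" with distinct mean expression profiles.
profile.A <- rnorm(n.genes, mean = 5)
profile.B <- rnorm(n.genes, mean = 5)
ref <- cbind(
    replicate(5, profile.A + rnorm(n.genes, sd = 0.5)),
    replicate(5, profile.B + rnorm(n.genes, sd = 0.5))
)
ref.labels <- rep(c("A", "B"), each = 5)

# Three test cells: two of type A, one of type B.
test <- cbind(
    profile.A + rnorm(n.genes, sd = 0.5),
    profile.A + rnorm(n.genes, sd = 0.5),
    profile.B + rnorm(n.genes, sd = 0.5)
)

score_cell <- function(cell, ref, ref.labels, quantile.use = 0.8) {
    # Spearman correlation between the test cell and every reference sample.
    rho <- cor(cell, ref, method = "spearman")[1, ]
    # Per-label score: a fixed quantile of the correlations for that label.
    vapply(split(rho, ref.labels), quantile, numeric(1),
        probs = quantile.use, names = FALSE)
}

# Rows are test cells, columns are reference labels.
scores <- t(apply(test, 2, score_cell, ref = ref, ref.labels = ref.labels))

# Each cell receives the label with the highest score.
assigned <- colnames(scores)[max.col(scores, ties.method = "first")]
assigned
```

Using a rank-based correlation means that the test and reference data do not need to be on the same scale, which is one reason why bulk references can be used to label single cells.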
SingleR provides several reference datasets (mostly derived from bulk RNA-seq or microarray data) through dedicated data retrieval functions.
For example, we obtain reference data from the Human Primary Cell Atlas using the HumanPrimaryCellAtlasData() function, which returns a SummarizedExperiment object containing a matrix of log-expression values with sample-level labels.
library(SingleR)
hpca.se <- HumanPrimaryCellAtlasData()
hpca.se
## class: SummarizedExperiment
## dim: 19363 713
## metadata(0):
## assays(1): logcounts
## rownames(19363): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
## rowData names(0):
## colnames(713): GSM112490 GSM112491 ... GSM92233 GSM92234
## colData names(2): label.main label.fine
Our test dataset is taken from La Manno et al. (2016).
For the sake of speed, we will only label the first 100 cells from this dataset.
library(scRNAseq)
hESCs <- LaMannoBrainData('human-es')
hESCs <- hESCs[,1:100]
# SingleR() expects log-counts, but the function will also happily take raw
# counts for the test dataset. The reference, however, must have log-values.
library(scater)
hESCs <- logNormCounts(hESCs)
We use our hpca.se reference to annotate each cell in hESCs via the SingleR() function, which uses the algorithm described above.
Note that the default marker detection method is to take the genes with the largest positive log-fold changes in the per-label medians for each gene.
pred.hesc <- SingleR(test = hESCs, ref = hpca.se, labels = hpca.se$label.main)
pred.hesc
## DataFrame with 100 rows and 5 columns
## scores
## <matrix>
## 1772122_301_C02 0.118426779945786:0.179699807625087:0.157326274226517:...
## 1772122_180_E05 0.129708246318855:0.236277439793527:0.202370888668263:...
## 1772122_300_H02 0.158201338525345:0.250060222727419:0.211831550178353:...
## 1772122_180_B09 0.158778546217777:0.27716592787528:0.222681369744636:...
## 1772122_180_G04 0.138505219642345:0.236658649096383:0.19092437361406:...
## ... ...
## 1772122_299_E07 0.145931041885859:0.241153701803065:0.217382763112476:...
## 1772122_180_D02 0.122983434596168:0.239181076829949:0.181221997276501:...
## 1772122_300_D09 0.129757310468164:0.233775092572195:0.196637664917917:...
## 1772122_298_F09 0.143118885460347:0.262267367714562:0.214329641867196:...
## 1772122_302_A11 0.0912854247387272:0.185945405472165:0.139232371863794:...
## first.labels tuning.scores
## <character> <DataFrame>
## 1772122_301_C02 Neuroepithelial_cell 0.18244020296249:0.0991115652997192
## 1772122_180_E05 Neuroepithelial_cell 0.137548373236792:0.0647133734667384
## 1772122_300_H02 Neuroepithelial_cell 0.275798157639906:0.136969040146444
## 1772122_180_B09 Neuroepithelial_cell 0.0851622797320583:0.0819878452425098
## 1772122_180_G04 Neuroepithelial_cell 0.198841544187094:0.101662168246495
## ... ... ...
## 1772122_299_E07 Neuroepithelial_cell 0.176002520599547:0.0922503823656398
## 1772122_180_D02 Neuroepithelial_cell 0.196760862365318:0.112480486219438
## 1772122_300_D09 Neuroepithelial_cell 0.0816424287822026:0.0221368018363302
## 1772122_298_F09 Neuroepithelial_cell 0.187249853552379:0.0671892835266423
## 1772122_302_A11 Neuroepithelial_cell 0.156079956344163:0.105132159755961
## labels pruned.labels
## <character> <character>
## 1772122_301_C02 Neuroepithelial_cell Neuroepithelial_cell
## 1772122_180_E05 Neurons Neurons
## 1772122_300_H02 Neuroepithelial_cell Neuroepithelial_cell
## 1772122_180_B09 Neuroepithelial_cell Neuroepithelial_cell
## 1772122_180_G04 Neuroepithelial_cell Neuroepithelial_cell
## ... ... ...
## 1772122_299_E07 Neuroepithelial_cell Neuroepithelial_cell
## 1772122_180_D02 Neuroepithelial_cell Neuroepithelial_cell
## 1772122_300_D09 Neuroepithelial_cell Neuroepithelial_cell
## 1772122_298_F09 Neuroepithelial_cell Neuroepithelial_cell
## 1772122_302_A11 Astrocyte Astrocyte
Each row of the output DataFrame contains prediction results for a single cell.
Labels are shown before fine-tuning (first.labels), after fine-tuning (labels) and after pruning (pruned.labels), along with the associated scores.
We summarize the distribution of labels across our subset of cells:
table(pred.hesc$labels)
##
## Astrocyte Neuroepithelial_cell Neurons
## 14 81 5
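The default marker detection mentioned above, which takes the genes with the largest positive log-fold changes in the per-label medians, can be illustrated with a small base R sketch. The toy reference matrix, the labels "X" and "Y", and the helper top_markers() are all hypothetical and exist only for this example; this is not the code used inside SingleR().

```r
# Hypothetical toy reference of log-expression values: 6 genes x 6 samples.
ref <- rbind(
    gene1 = c(5, 6, 5, 1, 1, 2),
    gene2 = c(1, 1, 2, 6, 5, 6),
    gene3 = c(3, 3, 3, 3, 3, 3),
    gene4 = c(4, 5, 4, 3, 3, 3),
    gene5 = c(0, 0, 1, 0, 1, 0),
    gene6 = c(2, 2, 2, 4, 4, 5)
)
ref.labels <- rep(c("X", "Y"), each = 3)

# Per-label median log-expression for each gene (genes x labels).
medians <- sapply(split(seq_along(ref.labels), ref.labels),
    function(idx) apply(ref[, idx, drop = FALSE], 1, median))

# Markers for label 'a' versus label 'b': the top 'n' genes with the largest
# positive difference in medians. As the values are already logged, this
# difference is a log-fold change.
top_markers <- function(medians, a, b, n = 2) {
    lfc <- medians[, a] - medians[, b]
    lfc <- sort(lfc[lfc > 0], decreasing = TRUE)
    names(head(lfc, n))
}

top_markers(medians, "X", "Y")
top_markers(medians, "Y", "X")
```

Note that gene5 is never selected here because its median is zero in both labels, which hints at why a median-based approach can struggle with sparse data.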
At this point, it is worth noting that SingleR is workflow/package agnostic.
The above example uses SummarizedExperiment objects, but the same functions will accept any (log-)normalized expression matrix.
Here, we will use two human pancreas datasets from the scRNAseq package. The aim is to use one pre-labelled dataset to annotate the other unlabelled dataset. First, we set up the Muraro et al. (2016) dataset to be our reference.
library(scRNAseq)
sceM <- MuraroPancreasData()
# One should normally do cell-based quality control at this point, but for
# brevity's sake, we will just remove the unlabelled libraries here.
sceM <- sceM[,!is.na(sceM$label)]
sceM <- logNormCounts(sceM)
We then set up our test dataset from Grun et al. (2016). To speed up this demonstration, we will subset to the first 100 cells.
sceG <- GrunPancreasData()
sceG <- sceG[,colSums(counts(sceG)) > 0] # Remove libraries with no counts.
sceG <- logNormCounts(sceG)
sceG <- sceG[,1:100]
We then run SingleR() as described previously but with a marker detection mode that considers the variance of expression across cells.
Here, we will use the Wilcoxon rank sum test to identify the top markers for each pairwise comparison between labels.
This is slower but more appropriate for single-cell data compared to the default marker detection algorithm (which may fail for low-coverage data where the median is frequently zero).
pred.grun <- SingleR(test=sceG, ref=sceM, labels=sceM$label, de.method="wilcox")
table(pred.grun$labels)
##
## acinar beta delta duct endothelial
## 53 4 1 41 1
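To see why a rank-based test can be more appropriate than a comparison of medians for low-coverage data, consider the following base R sketch. The expression values are made up for illustration, and wilcox.test() from the stats package stands in for whatever SingleR() does internally when de.method="wilcox", so this should be read as a demonstration of the principle only.

```r
# Hypothetical log-expression values for one gene in cells from two labels.
# Most cells have zero counts, as is typical of low-coverage scRNA-seq data.
expr.beta  <- c(0, 0, 0, 0, 0, 0, 2.1, 3.4, 2.8, 0, 0, 3.0)
expr.delta <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

# Both medians are zero, so a median-based log-fold change detects nothing.
median.lfc <- median(expr.beta) - median(expr.delta)
median.lfc

# The rank sum test uses the whole distribution and can still detect the
# shift. exact = FALSE requests the normal approximation, as exact p-values
# are not available when there are ties (here, the many zeroes).
res <- wilcox.test(expr.beta, expr.delta, alternative = "greater",
    exact = FALSE)
res$p.value
```

The gene is expressed in a third of the cells in the first group and in none of the cells in the second, yet only the rank-based test picks up on this difference.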
SingleR provides a few basic yet powerful visualization tools.
plotScoreHeatmap() displays the scores for all cells across all reference labels, which allows users to inspect the confidence of the predicted labels across the dataset.
The actual assigned label for each cell is shown in the color bar at the top; note that this may not be the visually top-scoring label if fine-tuning is applied, as only the pre-tuned scores are directly comparable across all labels.
plotScoreHeatmap(pred.grun)
For this plot, the key point is to examine the spread of scores within each cell. Ideally, each cell (i.e., column of the heatmap) should have one score that is obviously larger than the rest, indicating that it is unambiguously assigned to a single label. A spread of similar scores for a given cell indicates that the assignment is uncertain, though this may be acceptable if the uncertainty is distributed across similar cell types that cannot be easily resolved.
We can also display other metadata information for each cell by setting clusters= or annotation_col=.
This is occasionally useful for examining potential batch effects, differences in cell type composition between conditions, relationship to clusters from an unsupervised analysis, etc.
In the code below, we display which donor each cell comes from:
plotScoreHeatmap(pred.grun,
annotation_col=as.data.frame(colData(sceG)[,"donor",drop=FALSE]))
The pruneScores() function will remove potentially poor-quality or ambiguous assignments.
In particular, ambiguous assignments are identified based on the per-cell “delta”, i.e., the difference between the score for the assigned label and the median across all labels for each cell.
Low deltas indicate that the assignment is uncertain, which is especially relevant if the cell’s true label does not exist in the reference.
The exact threshold used for pruning is identified using an outlier-based approach that accounts for differences in the scale of the correlations in various contexts.
to.remove <- pruneScores(pred.grun)
summary(to.remove)
## Mode FALSE TRUE
## logical 96 4
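The delta-based pruning described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions: the score matrix is invented, the helper prune() is hypothetical, and only the "median minus a number of MADs" rule on the delta is shown, so it should not be taken as a reimplementation of pruneScores().

```r
# Hypothetical score matrix: 6 cells (rows) x 3 labels (columns).
scores <- rbind(
    c(0.80, 0.30, 0.20),
    c(0.78, 0.35, 0.25),
    c(0.82, 0.28, 0.22),
    c(0.79, 0.31, 0.24),
    c(0.81, 0.33, 0.21),
    c(0.42, 0.40, 0.38)   # ambiguous cell: all scores are similar
)
colnames(scores) <- c("acinar", "beta", "duct")

# Assigned label and per-cell delta, i.e., the score of the assigned label
# minus the median score across all labels for that cell.
assigned <- colnames(scores)[max.col(scores, ties.method = "first")]
delta <- apply(scores, 1, max) - apply(scores, 1, median)

# Outlier-based threshold: 'nmads' MADs below the median delta,
# computed separately for the cells assigned to each label.
prune <- function(delta, assigned, nmads = 3) {
    discard <- logical(length(delta))
    for (lab in unique(assigned)) {
        idx <- assigned == lab
        threshold <- median(delta[idx]) - nmads * mad(delta[idx])
        discard[idx] <- delta[idx] < threshold
    }
    discard
}

discard <- prune(delta, assigned)
discard
```

Because the threshold is defined relative to the other cells with the same label, it adapts to the overall scale of the scores rather than relying on a fixed cutoff.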
By default, SingleR() will report pruned labels in the pruned.labels field where low-quality assignments are replaced with NA.
However, the default pruning thresholds may not be appropriate for every dataset; see ?pruneScores for a more detailed discussion.
We provide the plotScoreDistribution() function to help in determining whether the thresholds are appropriate by using information across cells with the same label.
This displays the per-label distribution of the deltas across cells, from which pruneScores() defines an appropriate threshold as 3 median absolute deviations (MADs) below the median.
plotScoreDistribution(pred.grun, show = "delta.med", ncol = 3, show.nmads = 3)
If some tuning parameters must be adjusted, we can simply call pruneScores() directly with adjusted parameters.
Here, we set labels to NA if they are to be discarded, which is also how SingleR() marks such labels in pruned.labels.
new.pruned <- pred.grun$labels
new.pruned[pruneScores(pred.grun, nmads=5)] <- NA
table(new.pruned, useNA="always")
## new.pruned
## acinar beta delta duct endothelial <NA>
## 53 4 1 41 1 0
Another simple yet effective diagnostic is to examine the expression of the marker genes for each label in the test dataset.
We extract the identity of the markers from the metadata of the SingleR() results and use them in the plotHeatmap() function from scater, as shown below for beta cell markers.
If a cell in the test dataset is confidently assigned to a particular label, we would expect it to have strong expression of that label’s markers.
At the very least, it should exhibit upregulation of those markers relative to cells assigned to other labels.
all.markers <- metadata(pred.grun)$de.genes
sceG$labels <- pred.grun$labels
# Beta cell-related markers
plotHeatmap(sceG, order_columns_by="labels",
features=unique(unlist(all.markers$beta)))
We can similarly perform this for all labels by wrapping this code in a loop, as shown below:
for (lab in unique(pred.grun$labels)) {
    plotHeatmap(sceG, order_columns_by=list(I(pred.grun$labels)),
        features=unique(unlist(all.markers[[lab]])))
}
Heatmaps are particularly useful because they allow users to check that the genes are actually biologically meaningful to that cell type’s identity. For example, beta cells would be expected to express insulin, and the fact that they do so gives more confidence to the correctness of the assignment. By comparison, the scores and deltas are more abstract and difficult to interpret for diagnostic purposes. If the identified markers are not meaningful or not consistently upregulated, some skepticism towards the quality of the assignments is warranted.
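The same check can be made numerically by comparing the average expression of a label's markers between the cells assigned to that label and all other cells. The sketch below uses a small invented log-expression matrix and hand-picked gene names purely for illustration; in a real analysis, the matrix would come from the test dataset and the markers from the SingleR() metadata.

```r
# Hypothetical log-expression matrix (genes x cells) and assigned labels.
logexpr <- rbind(
    INS   = c(8.1, 7.9, 8.4, 0.2, 0.0, 0.5),
    IAPP  = c(5.0, 4.6, 5.3, 0.0, 0.3, 0.1),
    PRSS1 = c(0.1, 0.0, 0.4, 9.0, 8.7, 9.2)
)
labels <- c("beta", "beta", "beta", "acinar", "acinar", "acinar")

# Hypothetical marker set for the beta label.
beta.markers <- c("INS", "IAPP")

# Mean log-expression of each beta marker within each assigned label.
avg <- sapply(split(seq_along(labels), labels),
    function(idx) rowMeans(logexpr[beta.markers, idx, drop = FALSE]))

# Difference in mean log-expression between beta cells and the other cells;
# positive values indicate upregulation in the cells labelled as beta.
upregulation <- avg[, "beta"] - avg[, "acinar"]
upregulation
```

Markers with differences close to zero or below zero would be a reason to inspect the corresponding assignments more closely.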
The legacy SingleR package provides RDA files that contain normalized expression values and cell type labels based on bulk RNA-seq, microarray and single-cell RNA-seq data from:
The bulk RNA-seq and microarray data sets of the first three reference data sets were obtained from pre-sorted cell populations, i.e., the cell labels of these samples were mostly derived based on the respective sorting/purification strategy, not via in silico prediction methods.
Three additional reference datasets from bulk RNA-seq and microarray data for immune cells have also been prepared. Each of these datasets was also obtained from pre-sorted cell populations:
The characteristics of each dataset are summarized below:
Data retrieval | Organism | Samples | Sample types | No. of main labels | No. of fine labels | Cell type focus |
---|---|---|---|---|---|---|
HumanPrimaryCellAtlasData() | human | 713 | microarrays of sorted cell populations | 37 | 157 | Non-specific |
BlueprintEncodeData() | human | 259 | RNA-seq | 24 | 43 | Non-specific |
DatabaseImmuneCellExpressionData() | human | 1561 | RNA-seq | 5 | 15 | Immune |
NovershternHematopoieticData() | human | 211 | microarrays of sorted cell populations | 17 | 38 | Hematopoietic & Immune |
MonacoImmuneData() | human | 114 | RNA-seq | 11 | 29 | Immune |
ImmGenData() | mouse | 830 | microarrays of sorted cell populations | 20 | 253 | Hematopoietic & Immune |
MouseRNAseqData() | mouse | 358 | RNA-seq | 18 | 28 | Non-specific |
Details for each dataset can be viewed on the corresponding help page (e.g. ?ImmGenData).
The available sample types in each set are listed in the sections below.
BlueprintEncodeData
Labels
label.main | label.fine |
---|---|
Neutrophils | Neutrophils |
Monocytes | Monocytes |
HSC | MEP |
CD4+ T-cells | CD4+ T-cells |
CD4+ T-cells | Tregs |
CD4+ T-cells | CD4+ Tcm |
CD4+ T-cells | CD4+ Tem |
CD8+ T-cells | CD8+ Tcm |
CD8+ T-cells | CD8+ Tem |
NK cells | NK cells |
B-cells | naive B-cells |
B-cells | Memory B-cells |
B-cells | Class-switched memory B-cells |
HSC | HSC |
HSC | MPP |
HSC | CLP |
HSC | GMP |
Macrophages | Macrophages |
CD8+ T-cells | CD8+ T-cells |
Erythrocytes | Erythrocytes |
HSC | Megakaryocytes |
HSC | CMP |
Macrophages | Macrophages M1 |
Macrophages | Macrophages M2 |
Endothelial cells | Endothelial cells |
DC | DC |
Eosinophils | Eosinophils |
B-cells | Plasma cells |
Chondrocytes | Chondrocytes |
Fibroblasts | Fibroblasts |
Smooth muscle | Smooth muscle |
Epithelial cells | Epithelial cells |
Melanocytes | Melanocytes |
Skeletal muscle | Skeletal muscle |
Keratinocytes | Keratinocytes |
Endothelial cells | mv Endothelial cells |
Myocytes | Myocytes |
Adipocytes | Adipocytes |
Neurons | Neurons |
Pericytes | Pericytes |
Adipocytes | Preadipocytes |
Adipocytes | Astrocytes |
Mesangial cells | Mesangial cells |
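When deciding between label.main and label.fine, it can help to tabulate how the fine labels nest within the main labels. The base R sketch below does this for a hand-typed excerpt of the table above; the data frame name blueprint is hypothetical, and with the actual reference object the same columns would be taken from its column metadata instead.

```r
# Excerpt of the BlueprintEncodeData label table, typed in by hand.
blueprint <- data.frame(
    label.main = c("CD4+ T-cells", "CD4+ T-cells", "CD4+ T-cells",
                   "CD4+ T-cells", "CD8+ T-cells", "CD8+ T-cells",
                   "CD8+ T-cells", "NK cells"),
    label.fine = c("CD4+ T-cells", "Tregs", "CD4+ Tcm",
                   "CD4+ Tem", "CD8+ Tcm", "CD8+ Tem",
                   "CD8+ T-cells", "NK cells"),
    stringsAsFactors = FALSE
)

# Number of fine labels nested within each main label.
n.fine <- table(blueprint$label.main)
n.fine

# Each fine label should map to exactly one main label.
fine.to.main <- tapply(blueprint$label.main, blueprint$label.fine,
    function(x) length(unique(x)))
all(fine.to.main == 1)
```

Main labels with many fine labels are those where annotation at the finer level is most likely to be ambiguous, as the fine labels may differ only subtly in expression.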
HumanPrimaryCellAtlasData
Labels
label.main | label.fine |
---|---|
DC | DC:monocyte-derived:immature |
DC | DC:monocyte-derived:Galectin-1 |
DC | DC:monocyte-derived:LPS |
DC | DC:monocyte-derived |
Smooth_muscle_cells | Smooth_muscle_cells:bronchial:vit_D |
Smooth_muscle_cells | Smooth_muscle_cells:bronchial |
Epithelial_cells | Epithelial_cells:bronchial |
B_cell | B_cell |
Neutrophils | Neutrophil |
T_cells | T_cell:CD8+_Central_memory |
T_cells | T_cell:CD8+ |
T_cells | T_cell:CD4+ |
T_cells | T_cell:CD8+_effector_memory_RA |
T_cells | T_cell:CD8+_effector_memory |
T_cells | T_cell:CD8+_naive |
Monocyte | Monocyte |
Erythroblast | Erythroblast |
BM & Prog. | BM |
DC | DC:monocyte-derived:rosiglitazone |
DC | DC:monocyte-derived:AM580 |
DC | DC:monocyte-derived:rosiglitazone/AGN193109 |
DC | DC:monocyte-derived:anti-DC-SIGN_2h |
Endothelial_cells | Endothelial_cells:HUVEC |
Endothelial_cells | Endothelial_cells:HUVEC:Borrelia_burgdorferi |
Endothelial_cells | Endothelial_cells:HUVEC:IFNg |
Endothelial_cells | Endothelial_cells:lymphatic |
Endothelial_cells | Endothelial_cells:HUVEC:Serum_Amyloid_A |
Endothelial_cells | Endothelial_cells:lymphatic:TNFa_48h |
T_cells | T_cell:effector |
T_cells | T_cell:CCR10+CLA+1,25(OH)2_vit_D3/IL-12 |
T_cells | T_cell:CCR10-CLA+1,25(OH)2_vit_D3/IL-12 |
Gametocytes | Gametocytes:spermatocyte |
DC | DC:monocyte-derived:A._fumigatus_germ_tubes_6h |
Neurons | Neurons:ES_cell-derived_neural_precursor |
Keratinocytes | Keratinocytes |
Keratinocytes | Keratinocytes:IL19 |
Keratinocytes | Keratinocytes:IL20 |
Keratinocytes | Keratinocytes:IL22 |
Keratinocytes | Keratinocytes:IL24 |
Keratinocytes | Keratinocytes:IL26 |
Keratinocytes | Keratinocytes:KGF |
Keratinocytes | Keratinocytes:IFNg |
Keratinocytes | Keratinocytes:IL1b |
HSC_-G-CSF | HSC_-G-CSF |
DC | DC:monocyte-derived:mature |
Monocyte | Monocyte:anti-FcgRIIB |
Macrophage | Macrophage:monocyte-derived:IL-4/cntrl |
Macrophage | Macrophage:monocyte-derived:IL-4/Dex/cntrl |
Macrophage | Macrophage:monocyte-derived:IL-4/Dex/TGFb |
Macrophage | Macrophage:monocyte-derived:IL-4/TGFb |
Monocyte | Monocyte:leukotriene_D4 |
NK_cell | NK_cell |
NK_cell | NK_cell:IL2 |
Embryonic_stem_cells | Embryonic_stem_cells |
Tissue_stem_cells | Tissue_stem_cells:iliac_MSC |
Chondrocytes | Chondrocytes:MSC-derived |
Osteoblasts | Osteoblasts |
Tissue_stem_cells | Tissue_stem_cells:BM_MSC |
Osteoblasts | Osteoblasts:BMP2 |
Tissue_stem_cells | Tissue_stem_cells:BM_MSC:BMP2 |
Tissue_stem_cells | Tissue_stem_cells:BM_MSC:TGFb3 |
DC | DC:monocyte-derived:Poly(IC) |
DC | DC:monocyte-derived:CD40L |
DC | DC:monocyte-derived:Schuler_treatment |
DC | DC:monocyte-derived:antiCD40/VAF347 |
Tissue_stem_cells | Tissue_stem_cells:dental_pulp |
T_cells | T_cell:CD4+_central_memory |
T_cells | T_cell:CD4+_effector_memory |
T_cells | T_cell:CD4+_Naive |
Smooth_muscle_cells | Smooth_muscle_cells:vascular |
Smooth_muscle_cells | Smooth_muscle_cells:vascular:IL-17 |
BM | BM |
Platelets | Platelets |
Epithelial_cells | Epithelial_cells:bladder |
Macrophage | Macrophage:monocyte-derived |
Macrophage | Macrophage:monocyte-derived:M-CSF |
Macrophage | Macrophage:monocyte-derived:M-CSF/IFNg |
Macrophage | Macrophage:monocyte-derived:M-CSF/Pam3Cys |
Macrophage | Macrophage:monocyte-derived:M-CSF/IFNg/Pam3Cys |
Macrophage | Macrophage:monocyte-derived:IFNa |
Gametocytes | Gametocytes:oocyte |
Monocyte | Monocyte:F._tularensis_novicida |
Endothelial_cells | Endothelial_cells:HUVEC:B._anthracis_LT |
B_cell | B_cell:Germinal_center |
B_cell | B_cell:Plasma_cell |
B_cell | B_cell:Naive |
B_cell | B_cell:Memory |
DC | DC:monocyte-derived:AEC-conditioned |
Tissue_stem_cells | Tissue_stem_cells:lipoma-derived_MSC |
Tissue_stem_cells | Tissue_stem_cells:adipose-derived_MSC_AM3 |
Endothelial_cells | Endothelial_cells:HUVEC:FPV-infected |
Endothelial_cells | Endothelial_cells:HUVEC:PR8-infected |
Endothelial_cells | Endothelial_cells:HUVEC:H5N1-infected |
Macrophage | Macrophage:monocyte-derived:S._aureus |
Fibroblasts | Fibroblasts:foreskin |
iPS_cells | iPS_cells:skin_fibroblast-derived |
iPS_cells | iPS_cells:skin_fibroblast |
T_cells | T_cell:gamma-delta |
Monocyte | Monocyte:CD14+ |
Macrophage | Macrophage:Alveolar |
Macrophage | Macrophage:Alveolar:B._anthacis_spores |
Neutrophils | Neutrophil:inflam |
iPS_cells | iPS_cells:PDB_fibroblasts |
iPS_cells | iPS_cells:PDB_1lox-17Puro-5 |
iPS_cells | iPS_cells:PDB_1lox-17Puro-10 |
iPS_cells | iPS_cells:PDB_1lox-21Puro-20 |
iPS_cells | iPS_cells:PDB_1lox-21Puro-26 |
iPS_cells | iPS_cells:PDB_2lox-5 |
iPS_cells | iPS_cells:PDB_2lox-22 |
iPS_cells | iPS_cells:PDB_2lox-21 |
iPS_cells | iPS_cells:PDB_2lox-17 |
iPS_cells | iPS_cells:CRL2097_foreskin |
iPS_cells | iPS_cells:CRL2097_foreskin-derived:d20_hepatic_diff |
iPS_cells | iPS_cells:CRL2097_foreskin-derived:undiff. |
B_cell | B_cell:CXCR4+_centroblast |
B_cell | B_cell:CXCR4-_centrocyte |
Endothelial_cells | Endothelial_cells:HUVEC:VEGF |
iPS_cells | iPS_cells:fibroblasts |
iPS_cells | iPS_cells:fibroblast-derived:Direct_del._reprog |
iPS_cells | iPS_cells:fibroblast-derived:Retroviral_transf |
Endothelial_cells | Endothelial_cells:lymphatic:KSHV |
Endothelial_cells | Endothelial_cells:blood_vessel |
Monocyte | Monocyte:CD16- |
Monocyte | Monocyte:CD16+ |
Tissue_stem_cells | Tissue_stem_cells:BM_MSC:osteogenic |
Hepatocytes | Hepatocytes |
Neutrophils | Neutrophil:uropathogenic_E._coli_UTI89 |
Neutrophils | Neutrophil:commensal_E._coli_MG1655 |
MSC | MSC |
Neuroepithelial_cell | Neuroepithelial_cell:ESC-derived |
Astrocyte | Astrocyte:Embryonic_stem_cell-derived |
Endothelial_cells | Endothelial_cells:HUVEC:IL-1b |
HSC_CD34+ | HSC_CD34+ |
CMP | CMP |
GMP | GMP |
B_cell | B_cell:immature |
MEP | MEP |
Myelocyte | Myelocyte |
Pre-B_cell_CD34- | Pre-B_cell_CD34- |
Pro-B_cell_CD34+ | Pro-B_cell_CD34+ |
Pro-Myelocyte | Pro-Myelocyte |
Smooth_muscle_cells | Smooth_muscle_cells:umbilical_vein |
iPS_cells | iPS_cells:foreskin_fibrobasts |
iPS_cells | iPS_cells:iPS:minicircle-derived |
iPS_cells | iPS_cells:adipose_stem_cells |
iPS_cells | iPS_cells:adipose_stem_cell-derived:lentiviral |
iPS_cells | iPS_cells:adipose_stem_cell-derived:minicircle-derived |
Fibroblasts | Fibroblasts:breast |
Monocyte | Monocyte:MCSF |
Monocyte | Monocyte:CXCL4 |
Neurons | Neurons:adrenal_medulla_cell_line |
Tissue_stem_cells | Tissue_stem_cells:CD326-CD56+ |
NK_cell | NK_cell:CD56hiCD62L+ |
T_cells | T_cell:Treg:Naive |
Neutrophils | Neutrophil:LPS |
Neutrophils | Neutrophil:GM-CSF_IFNg |
Monocyte | Monocyte:S._typhimurium_flagellin |
Neurons | Neurons:Schwann_cell |
DatabaseImmuneCellExpressionData
Labels
label.main | label.fine |
---|---|
B cells | B cells, naive |
Monocytes | Monocytes, CD14+ |
Monocytes | Monocytes, CD16+ |
NK cells | NK cells |
T cells, CD4+ | T cells, CD4+, memory TREG |
T cells, CD4+ | T cells, CD4+, naive |
T cells, CD4+ | T cells, CD4+, naive, stimulated |
T cells, CD4+ | T cells, CD4+, naive TREG |
T cells, CD4+ | T cells, CD4+, TFH |
T cells, CD4+ | T cells, CD4+, Th1 |
T cells, CD4+ | T cells, CD4+, Th1_17 |
T cells, CD4+ | T cells, CD4+, Th17 |
T cells, CD4+ | T cells, CD4+, Th2 |
T cells, CD8+ | T cells, CD8+, naive |
T cells, CD8+ | T cells, CD8+, naive, stimulated |
NovershternHematopoieticData
Labels
label.main | label.fine |
---|---|
Basophils | Basophils |
B cells | Naïve B cells |
B cells | Mature B cells class able to switch |
B cells | Mature B cells |
B cells | Mature B cells class switched |
CMPs | Common myeloid progenitors |
Dendritic cells | Plasmacytoid Dendritic Cells |
Dendritic cells | Myeloid Dendritic Cells |
Eosinophils | Eosinophils |
Erythroid cells | Erythroid_CD34+ CD71+ GlyA- |
Erythroid cells | Erythroid_CD34- CD71+ GlyA- |
Erythroid cells | Erythroid_CD34- CD71+ GlyA+ |
Erythroid cells | Erythroid_CD34- CD71lo GlyA+ |
Erythroid cells | Erythroid_CD34- CD71- GlyA+ |
GMPs | Granulocyte/monocyte progenitors |
Granulocytes | Colony Forming Unit-Granulocytes |
Granulocytes | Granulocytes (Neutrophilic Metamyelocytes) |
Granulocytes | Granulocytes (Neutrophils) |
HSCs | Hematopoietic stem cells_CD133+ CD34dim |
HSCs | Hematopoietic stem cells_CD38- CD34+ |
Megakaryocytes | Colony Forming Unit-Megakaryocytic |
Megakaryocytes | Megakaryocytes |
MEPs | Megakaryocyte/erythroid progenitors |
Monocytes | Colony Forming Unit-Monocytes |
Monocytes | Monocytes |
NK cells | Mature NK cells_CD56- CD16+ CD3- |
NK cells | Mature NK cells_CD56+ CD16+ CD3- |
NK cells | Mature NK cells_CD56- CD16- CD3- |
NK T cells | NK T cells |
B cells | Early B cells |
B cells | Pro B cells |
CD8+ T cells | CD8+ Effector Memory RA |
CD8+ T cells | Naive CD8+ T cells |
CD8+ T cells | CD8+ Effector Memory |
CD8+ T cells | CD8+ Central Memory |
CD4+ T cells | Naive CD4+ T cells |
CD4+ T cells | CD4+ Effector Memory |
CD4+ T cells | CD4+ Central Memory |
MonacoImmuneData
Labels
label.main | label.fine |
---|---|
CD8+ T cells | Naive CD8 T cells |
CD8+ T cells | Central memory CD8 T cells |
CD8+ T cells | Effector memory CD8 T cells |
CD8+ T cells | Terminal effector CD8 T cells |
T cells | MAIT cells |
T cells | Vd2 gd T cells |
T cells | Non-Vd2 gd T cells |
CD4+ T cells | Follicular helper T cells |
CD4+ T cells | T regulatory cells |
CD4+ T cells | Th1 cells |
CD4+ T cells | Th1/Th17 cells |
CD4+ T cells | Th17 cells |
CD4+ T cells | Th2 cells |
CD4+ T cells | Naive CD4 T cells |
Progenitors | Progenitor cells |
B cells | Naive B cells |
B cells | Non-switched memory B cells |
B cells | Exhausted B cells |
B cells | Switched memory B cells |
B cells | Plasmablasts |
Monocytes | Classical monocytes |
Monocytes | Intermediate monocytes |
Monocytes | Non classical monocytes |
NK cells | Natural killer cells |
Dendritic cells | Plasmacytoid dendritic cells |
Dendritic cells | Myeloid dendritic cells |
Neutrophils | Low-density neutrophils |
Basophils | Low-density basophils |
CD4+ T cells | Terminal effector CD4 T cells |
ImmGenData
Labels
label.main | label.fine |
---|---|
Macrophages | Macrophages (MF.11C-11B+) |
Macrophages | Macrophages (MF.ALV) |
Monocytes | Monocytes (MO.6+I-) |
Monocytes | Monocytes (MO.6+2+) |
B cells | B cells (B.MEM) |
B cells | B cells (B1A) |
DC | DC (DC.11B+) |
DC | DC (DC.11B-) |
Stromal cells | Stromal cells (DN.CFA) |
Stromal cells | Stromal cells (DN) |
Eosinophils | Eosinophils (EO) |
Fibroblasts | Fibroblasts (FRC.CAD11.WT) |
Fibroblasts | Fibroblasts (FRC.CFA) |
Fibroblasts | Fibroblasts (FRC) |
Neutrophils | Neutrophils (GN) |
Endothelial cells | Endothelial cells (LEC.CFA) |
Endothelial cells | Endothelial cells (LEC) |
Macrophages | Macrophages (MF) |
T cells | T cells (T.DP.69-) |
T cells | T cells (T.DP) |
T cells | T cells (T.DP69+) |
Macrophages | Macrophages (MF.F480HI.GATA6KO) |
Macrophages | Macrophages (MF.F480HI.CTRL) |
T cells | T cells (T.CD4.1H) |
T cells | T cells (T.CD4.24H) |
T cells | T cells (T.CD4.48H) |
T cells | T cells (T.CD4.5H) |
T cells | T cells (T.CD4.96H) |
T cells | T cells (T.CD4.CTR) |
T cells | T cells (T.CD8.1H) |
T cells | T cells (T.CD8.24H) |
T cells | T cells (T.CD8.48H) |
T cells | T cells (T.CD8.5H) |
T cells | T cells (T.CD8.96H) |
T cells | T cells (T.CD8.CTR) |
Macrophages | Macrophages (MFAR-) |
Monocytes | Monocytes (MO) |
ILC | ILC (ILC1.CD127+) |
ILC | ILC (LIV.ILC1.DX5-) |
ILC | ILC (LPL.NCR+ILC1) |
ILC | ILC (ILC2) |
ILC | ILC (LPL.NCR+ILC3) |
ILC | ILC (ILC3.LTI.CD4+) |
ILC | ILC (ILC3.LTI.CD4-) |
ILC | ILC (ILC3.LTI.4+) |
NK cells | NK cells (NK.CD127-) |
ILC | ILC (LIV.NK.DX5+) |
ILC | ILC (LPL.NCR+CNK) |
Basophils | Basophils (BA) |
Epithelial cells | Epithelial cells (Ep.5wk.MEC.Sca1+) |
Epithelial cells | Epithelial cells (Ep.5wk.MEChi) |
Epithelial cells | Epithelial cells (Ep.5wk.MEClo) |
Epithelial cells | Epithelial cells (Ep.8wk.CEC.Sca1+) |
Epithelial cells | Epithelial cells (Ep.8wk.CEChi) |
Epithelial cells | Epithelial cells (Ep.8wk.MEChi) |
Epithelial cells | Epithelial cells (Ep.8wk.MEClo) |
Mast cells | Mast cells (MC.ES) |
Mast cells | Mast cells (MC) |
Mast cells | Mast cells (MC.TO) |
Mast cells | Mast cells (MC.TR) |
Mast cells | Mast cells (MC.DIGEST) |
Epithelial cells | Epithelial cells (MECHI.GFP+.ADULT) |
Epithelial cells | Epithelial cells (MECHI.GFP+.ADULT.KO) |
Epithelial cells | Epithelial cells (MECHI.GFP-.ADULT) |
Macrophages | Macrophages (MF.480HI.NAIVE) |
Macrophages | Macrophages (MF.480INT.NAIVE) |
T cells | T cells (T.4EFF49D+11A+.D8.LCMV) |
T cells | T cells (T.4MEM49D+11A+.D30.LCMV) |
T cells | T cells (T.4NVE44-49D-11A-) |
T cells | T cells (T.8EFF.TBET+.OT1LISOVA) |
T cells | T cells (T.8EFF.TBET-.OT1LISOVA) |
T cells | T cells (T.8EFFKLRG1+CD127-.D8.LISOVA) |
T cells | T cells (T.8MEMKLRG1-CD127+.D8.LISOVA) |
T cells | T cells (T.4+8int) |
T cells | T cells (T.4FP3+25+) |
T cells | T cells (T.4int8+) |
T cells | T cells (T.4SP24-) |
T cells | T cells (T.4SP24int) |
T cells | T cells (T.4SP69+) |
T cells | T cells (T.8SP24-) |
T cells | T cells (T.8SP24int) |
T cells | T cells (T.8SP69+) |
T cells | T cells (T.DPbl) |
T cells | T cells (T.DPsm) |
T cells | T cells (T.ISP) |
B cells | B cells (B.FrE) |
B cells | B cells (B.FrF) |
B cells | B cells (preB.FrD) |
B cells | B cells (proB.FrBC) |
B cells | B cells (preB.FrC) |
Stem cells | Stem cells (SC.STSL) |
T cells | T cells (T.CD4+TESTNA) |
T cells | T cells (T.CD4+TESTDB) |
B cells | B cells (B.CD19CONTROL) |
T cells | T cells (T.CD4CONTROL) |
T cells | T cells (T.CD4TESTJS) |
T cells | T cells (T.CD4TESTCJ) |
Stem cells | Stem cells (SC.CD150-CD48-) |
Tgd | Tgd (Tgd.imm.vg2+) |
Tgd | Tgd (Tgd.imm.vg2) |
Tgd | Tgd (Tgd.mat.vg3) |
Tgd | Tgd (Tgd.mat.vg3.) |
Tgd | Tgd (Tgd) |
Tgd | Tgd (Tgd.vg2+.act) |
Tgd | Tgd (Tgd.vg2-.act) |
Tgd | Tgd (Tgd.vg2-) |
B cells | B cells (B.Fo) |
B cells | B cells (B.FRE) |
B cells | B cells (B.GC) |
B cells | B cells (B.MZ) |
B cells | B cells (B.T1) |
B cells | B cells (B.T2) |
B cells | B cells (B.T3) |
B cells | B cells (B1a) |
B cells | B cells (B1b) |
DC | DC (DC) |
DC | DC (DC.103+11B-) |
DC | DC (DC.8-4-11B+) |
DC | DC (DC.LC) |
NK cells | NK cells (NK.49CI+) |
NK cells | NK cells (NK.49CI-) |
NK cells | NK cells (NK.B2M-) |
NK cells | NK cells (NK.DAP10-) |
NK cells | NK cells (NK.DAP12-) |
NK cells | NK cells (NK.H+.MCMV1) |
NK cells | NK cells (NK.H+.MCMV7) |
NK cells | NK cells (NK.H+MCMV1) |
NK cells | NK cells (NK.MCMV7) |
NK cells | NK cells (NK) |
NKT | NKT (NKT.4+) |
NKT | NKT (NKT.4-) |
NKT | NKT (NKT.44+NK1.1+) |
NKT | NKT (NKT.44+NK1.1-) |
NKT | NKT (NKT.44-NK1.1-) |
B cells | B cells (preB.FRD) |
B cells | B cells (proB.CLP) |
Stem cells | Stem cells (proB.CLP) |
B cells | B cells (proB.FrA) |
B cells | B cells (proB.FRA) |
B cells, pro | B cells, pro (proB.FrA) |
T cells | T cells (T.4MEM) |
T cells | T cells (T.4Mem) |
T cells | T cells (T.4MEM44H62L) |
T cells | T cells (T.4Nve) |
T cells | T cells (T.4NVE) |
T cells | T cells (T.8EFF.OT1.D15.VSVOVA) |
T cells | T cells (T.8EFF.OT1.D5.VSVOVA) |
T cells | T cells (T.8EFF.OT1.VSVOVA) |
T cells | T cells (T.8EFF.OT1.D8.VSVOVA) |
T cells | T cells (T.8MEM) |
T cells | T cells (T.8Mem) |
T cells | T cells (T.8MEM.OT1.D106.VSVOVA) |
T cells | T cells (T.8EFF.OT1.D45VSV) |
T cells | T cells (T.8Nve) |
T cells | T cells (T.8NVE) |
B cells | B cells (proB.FRBC) |
T cells | T cells (T.4) |
T cells | T cells (T.4.Pa) |
T cells | T cells (T.4.PLN) |
T cells | T cells (T.4FP3-) |
Tgd | Tgd (Tgd.VG2+) |
Tgd | Tgd (Tgd.vg2+.TCRbko) |
Tgd | Tgd (Tgd.vg2-.TCRbko) |
Tgd | Tgd (Tgd.vg5+.act) |
Tgd | Tgd (Tgd.VG5+.ACT) |
Tgd | Tgd (Tgd.VG5+) |
Tgd | Tgd (Tgd.vg5-.act) |
Tgd | Tgd (Tgd.VG5-) |
NK cells | NK cells (NK.49H+) |
NK cells | NK cells (NK.49H-) |
DC | DC (DC.8+) |
DC | DC (DC.8-) |
DC | DC (DC.8-4-11B-) |
DC | DC (DC.PDC.8+) |
DC | DC (DC.PDC.8-) |
Macrophages | Macrophages (MF.II-480HI) |
Macrophages | Macrophages (MF.RP) |
Macrophages | Macrophages (MFIO5.II+480INT) |
Macrophages | Macrophages (MFIO5.II+480LO) |
Macrophages | Macrophages (MFIO5.II-480HI) |
Macrophages | Macrophages (MFIO5.II-480INT) |
Monocytes | Monocytes (MO.6C+II+) |
Monocytes | Monocytes (MO.6C+II-) |
Monocytes | Monocytes (MO.6C-II+) |
Monocytes | Monocytes (MO.6C-II-) |
Monocytes | Monocytes (MO.6C-IIINT) |
T cells | T cells (T.8EFF.OT1.D10LIS) |
T cells | T cells (T.8EFF.OT1.D10.LISOVA) |
T cells | T cells (T.8EFF.OT1.D15LIS) |
T cells | T cells (T.8EFF.OT1.D15.LISOVA) |
T cells | T cells (T.8EFF.OT1LISO) |
T cells | T cells (T.8EFF.OT1.LISOVA) |
T cells | T cells (T.8EFF.OT1.D8LISO) |
T cells | T cells (T.8EFF.OT1.D8.LISOVA) |
T cells | T cells (T.8MEM.OT1.D100.LISOVA) |
T cells | T cells (T.8MEM.OT1.D45.LISOVA) |
T cells | T cells (T.8NVE.OT1) |
B cells | B cells (B.FO) |
Endothelial cells | Endothelial cells (BEC) |
Epithelial cells | Epithelial cells (EP.MECHI) |
Fibroblasts | Fibroblasts (FI.MTS15+) |
Fibroblasts | Fibroblasts (FI) |
Stromal cells | Stromal cells (ST.31-38-44-) |
Stem cells | Stem cells (SC.LT34F) |
Stem cells | Stem cells (SC.MDP) |
Stem cells | Stem cells (SC.MEP) |
Stem cells | Stem cells (SC.MPP34F) |
Stem cells | Stem cells (SC.ST34F) |
Stem cells | Stem cells (SC.CDP) |
Stem cells | Stem cells (SC.CMP.DR) |
Stem cells | Stem cells (GMP) |
Stem cells | Stem cells (MLP) |
Stem cells | Stem cells (LTHSC) |
T cells | T cells (T.DN2-3) |
T cells | T cells (T.DN2) |
T cells | T cells (T.DN2A) |
T cells | T cells (T.DN2B) |
T cells | T cells (T.DN3-4) |
T cells | T cells (T.DN3A) |
T cells | T cells (T.DN3B) |
T cells | T cells (T.DN1-2) |
T cells | T cells (T.DN4) |
Macrophages | Macrophages (MF.103-11B+.SALM3) |
Macrophages | Macrophages (MF.103-11B+) |
DC | DC (DC.103-11B+24+) |
Macrophages | Macrophages (MF.103-11B+24-) |
DC | DC (DC.103-11B+F4-80LO.KD) |
Macrophages | Macrophages (MF.11CLOSER.SALM3) |
Macrophages | Macrophages (MF.11CLOSER) |
Macrophages | Macrophages (MF.103CLOSER) |
Macrophages | Macrophages (MF.II+480LO) |
Neutrophils | Neutrophils (GN.ARTH) |
Neutrophils | Neutrophils (GN.Thio) |
Neutrophils | Neutrophils (GN.URAC) |
Macrophages | Macrophages (MF.169+11CHI) |
Macrophages | Macrophages (MF.MEDL) |
Macrophages | Macrophages (MF.SBCAPS) |
Microglia | Microglia (Microglia) |
T cells | T cells (T.ETP) |
Tgd | Tgd (Tgd.imm.VG1+) |
Tgd | Tgd (Tgd.imm.VG1+VD6+) |
Tgd | Tgd (Tgd.mat.VG1+) |
Tgd | Tgd (Tgd.mat.VG1+VD6+) |
Tgd | Tgd (Tgd.mat.VG2+) |
Tgd | Tgd (Tgd.VG3+24AHI) |
Tgd | Tgd (Tgd.VG5+24AHI) |
T cells | T cells (T.8EFF.OT1.12HR.LISOVA) |
T cells | T cells (T.8EFF.OT1.24HR.LISOVA) |
T cells | T cells (T.8EFF.OT1.48HR.LISOVA) |
T cells | T cells (T.Tregs) |
Tgd | Tgd (Tgd.VG2+24AHI) |
Tgd | Tgd (Tgd.VG4+24AHI) |
Tgd | Tgd (Tgd.VG4+24ALO) |
MouseRNAseqData
Labels
label.main | label.fine |
---|---|
Adipocytes | Adipocytes |
Neurons | aNSCs |
Astrocytes | Astrocytes |
Astrocytes | Astrocytes activated |
Endothelial cells | Endothelial cells |
Erythrocytes | Erythrocytes |
Fibroblasts | Fibroblasts |
Fibroblasts | Fibroblasts activated |
Fibroblasts | Fibroblasts senescent |
Granulocytes | Granulocytes |
Macrophages | Macrophages |
Microglia | Microglia |
Microglia | Microglia activated |
Monocytes | Monocytes |
Neurons | Neurons |
Neurons | Neurons activated |
NK cells | NK cells |
Neurons | NPCs |
Oligodendrocytes | Oligodendrocytes |
Neurons | qNSCs |
T cells | T cells |
Dendritic cells | Dendritic cells |
Cardiomyocytes | Cardiomyocytes |
Hepatocytes | Hepatocytes |
B cells | B cells |
Epithelial cells | Ependymal |
Oligodendrocytes | OPCs |
Macrophages | Macrophages activated |
Single-cell reference datasets provide a like-for-like comparison to our test datasets, yielding a more accurate classification of the cells in the latter (hopefully). However, there are frequently many more samples in single-cell references compared to bulk references, increasing the computational work involved in classification. We avoid this by aggregating cells into one “pseudo-bulk” sample per label (e.g., by averaging across log-expression values) and using those as the reference, which allows us to achieve the same efficiency as the use of bulk references.
The obvious cost of this approach is that we discard potentially useful information about the distribution of cells within each label. Cells that belong to a heterogeneous population may not be correctly assigned if they are far from the population center. We attempt to preserve some of this information by using \(k\)-means clustering within each label to create pseudo-bulk samples that are representative of a particular region of the expression space (i.e., vector quantization). We create \(\sqrt{N}\) clusters given a label with \(N\) cells, which provides a reasonable compromise between reducing computational work and preserving the label’s internal distribution.
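The vector quantization idea can be sketched in base R. This is only an illustration on simulated data, not the package's implementation: the helper name `pseudo_bulk()`, the toy matrix, and the choice to round \(\sqrt{N}\) down are all assumptions made for the example.

```r
# Minimal sketch: k-means within each label, keeping cluster centers
# as pseudo-bulk profiles. All names and data here are hypothetical.
set.seed(42)
expr <- matrix(rnorm(50 * 61), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("cell", 1:61)))
labels <- rep(c("alpha", "beta"), c(36, 25))

pseudo_bulk <- function(expr, labels) {
  out <- list()
  for (lab in unique(labels)) {
    sub <- expr[, labels == lab, drop = FALSE]
    # Assumed rule: sqrt(N) clusters for a label with N cells, rounded down.
    k <- max(1, floor(sqrt(ncol(sub))))
    # kmeans() clusters rows, so cells must be in rows.
    km <- kmeans(t(sub), centers = k, iter.max = 50)
    centers <- t(km$centers)  # genes x k; each column is a cluster mean
    colnames(centers) <- paste0(lab, ".", seq_len(k))
    out[[lab]] <- centers
  }
  do.call(cbind, unname(out))
}

aggr_toy <- pseudo_bulk(expr, labels)
# Recover the label of each pseudo-bulk sample from its column name.
aggr_labels <- sub("\\.[0-9]+$", "", colnames(aggr_toy))
dim(aggr_toy)
table(aggr_labels)
```

Each column of `aggr_toy` is the mean profile of one cluster, so a label with 36 cells is represented by 6 profiles instead of a single average.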
This aggregation approach is implemented in the aggregateReference() function, which is shown in action below for the Muraro et al. (2016) dataset.
The function returns a SummarizedExperiment object containing the pseudo-bulk expression profiles and the corresponding labels.
set.seed(100) # for the k-means step.
aggr <- aggregateReference(sceM, labels=sceM$label)
aggr
## class: SummarizedExperiment
## dim: 19059 116
## metadata(0):
## assays(1): logcounts
## rownames(19059): A1BG-AS1__chr19 A1BG__chr19 ... ZZEF1__chr17
## ZZZ3__chr1
## rowData names(0):
## colnames(116): alpha.1 alpha.2 ... mesenchymal.8 epsilon.1
## colData names(1): label
The resulting SummarizedExperiment can then be used as a reference in SingleR().
pred.aggr <- SingleR(sceG, aggr, labels=aggr$label)
table(pred.aggr$labels)
##
## acinar beta delta duct
## 52 4 1 43
In some cases, we may wish to use multiple references for annotation of a test dataset.
This yields a more comprehensive set of cell types than is covered by any individual reference, especially when differences in resolution are also considered.
Use of multiple references is supported by simply passing multiple objects to the ref= and labels= arguments in SingleR().
We demonstrate below by including another reference (from Blueprint-Encode) in our annotation of the La Manno et al. (2016) dataset:
bp.se <- BlueprintEncodeData()
pred.combined <- SingleR(test = hESCs,
    ref = list(BP=bp.se, HPCA=hpca.se),
    labels = list(bp.se$label.main, hpca.se$label.main))
The output takes the same form as previously described, and we can easily gain access to the combined set of labels:
table(pred.combined$labels)
##
## Astrocyte Neuroepithelial_cell Neurons
## 4 63 33
Our strategy is to perform annotation on each reference separately and then take the highest-scoring label across references.
This provides a light-weight approach to combining information from multiple references while avoiding batch effects and the need for up-front harmonization.
(Of course, the main practical difficulty of this approach is that the same cell type may have different labels across references, which will require some implicit harmonization during interpretation.)
Further comments on the justification behind the choice of this method can be found at ?combineResults.
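The "highest-scoring label across references" rule can be illustrated with a toy example in base R. The score matrices and the helper `combine_by_max()` below are invented for illustration, and the real combining function may differ in its details (for example, in which scores it compares).

```r
# Hypothetical per-reference score matrices (cells x labels).
cells <- paste0("cell", 1:3)
scores <- list(
  BP = matrix(c(0.60, 0.20,
                0.35, 0.50,
                0.30, 0.55), ncol = 2, byrow = TRUE,
              dimnames = list(cells, c("Neurons", "Astrocyte"))),
  HPCA = matrix(c(0.45, 0.30,
                  0.20, 0.70,
                  0.40, 0.25), ncol = 2, byrow = TRUE,
                dimnames = list(cells, c("Neurons", "Neuroepithelial_cell")))
)

combine_by_max <- function(score_list) {
  n <- nrow(score_list[[1]])
  # Best score and best label per cell within each reference.
  best_score <- vapply(score_list, function(s) apply(s, 1, max), numeric(n))
  best_label <- vapply(score_list, function(s) {
    colnames(s)[max.col(s, ties.method = "first")]
  }, character(n))
  # For each cell, keep the reference whose best score is highest.
  winner <- max.col(best_score, ties.method = "first")
  pick <- cbind(seq_len(n), winner)
  data.frame(labels = best_label[pick],
             reference = colnames(best_score)[winner],
             score = best_score[pick],
             row.names = rownames(score_list[[1]]),
             stringsAsFactors = FALSE)
}

combined <- combine_by_max(scores)
combined
```

Because each reference is scored independently, no expression values are ever compared directly between references, which is why this approach sidesteps batch effects between them.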
The matchReferences() function provides a simple yet elegant approach for label harmonization between two references.
Each reference is used to annotate the other and the probability of mutual assignment between each pair of labels is computed.
Probabilities close to 1 indicate there is a 1:1 relation between that pair of labels;
on the other hand, an all-zero probability vector indicates that a label is unique to a particular reference.
matched <- matchReferences(bp.se, hpca.se,
    bp.se$label.main, hpca.se$label.main)
pheatmap::pheatmap(matched, col=viridis::plasma(100))
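To make the notion of a mutual assignment probability concrete, here is a small base R sketch on invented labels. It assumes the probability for a pair of labels is the product of the two conditional assignment frequencies (reference 1 label assigned to reference 2 label, and vice versa); the helper `mutual_prob()` and this exact formula are assumptions for illustration rather than a statement of what matchReferences() computes internally.

```r
# Hypothetical labels of reference 1 samples and their predictions
# when annotated against reference 2.
true1 <- c("T", "T", "T", "T", "B", "B", "Neuron", "Neuron")
pred1 <- c("T cells", "T cells", "T cells", "T cells",
           "B cells", "B cells", "T cells", "B cells")
# Hypothetical labels of reference 2 samples and their predictions
# when annotated against reference 1.
true2 <- c("T cells", "T cells", "T cells", "B cells", "B cells", "B cells")
pred2 <- c("T", "T", "T", "B", "B", "B")

mutual_prob <- function(true1, pred1, true2, pred2) {
  lev1 <- sort(unique(true1))
  lev2 <- sort(unique(true2))
  # Row-normalized assignment frequencies in each direction.
  p12 <- prop.table(table(factor(true1, lev1), factor(pred1, lev2)), 1)
  p21 <- prop.table(table(factor(true2, lev2), factor(pred2, lev1)), 1)
  # Assumed definition: product of the two conditional frequencies.
  unclass(p12) * t(unclass(p21))
}

toy_matched <- mutual_prob(true1, pred1, true2, pred2)
toy_matched
```

In this toy case the "T"/"T cells" and "B"/"B cells" pairs get a probability of 1, while the "Neuron" row is all zeros because no sample from the second reference is ever assigned to it.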
A heatmap like the one above can be used to guide harmonization to enforce a consistent vocabulary across all labels representing the same cell type or state.
The most obvious benefit of harmonization is that interpretation of the results is simplified.
However, an even more important effect is that the presence of harmonized labels from multiple references allows the classification machinery to protect against irrelevant batch effects between references.
For example, in SingleR()’s case, marker genes are favored if they are consistently upregulated across multiple references, improving robustness to technical idiosyncrasies in any test dataset.
We stress that some manual intervention is still required in this process, given the risks posed by differences in biological systems and technologies. For example, neurons are considered unique to each reference while smooth muscle cells in the HPCA data are incorrectly matched to fibroblasts in the Blueprint/ENCODE data. CD4+ and CD8+ T cells are also both assigned to “T cells”, so some decision about the acceptable resolution of the harmonized labels is required here.
As an aside, we can also use this function to identify the matching clusters between two independent scRNA-seq analyses. This is an “off-label” use that involves substituting the cluster assignments as proxies for the labels. We can then match up clusters and integrate conclusions from multiple datasets without the difficulty of batch correction and reclustering.
Advanced users can split the SingleR() workflow into two separate training and classification steps.
This means that training (e.g., marker detection, assembling of nearest-neighbor indices) only needs to be performed once.
The resulting data structures can then be re-used across multiple classifications with different test datasets, provided the test feature set is identical to or a superset of the features in the training set.
For example:
common <- intersect(rownames(hESCs), rownames(hpca.se))
trained <- trainSingleR(hpca.se[common,], labels=hpca.se$label.main)
pred.hesc2 <- classifySingleR(hESCs[common,], trained)
table(pred.hesc$labels, pred.hesc2$labels)
##
## Astrocyte Neuroepithelial_cell Neurons
## Astrocyte 14 0 0
## Neuroepithelial_cell 0 81 0
## Neurons 0 0 5
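When re-using a trained object across several test datasets, it can help to check the feature requirement explicitly before classifying. The sketch below uses plain matrices and a hypothetical helper, `align_to_training()`, which is not part of SingleR; it fails early if any training feature is missing and otherwise returns the test matrix restricted to the training features, in the same order.

```r
# Hypothetical helper: verify that the test features are a superset of
# the training features, then subset and reorder the test matrix.
align_to_training <- function(test, train_genes) {
  missing <- setdiff(train_genes, rownames(test))
  if (length(missing) > 0L) {
    stop("test dataset lacks ", length(missing),
         " training feature(s), e.g. ", missing[1])
  }
  test[train_genes, , drop = FALSE]
}

train_genes <- c("geneA", "geneB", "geneC")

# A test matrix with an extra gene and a different row order: acceptable.
test_ok <- matrix(1:8, nrow = 4,
                  dimnames = list(c("geneC", "geneA", "geneD", "geneB"),
                                  c("cell1", "cell2")))
aligned <- align_to_training(test_ok, train_genes)
rownames(aligned)

# A test matrix missing training genes: rejected with an informative error.
test_bad <- test_ok[c("geneA", "geneD"), , drop = FALSE]
res <- tryCatch(align_to_training(test_bad, train_genes),
                error = function(e) conditionMessage(e))
res
```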
Other efficiency improvements are possible through several arguments:
- Choosing the nearest-neighbor search algorithm (e.g., an approximate one) in trainSingleR() via the BNPARAM= argument from the BiocNeighbors package.
- Parallelizing the work in classifySingleR() with the BPPARAM= argument from the BiocParallel package.

These arguments can also be specified in the SingleR() command.
Users can also construct their own marker lists with any DE testing machinery. For example, we can perform pairwise \(t\)-tests using methods from scran and obtain the top 10 marker genes from each pairwise comparison.
library(scran)
out <- pairwiseTTests(logcounts(sceM), sceM$label, direction="up")
markers <- getTopMarkers(out$statistics, out$pairs, n=10)
We then supply these genes to SingleR() directly via the genes= argument.
A more focused gene set also allows annotation to be performed more quickly compared to the default approach.
pred.grun2 <- SingleR(test=sceG, ref=sceM, labels=sceM$label, genes=markers)
table(pred.grun2$labels)
##
## acinar beta delta duct pp unclear
## 59 4 1 34 1 1
In some cases, markers may only be available for specific labels rather than for pairwise comparisons between labels.
This is accommodated by supplying a named list of character vectors to genes.
Note that this is likely to be less powerful than the list-of-lists approach as information about pairwise differences is discarded.
label.markers <- lapply(markers, unlist, recursive=FALSE)
pred.grun3 <- SingleR(test=sceG, ref=sceM, labels=sceM$label, genes=label.markers)
table(pred.grun$labels, pred.grun3$labels)
##
## acinar beta delta duct pp
## acinar 51 0 0 2 0
## beta 0 4 0 0 0
## delta 0 0 1 0 0
## duct 2 0 0 39 0
## endothelial 0 0 0 0 1
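The difference between the two marker formats is easiest to see on a toy example. The gene lists below are made up for illustration; only the shapes matter. The pairwise format is a list of lists, indexed first by the label of interest and then by the label it is compared against, whereas the per-label format is a single character vector for each label.

```r
# Hypothetical pairwise markers: markers[[A]][[B]] holds genes that are
# upregulated in label A relative to label B.
toy_markers <- list(
  alpha = list(alpha = character(0),
               beta  = c("GCG", "TTR"),
               delta = c("GCG", "IRX2")),
  beta  = list(alpha = c("INS", "IAPP"),
               beta  = character(0),
               delta = c("INS", "MAFA")),
  delta = list(alpha = c("SST"),
               beta  = c("SST", "RBP4"),
               delta = character(0))
)

# Collapse to one vector per label; the pairwise structure is lost.
# unique() is used here to drop genes that appear in several comparisons.
toy_label_markers <- lapply(toy_markers, function(x) {
  unique(unlist(x, use.names = FALSE))
})
toy_label_markers
```

After collapsing, we can no longer tell that "TTR" distinguishes alpha from beta specifically, which is the information loss referred to above.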
How can I use this with my Seurat, SingleCellExperiment, or cell_data_set object?
SingleR is workflow-agnostic: all it needs is normalized counts. An example showing how to map its results back to common single-cell data objects is available in the README.
Where can I find reference sets appropriate for my data?
scRNAseq contains many single-cell datasets with more continually being added. ArrayExpress and GEOquery can be used to download any of the bulk or single-cell datasets in ArrayExpress or GEO, respectively.
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] scran_1.14.6 knitr_1.28
## [3] scater_1.14.6 ggplot2_3.3.0
## [5] scRNAseq_2.0.2 SingleCellExperiment_1.8.0
## [7] SingleR_1.0.6 SummarizedExperiment_1.16.1
## [9] DelayedArray_0.12.2 BiocParallel_1.20.1
## [11] matrixStats_0.56.0 Biobase_2.46.0
## [13] GenomicRanges_1.38.0 GenomeInfoDb_1.22.1
## [15] IRanges_2.20.2 S4Vectors_0.24.3
## [17] BiocGenerics_0.32.0 BiocStyle_2.14.4
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-6 bit64_0.9-7
## [3] RColorBrewer_1.1-2 httr_1.4.1
## [5] tools_3.6.3 R6_2.4.1
## [7] irlba_2.3.3 vipor_0.4.5
## [9] DBI_1.1.0 colorspace_1.4-1
## [11] withr_2.1.2 gridExtra_2.3
## [13] tidyselect_1.0.0 bit_1.1-15.2
## [15] curl_4.3 compiler_3.6.3
## [17] cli_2.0.2 BiocNeighbors_1.4.2
## [19] labeling_0.3 bookdown_0.18
## [21] scales_1.1.0 rappdirs_0.3.1
## [23] stringr_1.4.0 digest_0.6.25
## [25] rmarkdown_2.1 XVector_0.26.0
## [27] pkgconfig_2.0.3 htmltools_0.4.0
## [29] highr_0.8 limma_3.42.2
## [31] dbplyr_1.4.2 fastmap_1.0.1
## [33] rlang_0.4.5 RSQLite_2.2.0
## [35] shiny_1.4.0.2 DelayedMatrixStats_1.8.0
## [37] farver_2.0.3 dplyr_0.8.5
## [39] RCurl_1.98-1.1 magrittr_1.5
## [41] BiocSingular_1.2.2 GenomeInfoDbData_1.2.2
## [43] Matrix_1.2-18 Rcpp_1.0.4
## [45] ggbeeswarm_0.6.0 munsell_0.5.0
## [47] fansi_0.4.1 viridis_0.5.1
## [49] lifecycle_0.2.0 edgeR_3.28.1
## [51] stringi_1.4.6 yaml_2.2.1
## [53] zlibbioc_1.32.0 BiocFileCache_1.10.2
## [55] AnnotationHub_2.18.0 grid_3.6.3
## [57] blob_1.2.1 dqrng_0.2.1
## [59] promises_1.1.0 ExperimentHub_1.12.0
## [61] crayon_1.3.4 lattice_0.20-41
## [63] magick_2.3 locfit_1.5-9.4
## [65] pillar_1.4.3 igraph_1.2.5
## [67] glue_1.4.0 BiocVersion_3.10.1
## [69] evaluate_0.14 BiocManager_1.30.10
## [71] vctrs_0.2.4 httpuv_1.5.2
## [73] gtable_0.3.0 purrr_0.3.3
## [75] assertthat_0.2.1 xfun_0.12
## [77] rsvd_1.0.3 mime_0.9
## [79] xtable_1.8-4 later_1.0.0
## [81] viridisLite_0.3.0 pheatmap_1.0.12
## [83] tibble_3.0.0 AnnotationDbi_1.48.0
## [85] beeswarm_0.2.3 memoise_1.1.0
## [87] statmod_1.4.34 ellipsis_0.3.0
## [89] interactiveDisplayBase_1.24.0
Aran, D., A. P. Looney, L. Liu, E. Wu, V. Fong, A. Hsu, S. Chak, et al. 2019. “Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage.” Nat. Immunol. 20 (2):163–72.
Benayoun, Bérénice A., Elizabeth A. Pollina, Param Priya Singh, Salah Mahmoudi, Itamar Harel, Kerriann M. Casey, Ben W. Dulken, Anshul Kundaje, and Anne Brunet. 2019. “Remodeling of epigenome and transcriptome landscapes with aging in mice reveals widespread induction of inflammatory responses.” Genome Research 29:697–709. https://doi.org/10.1101/gr.240093.118.
Grun, D., M. J. Muraro, J. C. Boisset, K. Wiebrands, A. Lyubimova, G. Dharmadhikari, M. van den Born, et al. 2016. “De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data.” Cell Stem Cell 19 (2):266–77.
Heng, Tracy S.P., Michio W. Painter, Kutlu Elpek, Veronika Lukacs-Kornek, Nora Mauermann, Shannon J. Turley, Daphne Koller, et al. 2008. “The immunological genome project: Networks of gene expression in immune cells.” Nature Immunology 9 (10):1091–4. https://doi.org/10.1038/ni1008-1091.
La Manno, G., D. Gyllborg, S. Codeluppi, K. Nishimura, C. Salto, A. Zeisel, L. E. Borm, et al. 2016. “Molecular Diversity of Midbrain Development in Mouse, Human, and Stem Cells.” Cell 167 (2):566–80.
Mabbott, Neil A., J. K. Baillie, Helen Brown, Tom C. Freeman, and David A. Hume. 2013. “An expression atlas of human primary cells: Inference of gene function from coexpression networks.” BMC Genomics 14. https://doi.org/10.1186/1471-2164-14-632.
Martens, Joost H A, and Hendrik G. Stunnenberg. 2013. “BLUEPRINT: Mapping human blood cell epigenomes.” Haematologica 98:1487–9. https://doi.org/10.3324/haematol.2013.094243.
Monaco, Gianni, Bernett Lee, Weili Xu, Seri Mustafah, You Yi Hwang, Christophe Carré, Nicolas Burdin, et al. 2019. “RNA-Seq Signatures Normalized by mRNA Abundance Allow Absolute Deconvolution of Human Immune Cell Types.” Cell Reports 26 (6):1627–1640.e7. https://doi.org/10.1016/j.celrep.2019.01.041.
Muraro, M. J., G. Dharmadhikari, D. Grun, N. Groen, T. Dielen, E. Jansen, L. van Gurp, et al. 2016. “A Single-Cell Transcriptome Atlas of the Human Pancreas.” Cell Syst 3 (4):385–94.
Novershtern, Noa, Aravind Subramanian, Lee N. Lawton, Raymond H. Mak, W. Nicholas Haining, Marie E. McConkey, Naomi Habib, et al. 2011. “Densely Interconnected Transcriptional Circuits Control Cell States in Human Hematopoiesis.” Cell 144 (2):296–309. https://doi.org/10.1016/j.cell.2011.01.004.
Schmiedel, Benjamin J., Divya Singh, Ariel Madrigal, Alan G. Valdovino-Gonzalez, Brandie M. White, Jose Zapardiel-Gonzalo, Brendan Ha, et al. 2018. “Impact of Genetic Polymorphisms on Human Immune Cell Gene Expression.” Cell 175 (6):1701–1715.e16. https://doi.org/10.1016/j.cell.2018.10.022.
The ENCODE Project Consortium. 2012. “An integrated encyclopedia of DNA elements in the human genome.” Nature. https://doi.org/10.1038/nature11247.