1 Installation

To install the package, please use the following.

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("miQC")

2 Introduction

This vignette provides a basic example of how to run miQC, which allows users to perform cell-wise filtering of single-cell RNA-seq data for quality control. Single-cell RNA-seq data is very sensitive to tissue quality and choice of experimental workflow; it’s critical to ensure compromised cells and failed cell libraries are removed. A high proportion of reads mapping to mitochondrial DNA is one sign of a damaged cell, so most analyses will remove cells with mtRNA over a certain threshold, but those thresholds can be arbitrary and/or detrimentally stringent, especially for archived tumor tissues. miQC jointly models both the proportion of reads mapping to mtDNA genes and the number of detected genes with mixture models in a probabilistic framework to identify the low-quality cells in a given dataset.

For more information about the statistical background of miQC and for biological use cases, consult the miQC paper (Hippen et al. 2021).

You’ll need the following packages installed to run this tutorial:

suppressPackageStartupMessages({
    library(SingleCellExperiment)
    library(scRNAseq)
    library(scater)
    library(flexmix)
    library(splines)
    library(miQC)
})

2.1 Example data

To demonstrate how to run miQC on a single-cell RNA-seq dataset, we’ll use data from mouse brain cells, originating from an experiment by Zeisel et al (Zeisel et al. 2015), and available through the Bioconductor package scRNAseq.

sce <- ZeiselBrainData()
sce
## class: SingleCellExperiment 
## dim: 20006 3005 
## metadata(0):
## assays(1): counts
## rownames(20006): Tspan12 Tshz1 ... mt-Rnr1 mt-Nd4l
## rowData names(1): featureType
## colnames(3005): 1772071015_C02 1772071017_G12 ... 1772066098_A12
##   1772058148_F03
## colData names(10): tissue group # ... level1class level2class
## reducedDimNames(0):
## mainExpName: endogenous
## altExpNames(2): ERCC repeat

2.2 Scater preprocessing

In order to calculate the percent of reads in a cell that map to mitochondrial genes, we first need to establish which genes are mitochondrial. For genes listed as HGNC symbols, this is as simple as searching for genes starting with mt-. For other IDs, we recommend using a biomaRt query to map to chromosomal location and identify all mitochondrial genes.

mt_genes <- grepl("^mt-",  rownames(sce))
feature_ctrls <- list(mito = rownames(sce)[mt_genes])

feature_ctrls
## $mito
##  [1] "mt-Tl1"  "mt-Tn"   "mt-Tr"   "mt-Tq"   "mt-Ty"   "mt-Tk"   "mt-Ta"  
##  [8] "mt-Tf"   "mt-Tp"   "mt-Tc"   "mt-Td"   "mt-Tl2"  "mt-Te"   "mt-Tv"  
## [15] "mt-Tg"   "mt-Tt"   "mt-Tw"   "mt-Tm"   "mt-Ti"   "mt-Nd3"  "mt-Nd6" 
## [22] "mt-Nd4"  "mt-Atp6" "mt-Nd2"  "mt-Nd5"  "mt-Nd1"  "mt-Co3"  "mt-Cytb"
## [29] "mt-Atp8" "mt-Co2"  "mt-Co1"  "mt-Rnr2" "mt-Rnr1" "mt-Nd4l"

miQC is designed to be run with the Bioconductor package scater, which has a built-in function addPerCellQC to calculate basic QC metrics like number of unique genes detected per cell and total number of reads. When we pass in our list of mitochondrial genes, it will also calculate percent mitochondrial reads.

sce <- addPerCellQC(sce, subsets = feature_ctrls)
head(colData(sce))
## DataFrame with 6 rows and 22 columns
##                     tissue   group # total mRNA mol      well       sex
##                <character> <numeric>      <numeric> <numeric> <numeric>
## 1772071015_C02    sscortex         1           1221         3         3
## 1772071017_G12    sscortex         1           1231        95         1
## 1772071017_A05    sscortex         1           1652        27         1
## 1772071014_B06    sscortex         1           1696        37         3
## 1772067065_H06    sscortex         1           1219        43         3
## 1772071017_E02    sscortex         1           1378         5         1
##                      age  diameter        cell_id  level1class level2class
##                <numeric> <numeric>    <character>  <character> <character>
## 1772071015_C02         2         1 1772071015_C02 interneurons       Int10
## 1772071017_G12         1       353 1772071017_G12 interneurons       Int10
## 1772071017_A05         1        13 1772071017_A05 interneurons        Int6
## 1772071014_B06         2        19 1772071014_B06 interneurons       Int10
## 1772067065_H06         6        12 1772067065_H06 interneurons        Int9
## 1772071017_E02         1        21 1772071017_E02 interneurons        Int9
##                      sum  detected subsets_mito_sum subsets_mito_detected
##                <numeric> <numeric>        <numeric>             <numeric>
## 1772071015_C02     22354      4871              774                    23
## 1772071017_G12     22869      4712             1121                    27
## 1772071017_A05     32594      6055              952                    27
## 1772071014_B06     33525      5852              611                    28
## 1772067065_H06     21694      4724              164                    23
## 1772071017_E02     25919      5427             1122                    19
##                subsets_mito_percent altexps_ERCC_sum altexps_ERCC_detected
##                           <numeric>        <numeric>             <numeric>
## 1772071015_C02             3.462468             6706                    43
## 1772071017_G12             4.901832             6435                    46
## 1772071017_A05             2.920783             6335                    47
## 1772071014_B06             1.822521             7032                    43
## 1772067065_H06             0.755969             5931                    39
## 1772071017_E02             4.328871             6671                    43
##                altexps_ERCC_percent altexps_repeat_sum altexps_repeat_detected
##                           <numeric>          <numeric>               <numeric>
## 1772071015_C02              18.0070               8181                     419
## 1772071017_G12              15.6349              11854                     480
## 1772071017_A05              11.1238              18021                     582
## 1772071014_B06              12.8999              13955                     512
## 1772067065_H06              17.1908               6876                     363
## 1772071017_E02              13.3543              17364                     618
##                altexps_repeat_percent     total
##                             <numeric> <numeric>
## 1772071015_C02                21.9677     37241
## 1772071017_G12                28.8012     41158
## 1772071017_A05                31.6435     56950
## 1772071014_B06                25.5999     54512
## 1772067065_H06                19.9299     34501
## 1772071017_E02                34.7600     49954

3 miQC

3.1 Basic usage

Using the QC metrics generated via scater, we can use the plotMetrics function to visually inspect the quality of our dataset.

plotMetrics(sce)