scp 1.14.0
scp package
The scp package is used to process and analyse mass spectrometry
(MS)-based single-cell proteomics (SCP) data. The functions rely on a
specific data structure that wraps QFeatures objects (Gatto and
Vanderaa (2023)) around SingleCellExperiment objects (Amezquita et al.
(2020)). This data structure can be seen as a set of Matryoshka dolls,
where the SingleCellExperiment objects are the small dolls contained in
the bigger QFeatures doll.
The SingleCellExperiment class provides a dedicated framework for
single-cell data. It serves as an interface to many cutting-edge
methods for processing, visualizing and analysing single-cell data.
More information about the SingleCellExperiment class and associated
methods can be found in the OSCA book.
The QFeatures class is a data framework dedicated to manipulating and
processing MS-based quantitative data. It preserves the relationship
between the different levels of information: peptide to spectrum match
(PSM) data, peptide data and protein data. The QFeatures package also
provides an interface to many utility functions that streamline the
processing of MS data. More information about MS data analysis tools
can be found in the RforMassSpectrometry project.
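The nesting described above can be illustrated with a minimal sketch. It assumes the QFeatures and SingleCellExperiment packages are installed; the matrix values and the names quant, sce, qf and run1 are made up for illustration and are not part of the example data used in this vignette.

```r
# Minimal sketch with toy values; all object names are illustrative.
library("SingleCellExperiment")
library("QFeatures")

# A toy quantification matrix: 3 PSMs (rows) x 2 samples (columns)
quant <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3,
                dimnames = list(paste0("PSM", 1:3), c("cell1", "cell2")))

# The small doll: one SingleCellExperiment per MS run
sce <- SingleCellExperiment(assays = list(quant))

# The big doll: a QFeatures object that holds the run-level objects
qf <- QFeatures(list(run1 = sce))

qf[["run1"]]         # retrieves the inner SingleCellExperiment
assay(qf[["run1"]])  # retrieves the quantification matrix
```

In practice, readSCP builds this structure for you, so the constructors above are rarely called by hand.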
Before running the vignette we need to load the scp package.
library("scp")
We also load ggplot2 and dplyr for convenient data manipulation and
plotting.
library("ggplot2")
library("dplyr")
This vignette will guide you through some common steps of mass
spectrometry-based single-cell proteomics (SCP) data analysis. SCP is
an emerging field and further research is required to develop a
principled analysis workflow. Therefore, we do not guarantee that the
steps presented here are the best steps for this type of data analysis.
This vignette performs the steps that were described in the SCoPE2
landmark paper (Specht et al. (2021)) and that were reproduced in another
work using the scp package (Vanderaa and Gatto (2021)). The replication on
the full SCoPE2 dataset using scp is available in a separate vignette.
We hope to convince the reader that, although the workflow is probably
not optimal, scp has the full potential to perform standardized and
principled data analysis. All functions presented here are
comprehensively documented, highly modular, and can easily be extended
with new algorithms. Suggestions, feature requests or bug reports are
warmly welcome. Feel free to open an issue in the GitHub repository.
This workflow can be applied to any MS-based SCP data. The minimal requirement to follow this workflow is that the data should contain the following information:
- runCol/Raw.file: field in the feature data and the sample data that gives the names of MS acquisition runs or files.
- quantCols: field in the sample data that links to columns in the quantification data and that allows linking samples to MS channels (more details in another vignette).
- SampleType: field in the sample data that provides the type of sample that is acquired (carrier, reference, single-cell,…). Only needed for multiplexing experiments.
- Potential.contaminant: field in the feature data that marks contaminant peptides.
- Reverse: field in the feature data that marks reverse peptides.
- PIF: field in the feature data that provides spectral purity.
- PEP or dart_PEP: field in the feature data that provides peptide posterior error probabilities.
- Modified.sequence: field in the feature data that provides the peptide identifiers.
- Leading.razor.protein: field in the feature data that provides the protein identifiers.
- Reporter.intensity. followed by an index (1 to 16): columns that hold the quantification data referred to by quantCols.
Each required field will be described in more detail in the corresponding sections. Names can be adapted by the user to more meaningful ones or adapted to other output tables.
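Before starting, it can help to check that a table contains the fields listed above. The following base R sketch does so on a toy feature table. The helper missingFields is hypothetical (it is not part of scp), and the table values are made up.

```r
# Hypothetical helper (not part of scp): report which expected columns
# are absent from a table.
missingFields <- function(tbl, fields) {
    setdiff(fields, colnames(tbl))
}

# Toy feature table that contains only some of the fields listed above
featureData <- data.frame(
    Raw.file = c("runA", "runA", "runB"),
    Modified.sequence = c("_AAAK_", "_CCCR_", "_AAAK_"),
    Leading.razor.protein = c("P1", "P2", "P1"),
    PIF = c(0.9, 0.7, 0.95),
    Reporter.intensity.1 = c(1000, 2000, 1500)
)

required <- c("Raw.file", "Potential.contaminant", "Reverse", "PIF",
              "Modified.sequence", "Leading.razor.protein")

# Names of the required fields that the toy table lacks
missingFields(featureData, required)
```

If your table uses different column names, adapt the required vector accordingly rather than renaming the columns.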
The first step is to read in the PSM quantification table generated
by, for example, MaxQuant (Tyanova, Temu, and Cox (2016)). We created a
small example data set by subsetting the MaxQuant evidence.txt table
provided in the SCoPE2 landmark paper (Specht et al. (2021)). The
mqScpData table is a typical example of what you would get after
reading in a CSV file using read.csv or read.table. See ?mqScpData for
more information about the table content.
data("mqScpData")
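To read your own table instead of the example data, a base R sketch could look as follows. It assumes a tab-delimited file; the temporary file and its contents are made up to mimic MaxQuant-style headers. Note how the default check.names = TRUE behaviour turns spaces in headers into dots, which is why fields appear as Raw.file or Reporter.intensity.1.

```r
# Write a tiny tab-delimited table that mimics MaxQuant column headers
# (toy content, for illustration only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Raw file\tModified sequence\tReporter intensity 1\tReporter intensity 2",
             "runA\t_AAAK_\t1000\t1200",
             "runA\t_CCCR_\t2000\t1800"),
           tmp)

# read.delim() is read.table() with defaults suited to tab-delimited files
psms <- read.delim(tmp)

# Spaces in the headers have been replaced by dots
colnames(psms)
```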
We also provide an example of a sample annotation table that provides
useful information about the samples that are present in the example
data. See ?sampleAnnotation for more information about the table
content.
data("sampleAnnotation")
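To give an idea of the expected layout, here is a toy sample annotation table written in base R. It assumes one row per combination of run and channel; the column names follow the fields described in this vignette, but the object names and all values are made up.

```r
# Toy sample annotation: one row per (run, channel) pair; values are made up
annot <- data.frame(
    Raw.file = rep(c("runA", "runB"), each = 3),
    quantCols = rep(paste0("Reporter.intensity.", 1:3), times = 2),
    SampleType = c("Carrier", "Reference", "Monocyte",
                   "Carrier", "Reference", "Macrophage")
)

# Keep only the rows that describe single cells
singleCells <- annot[annot$SampleType %in% c("Macrophage", "Monocyte"), ]
singleCells
```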
Note that the example sample data contains 5 different types of samples
(SampleType) that can be found in a TMT-based SCP data set:
table(sampleAnnotation$SampleType)
#>
#>      Blank    Carrier Macrophage   Monocyte  Reference     Unused
#>         19          3         20          5          3         14
- Carrier channels (Carrier) contain 200 cell equivalents and are meant to boost the peptide identification rate.
- Reference channels (Reference) contain 5 cell equivalents and are used to partially correct for between-run variation.
- Unused channels (Unused) are channels that are left empty due to isotopic cross-contamination by the carrier channel.
- Blank channels (Blank) contain samples that do not contain any cell but are processed as single-cell samples.
- Single-cell channels contain a single cell that is either a macrophage (Macrophage) or a monocyte (Monocyte).
Using readSCP, we combine both tables in a QFeatures object formatted
as described above.
scp <- readSCP(assayData = mqScpData,
               colData = sampleAnnotation,
               runCol = "Raw.file",
               removeEmptyCols = TRUE)
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Splitting data in runs.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
scp
#> An instance of class QFeatures containing 4 assays:
#> [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 395 rows and 11 columns
#> [2] 190321S_LCA10_X_FP97_blank_01: SingleCellExperiment with 109 rows and 11 columns
#> [3] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 487 rows and 11 columns
#> [4] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 370 rows and 16 columns
Note that the first 3 assays contain 11 columns, which correspond to the TMT-11 labels, and the last assay contains 16 columns, which correspond to the TMT-16 labels.
Important: More details about the usage of readSCP() and how to read
your own data set are provided in the Load data using readSCP vignette.
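A few accessors give a quick textual overview of the resulting object. The sketch below rebuilds the object from the example data so that it runs on its own. It assumes the scp package is installed and only uses accessors from scp and the packages it builds on.

```r
library("scp")
data("mqScpData")
data("sampleAnnotation")

# Same call as in this vignette
scp <- readSCP(assayData = mqScpData,
               colData = sampleAnnotation,
               runCol = "Raw.file",
               removeEmptyCols = TRUE)

names(scp)          # one assay name per MS run
dims(scp)           # number of rows (PSMs) and columns (channels) per assay
head(colData(scp))  # sample annotations, shared across assays
scp[[1]]            # the SingleCellExperiment of the first run
```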
Another way to get an overview of the scp object is to plot the
QFeatures object. This will create a graph where each node is an
assay and links between assays are denoted as edges.
plot(scp)