This vignette explains the main concepts for data-processing and statistical analysis of
(DIA) proteomics data using msqrob2. To illustrate these concepts, we will
use using a publicly available spike-in study published by
Staes et al. (Staes et al. 2024). They spiked digested UPS proteins in a yeast digested
background at the following ratio’s
(yeast:ups ratio 10:1, 10:2, 10:4, 10:8, 10:10).
Here we will use a subset of the data, i.e. dilutions 10:2, 10:4 and 10:8.
We will use output of the search engine Spectronaut.
The main search output for this Spectronaut version was stored in a
report.parquet file.
DIA-NN provides multiple quantifications, e.g. derived from the MS1 or MS2 spectra, and at precursor or protein (protein group) level. The term ‘precursor’ refers to a charged peptide species and is the basic unit of identification and quantification in DIA. Hence, in the context of DIA we refer to a precursor table, instead of to a PSM table in DDA.
Examples of different quantities are:
Ms1.Area, normalised MS1 Area: Ms1.Normalised, MS2 Precursor quantities: Precursor.Quantity, Normalised MS2 Precursor quantities: Precursor.Normalised, etc., which are all at the precursor levelPG.MaxLFQIn this vignette, we will use the Precursor.Quantity column so as to
illustrate all steps.
If you prefer to start from the Precursor.Normalised intensities, you can skip the
normalisation step as the normalisation already has been done by DIA-NN
internally.
We load the msqrob2 package, along with additional packages for
data manipulation and visualisation.
suppressPackageStartupMessages({
library("QFeatures")
library("dplyr")
library("tidyr")
library("ggplot2")
library("msqrob2")
library("stringr")
library("ExploreModelMatrix")
library("kableExtra")
library("ComplexHeatmap")
library("scater")
})
We load the output from DIA-NN parquet file. Can be file path to local file or url to file that lives on the web.
precursorFile <- "https://github.com/statOmics/msqrob2data/raw/refs/heads/main/dia/spikeinStaes/spikein248-staesetal2024.parquet"
We can import the report.parquet file using the read_parquet function from
the arrow package.
Note, that older versions of DIA-NN store the output as report.tsv.
which we import using the fread function.
Both read_parquet and fread can import data directly from the web as well as
data from local files on your computer.
precursors <- arrow::read_parquet(precursorFile) # function from the arrow package
## For older versions of Spectronaut, where the results are stored as tsv files.
## Note that the precursorFile then would point to "report.tsv"
## precursors <- data.table::fread(precursorFile)
knitr::kable(head(precursors))
| Run.Index | Run | Channel | Precursor.Id | Modified.Sequence | Stripped.Sequence | Precursor.Charge | Precursor.Lib.Index | Decoy | Proteotypic | Precursor.Mz | Protein.Ids | Protein.Group | Protein.Names | Genes | RT | iRT | Predicted.RT | Predicted.iRT | IM | iIM | Predicted.IM | Predicted.iIM | Precursor.Quantity | Precursor.Normalised | Ms1.Area | Ms1.Normalised | Ms1.Apex.Area | Ms1.Apex.Mz.Delta | Normalisation.Factor | Quantity.Quality | Empirical.Quality | Normalisation.Noise | Ms1.Profile.Corr | Evidence | Mass.Evidence | Channel.Evidence | Ms1.Total.Signal.Before | Ms1.Total.Signal.After | RT.Start | RT.Stop | FWHM | PG.TopN | PG.MaxLFQ | Genes.TopN | Genes.MaxLFQ | Genes.MaxLFQ.Unique | PG.MaxLFQ.Quality | Genes.MaxLFQ.Quality | Genes.MaxLFQ.Unique.Quality | Q.Value | PEP | Global.Q.Value | Lib.Q.Value | Peptidoform.Q.Value | Global.Peptidoform.Q.Value | Lib.Peptidoform.Q.Value | PTM.Site.Confidence | Site.Occupancy.Probabilities | Protein.Sites | Lib.PTM.Site.Confidence | Translated.Q.Value | Channel.Q.Value | PG.Q.Value | PG.PEP | GG.Q.Value | Protein.Q.Value | Global.PG.Q.Value | Lib.PG.Q.Value | Best.Fr.Mz | Best.Fr.Mz.Delta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B000254_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA | (UniMod:1)AAVTLHLR2 | (UniMod:1)AAVTLHLR | AAVTLHLR | 2 | 0 | 0 | 1 | 461.7771 | P38998 | P38998 | LYS1_YEAST | LYS1 | 89.48647 | 34.21962 | 89.64584 | 34.32372 | 0 | 0 | 0 | 0 | 1027105.2 | 1018664.5 | 1664103 | 1650427 | 1235652.9 | -0.0019836 | 0.9917820 | 0.8068524 | 0.9587077 | -0.0107658 | 0.8455485 | 2.328422 | 0.6659890 | 0.9694718 | 1446081536 | 1354918656 | 89.27837 | 89.83672 | 0.1471047 | 0 | 60736892 | 0 | 60736892 | 60736892 | 0.9713600 | 0.9713600 | 0.9713600 | 0.0037350 | 0.0346908 | 6e-07 | 1.8e-06 | 0.0058624 | 0.0006297 | 0.0004264 | 1 | (UniMod:1){1.000000}AAVTLHLR2 | [P38998:nA2] | 1 | 0 | 0 | 0.0003065 | 0.0006127 | 0.0003067 | 0.0003116 | 0.0002956 | 0.0002814 | 639.3937 | -0.0043335 | |
| 1 | B000258_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA | (UniMod:1)AAVTLHLR2 | (UniMod:1)AAVTLHLR | AAVTLHLR | 2 | 0 | 0 | 1 | 461.7771 | P38998 | P38998 | LYS1_YEAST | LYS1 | 88.51812 | 34.21962 | 88.65327 | 34.23263 | 0 | 0 | 0 | 0 | 1686047.0 | 1473639.0 | 2225964 | 1945538 | 2138060.2 | -0.0012512 | 0.8740202 | 0.8539391 | 0.7675559 | 0.0606278 | 0.9467068 | 2.621551 | 0.9197075 | 0.7442634 | 1707504896 | 1714693888 | 88.38112 | 88.65714 | 0.1049480 | 0 | 59386180 | 0 | 59386180 | 59386180 | 0.9682981 | 0.9682981 | 0.9682981 | 0.0000006 | 0.0000010 | 6e-07 | 1.8e-06 | 0.0007060 | 0.0006297 | 0.0004264 | 1 | (UniMod:1){1.000000}AAVTLHLR2 | [P38998:nA2] | 1 | 0 | 0 | 0.0003028 | 0.0006007 | 0.0003029 | 0.0003108 | 0.0002956 | 0.0002814 | 538.3460 | -0.0010986 | |
| 2 | B000262_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA | (UniMod:1)AAVTLHLR2 | (UniMod:1)AAVTLHLR | AAVTLHLR | 2 | 0 | 0 | 1 | 461.7771 | P38998 | P38998 | LYS1_YEAST | LYS1 | 88.19032 | 34.21962 | 88.42429 | 33.99163 | 0 | 0 | 0 | 0 | 1484137.0 | 1497021.4 | 2156443 | 2175164 | 1850866.0 | -0.0018616 | 1.0086815 | 0.8370125 | 0.9024460 | 0.0352767 | 0.9391663 | 1.963248 | 0.6607722 | 0.5847857 | 1906061952 | 2011915136 | 87.98146 | 88.39668 | 0.1602550 | 0 | 61945576 | 0 | 61945576 | 61945576 | 0.9670781 | 0.9670781 | 0.9670781 | 0.0022817 | 0.0274225 | 6e-07 | 1.8e-06 | 0.0059578 | 0.0006297 | 0.0004264 | 1 | (UniMod:1){1.000000}AAVTLHLR2 | [P38998:nA2] | 1 | 0 | 0 | 0.0003133 | 0.0006259 | 0.0003135 | 0.0003190 | 0.0002956 | 0.0002814 | 639.3937 | 0.0012817 | |
| 3 | B000278_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA_2 | (UniMod:1)AAVTLHLR2 | (UniMod:1)AAVTLHLR | AAVTLHLR | 2 | 0 | 0 | 1 | 461.7771 | P38998 | P38998 | LYS1_YEAST | LYS1 | 88.94958 | 34.21962 | 89.25366 | 33.75805 | 0 | 0 | 0 | 0 | 1202390.5 | 1186230.2 | 1752379 | 1728827 | 1130526.8 | -0.0038757 | 0.9865599 | 0.7660662 | 0.7737662 | 0.0300994 | 0.5843989 | 2.661267 | 0.6917942 | 0.8114025 | 2323055360 | 2137220352 | 88.61057 | 89.15668 | 0.2167290 | 0 | 61787356 | 0 | 61787356 | 61787356 | 0.9639115 | 0.9639115 | 0.9639115 | 0.0026284 | 0.0164647 | 6e-07 | 1.8e-06 | 0.0061239 | 0.0006297 | 0.0004264 | 1 | (UniMod:1){1.000000}AAVTLHLR2 | [P38998:nA2] | 1 | 0 | 0 | 0.0003246 | 0.0006370 | 0.0003248 | 0.0003308 | 0.0002956 | 0.0002814 | 425.2619 | 0.0019226 | |
| 5 | B000286_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA_2 | (UniMod:1)AAVTLHLR2 | (UniMod:1)AAVTLHLR | AAVTLHLR | 2 | 0 | 0 | 1 | 461.7771 | P38998 | P38998 | LYS1_YEAST | LYS1 | 88.49445 | 34.21962 | 88.61003 | 34.13313 | 0 | 0 | 0 | 0 | 475073.6 | 579242.2 | 1481789 | 1806699 | 777120.9 | -0.0014648 | 1.2192687 | 0.6968042 | 0.7491062 | 0.1018161 | 0.0000000 | 1.700542 | 0.3975840 | 0.1695192 | 1270731776 | 1315600640 | 88.28562 | 88.63584 | 0.1518549 | 0 | 63859400 | 0 | 63859400 | 63859400 | 0.9677207 | 0.9677207 | 0.9677207 | 0.0040565 | 0.0318976 | 6e-07 | 1.8e-06 | 0.0077074 | 0.0006297 | 0.0004264 | 1 | (UniMod:1){1.000000}AAVTLHLR2 | [P38998:nA2] | 1 | 0 | 0 | 0.0003152 | 0.0006296 | 0.0003154 | 0.0003210 | 0.0002956 | 0.0002814 | 639.3937 | 0.0092773 | |
| 6 | B000302_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA_3 | (UniMod:1)AAVTLHLR2 | (UniMod:1)AAVTLHLR | AAVTLHLR | 2 | 0 | 0 | 1 | 461.7771 | P38998 | P38998 | LYS1_YEAST | LYS1 | 88.94778 | 34.21962 | 89.22704 | 33.83727 | 0 | 0 | 0 | 0 | 1211047.2 | 1278841.4 | 1902638 | 2009148 | 1505334.4 | -0.0023804 | 1.0559797 | 0.8122462 | 0.9176636 | -0.0058500 | 0.8568795 | 2.149787 | 0.2013753 | 0.9174258 | 1576011776 | 1611237888 | 88.80907 | 89.22405 | 0.1854964 | 0 | 57830312 | 0 | 57830312 | 57830312 | 0.9690518 | 0.9690518 | 0.9690518 | 0.0038881 | 0.0323859 | 6e-07 | 1.8e-06 | 0.0095169 | 0.0006297 | 0.0004264 | 1 | (UniMod:1){1.000000}AAVTLHLR2 | [P38998:nA2] | 1 | 0 | 0 | 0.0003108 | 0.0006149 | 0.0003110 | 0.0003160 | 0.0002956 | 0.0002814 | 639.3937 | -0.0004883 |
Each row in the precursor data table is in “long format” and contains information about one precursor in a specific run (the table below shows the first 6 rows). The columns contains various descriptors about the precursor, such as its sequence, its charge, run, etc.
We select variables to reduce the memory footprint.
precursors <- precursors |>
select(
Run,
Precursor.Id,
Modified.Sequence,
Stripped.Sequence,
Precursor.Charge,
Protein.Group,
Protein.Names,
Protein.Ids,
Genes,
Precursor.Quantity,
Precursor.Normalised,
Normalisation.Factor,
Ms1.Area,
Ms1.Normalised,
PG.MaxLFQ,
Q.Value,
Lib.Q.Value,
PG.Q.Value,
Lib.PG.Q.Value,
Proteotypic,
Decoy, # Not available in older versions of DIA-NN
RT)
Unlike for real experiments, we known the ground truth for this dataset: UPS proteins were spiked in at different concentrations and are thus differentially abundant (DA, spiked in). Yeast proteins consist of the background proteome that is the same in all samples and are thus non-DA.
precursors <- precursors |>
mutate(species = grepl(pattern = "UPS",Protein.Group) |>
as.factor() |>
recode("TRUE"="ups","FALSE" = "yeast"))
precursors |>
pull(species) |>
table()
##
## yeast ups
## 301195 4481
Quick check on distribution of precursors MS2 intensities.
precursors |>
ggplot(aes(x = log2(Precursor.Quantity))) +
geom_density() +
theme_minimal()
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_density()`).
The sample annotation table) is not available and can be generated from the run labels, as the researchers included information on the design in the filenames.
We will make a new data frame with the annotation.
We first generate variable runCol, that gives an overview of the different runs in the experiment, i.e. the unique names in the column Run of the report file. This is a mandatory column in the annotation file that is required by the readQFeatures function that will be used to generate a QFeatures object with the quant data.
Next we generate variable sampleId, to pinpoint the different samples in the dataset. This can be extracted from the run names as the researchers have stored information on the meta data in the file names for each run. E.g. for run B000282_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA_2 this is the ratio, i.e. ratio04, and the replicate, i.e. _2 after DIA. Note, that the first replicate has no number.
strsplit function andstrplit with an sapply loop, which takes the output of b. as its first argument, uses the function ‘[’ to subset vector of strings in each list element and uses an optional argument ‘2’ to select the second string of the vector (right part).We add the variable ratio by parsing the variable stringId and
gsub functiongsub according to pattern "_" andWe add the variable rep by parsing the variable stringId, and
(annot <- data.frame(runCol = precursors |>
pull(Run) |>
unique() # 1.
) |>
mutate(sampleId = gsub(x = runCol, pattern = "_DIA", replacement = "") |> #2.a
str_split("UPS2_") |> #2.b
sapply(`[`, 2) #2.c
) |>
mutate(
condition = gsub("ratio", "", sampleId) |> #3.a
str_split("_") |> #3.b
sapply(`[`, 1) |> #3.c
as.numeric() |> #3.d
as.factor(), #3.e
rep = sampleId |>
str_split("_") |> #4.a
sapply(`[`, 2) |> #4.b
replace_na(replace = "1") |> #4.c
as.factor(), #4.d
ratio = condition |>
as.character() |>
as.double()
)
)
## runCol sampleId condition rep
## 1 B000254_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA ratio02 2 1
## 2 B000258_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA ratio04 4 1
## 3 B000262_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA ratio08 8 1
## 4 B000278_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA_2 ratio02_2 2 2
## 5 B000286_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA_2 ratio08_2 8 2
## 6 B000302_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA_3 ratio02_3 2 3
## 7 B000306_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA_3 ratio04_3 4 3
## 8 B000310_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA_3 ratio08_3 8 3
## 9 B000282_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA_2 ratio04_2 4 2
## ratio
## 1 2
## 2 4
## 3 8
## 4 2
## 5 8
## 6 2
## 7 4
## 8 8
## 9 4
First, recall that the precursor table is file in long format.
Every quantitative column in the precursor table contains
information for multiple runs. Therefore, the function split the table
based on the run identifier, given by the runCol argument (for
DIA-NN, that identifier is contained in Run).
So, the
QFeatures object after import will contain as many sets as there are
runs.
Next, the function links the annotation table with the PSM data.
To achieve this, the annotation table must contain a runCol column
that provides the run identifier in which each sample has been
acquired, and this information will be used to match the identifiers
in the Run column of the precursor table.
Here, we will use the Precursor.Quantity column as quantification input.
(qf <- readQFeatures(assayData = precursors,
colData = annot,
quantCols = "Precursor.Quantity",
runCol = "Run",
fnames = "Precursor.Id"))
## Checking arguments.
## Loading data as a 'SummarizedExperiment' object.
## Splitting data in runs.
##
|
| | 0%
|
|======== | 11%
|
|================ | 22%
|
|======================= | 33%
|
|=============================== | 44%
|
|======================================= | 56%
|
|=============================================== | 67%
|
|====================================================== | 78%
|
|============================================================== | 89%
|
|======================================================================| 100%
## Formatting sample annotations (colData).
## Formatting data as a 'QFeatures' object.
## Setting assay rownames.
## An instance of class QFeatures (type: bulk) with 9 sets:
##
## [1] B000254_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA: SummarizedExperiment with 34471 rows and 1 columns
## [2] B000258_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA: SummarizedExperiment with 35142 rows and 1 columns
## [3] B000262_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA: SummarizedExperiment with 34028 rows and 1 columns
## ...
## [7] B000302_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA_3: SummarizedExperiment with 33091 rows and 1 columns
## [8] B000306_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA_3: SummarizedExperiment with 34472 rows and 1 columns
## [9] B000310_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA_3: SummarizedExperiment with 33724 rows and 1 columns
The data preprocessing workflow for DIA data is similar to the workflow for DDA-LFQ data, but there are suble differences as we start from precursor level data and have additional columns.
We first replace any zero in the quantitative data
with an NA.
qf <- zeroIsNA(qf, names(qf))
Note that msqrob2 can handle missing data without having to rely on
hard-to-verify imputation assumptions, which is our general recommendation.
However, msqrob2 does not prevent users from using imputation, which can be
performed by starting from Spectronaut output with imputed values or by imputing
the data themselves, e.g. by using impute() from the QFeatures package.
Filtering removes low-quality and unreliable precursors that would otherwise introduce noise and artefacts in the data.
We apply standard filtering advised by DIA-NN:
q-value threshold of 0.01 for the identification of precursors (Q.Value) and protein groups (PG.Q.Value). If the file is processed via matching between runs, it is also useful to filter on Lib.Q.Value and Lib.PG.Q.Value.
Remove precursors that could not be mapped, i.e. when Precursor.Id column is an empty string.
Filter decoys, i.e. only keep precursors for which the Decoy column equals 0. (Note, that the Decoy column is not present in the output of older versions of DIA-NN)
Keeping only proteotypic peptides, which map uniquely to a specific protein.
qf <- qf |>
filterFeatures(~ Q.Value <= 0.01 & #1.
PG.Q.Value <= 0.01 & #1.
Lib.Q.Value <= 0.01 & #1.
Lib.PG.Q.Value <= 0.01 & #1.
Precursor.Id != "" & #2.
Decoy == 0 & #3. (Note, that this filter is not available with previous versions of DIA-NN, as the report.tsv file did not include a Decoy column. So other strategies are needed if Decoys are in the output file.)
Proteotypic == 1) #4.
## 'Q.Value' found in 9 out of 9 assay(s).'PG.Q.Value' found in 9 out of 9 assay(s).'Lib.Q.Value' found in 9 out of 9 assay(s).'Lib.PG.Q.Value' found in 9 out of 9 assay(s).'Precursor.Id' found in 9 out of 9 assay(s).'Decoy' found in 9 out of 9 assay(s).'Proteotypic' found in 9 out of 9 assay(s).
Note, that it is important that the filtering criteria are not distorting the distribution of the test statistics in the downstream analysis for features that are non-DA. It can be shown that filtering will not induce bias results when the filtering criterion is independent of test statistic. The criteria that we proposed above are all based on the results of the identification step, hence, they are independent of the downstream test statistics that will be used to prioritize DA proteins.
Up to now, the data from different runs were kept in separate assays. We can now join the filtered sets into an precursor set using joinAssays(). Sets are joined by stacking the columns (samples) in a matrix and rows (features) are matched according to a row identifier, here the Precursor.Id.
We will store the result in the assay with name: precursor.
(qf <- joinAssays(
x = qf,
i = names(qf),
fcol = "Precursor.Id",
name = "precursors"
))
## Using 'Precursor.Id' to join assays.
## An instance of class QFeatures (type: bulk) with 10 sets:
##
## [1] B000254_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA: SummarizedExperiment with 32848 rows and 1 columns
## [2] B000258_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA: SummarizedExperiment with 33491 rows and 1 columns
## [3] B000262_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA: SummarizedExperiment with 32406 rows and 1 columns
## ...
## [8] B000306_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA_3: SummarizedExperiment with 32829 rows and 1 columns
## [9] B000310_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA_3: SummarizedExperiment with 32128 rows and 1 columns
## [10] precursors: SummarizedExperiment with 38615 rows and 9 columns
We keep peptides that were observed at last 4 times out of the \(n = 9\) samples, so that we can estimate the peptide characteristics. We tolerate the following proportion of NAs: \(\text{pNA} = \frac{(n - 4)}{n} = 0.556\), so we keep peptides that are observed in at least 44.4% of the samples, which corresponds to one treatment condition. This is an arbitrary value that may need to be adjusted depending on the experiment and the data set.
nObs <- 4
n <- ncol(qf[["precursors"]])
(qf <- filterNA(qf, i = "precursors", pNA = (n - nObs) / n))
## An instance of class QFeatures (type: bulk) with 10 sets:
##
## [1] B000254_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA: SummarizedExperiment with 32848 rows and 1 columns
## [2] B000258_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA: SummarizedExperiment with 33491 rows and 1 columns
## [3] B000262_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA: SummarizedExperiment with 32406 rows and 1 columns
## ...
## [8] B000306_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA_3: SummarizedExperiment with 32829 rows and 1 columns
## [9] B000310_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA_3: SummarizedExperiment with 32128 rows and 1 columns
## [10] precursors: SummarizedExperiment with 34796 rows and 9 columns
Here, we remove proteins that can only be found by one peptide, as such proteins may not be trustworthy.
We first calculate how many distinct peptides map to each protein (Protein.Group). We use the stripped precursor sequence, i.e. sequence of the base peptide for this purpose.
We store this information in the row data of the precursors assay
We filter precursors of one-hit wonder proteins.
# Filter for peptides per protein
pepsPerProtDf <- qf[["precursors"]] |>
rowData() |>
data.frame() |>
dplyr::select("Stripped.Sequence", "Protein.Group") |>
group_by(Protein.Group) |>
mutate(pepsPerProt = Stripped.Sequence |>
unique() |> length()
) #1.
rowData(qf[["precursors"]])$pepsPerProt <- pepsPerProtDf$pepsPerProt #2.
qf <- filterFeatures(qf,
~ pepsPerProt > 1,
keep = TRUE) #3.
## 'pepsPerProt' found in 1 out of 10 assay(s).
We perform log2-transformation with logTransform() from the
QFeatures package. We use base = 2 and store the result in a new summarized
experiment named precursors_log
qf <- logTransform(qf,
base = 2,
i = "precursors",
name = "precursors_log")
The most common objective of MS-based proteomics experiments is to understand the biological changes in protein abundance between experimental conditions. However, changes in measurements between groups can be caused due to technical factors. For instance, there are systematic fluctuations from run-to-run that shift the measured intensity distribution. We can this explore as follows:
QFeatures’ 3-way subsetting.longForm() to convert the QFeatures object into a long
table, including condition and concentration for filtering and
colouring.qf[, , "precursors_log"] |> #1.
longForm(colvars = c( "rep", "condition")) |> #2.
data.frame() |>
filter(!is.na(value)) |>
ggplot() + #3.
aes(x = value,
colour = condition,
group = colname) +
geom_density() +
theme_minimal()
## Warning: 'experiments' dropped; see 'drops()'
## harmonizing input:
## removing 18 sampleMap rows not in names(experiments)
Even in this clean synthetic data set (same yeast background with some spiked UPS proteins), the marginal precursor intensity distributions across samples are not well aligned. Ignoring this effect will increase the noise and reduce the statistical power of the experiment, and may also, in case of unbalanced designs, introduce confounding effects that will bias the results.
There are many ways to perform the normalisation, e.g.
median centering is a popular choice.
If we subtract the sample median at the log scale from each precursor
log2 intensity, this basically boils down to calculating log-ratio’s between
the precursor intensity and its sample median.
Here, we choose to work with offsets, that are chosen in such a way so that
the distributions are centered around the location of a reference sample.
E.g. the median of the sample medians.
Note that users can also use all normalisation functions that are provided in
QFeatures, i.e. by replacing the sweep function by QFeatures::normalize.
qf <- sweep( #4. Subtract log2 norm factor column-by-column (MARGIN = 2)
qf,
MARGIN = 2,
STATS = nfLogMedian(qf,"precursors_log"),
i = "precursors_log",
name = "precursors_norm"
)
## This function aims to calculate norm factors on a log scale,
## the input data are assumed to be on the log-scale!
We explore the effect of the global normalisation in the subsequent plot.
Formally, the function applies the following operation on each sample \(i\) across all precursors \(p\):
\[ y_{ip}^{\text{norm}} = y_{ip} - \log_2(nf_i) \]
with \(y_{ip}\) the log2-transformed intensities and \(\log_2(nf_i)\) the log2-transformed norm factor. Upon normalisation, we can see that the distribution of the \(y_{ip}^{\text{norm}}\) nicely overlap (using the same code as above)
qf[, , "precursors_norm"] |>
longForm(colvars = colnames(colData(qf))) |>
data.frame() |>
filter(!is.na(value)) |>
ggplot() +
aes(x = value,
colour = condition,
group = colname) +
geom_density() +
labs(subtitle = "Normalised log2 precursor intensities") +
theme_minimal()
## Warning: 'experiments' dropped; see 'drops()'
## harmonizing input:
## removing 27 sampleMap rows not in names(experiments)
Here, we summarise the precursor-level data into protein intensities using
maxLFQ.
maxLFQ first calculates all pairwise log ratio’s between samples only using
their shared precursors.
Particularly, it uses the median of the log ratio’s between the shared
precursors, i.e. \[r_{ij} = median(y_{sj}-y_{si})\].
Hence, it first eliminates peptide effects.
It then estimates the summaries by solving
\[
\sum_i\sum_j(y^{prot}_j - y^{prot}_i - r_ij)^2
\]
It is implemented in the maxLFQ function of the iq package. A maxLFQ like
summarisation is used by default in DIA-NN.
aggregateFeatures() streamlines summarisation. It requires the name
of a rowData column to group the precursors into proteins (or
protein groups), here Protein.Group. We provide the summarisation
approach through the fun argument. Other summarisation methods
are available from the MsCoreUtils package, see ?aggregateFeatures
for a comprehensive list. The function will return a QFeatures
object with a new set that we called proteins.
Note, that maxLFQ can take a long time for large datasets. In that case
we suggest to replace function(X) iq::maxLFQ(X)$estimate
by MsCoreUtils::medianPolish, na.rm = TRUE.
Note that the optional argument na.rm = TRUE is needed when using medianPolish
because the data contains missing values as we did not adopt imputation.
(qf <- aggregateFeatures(
qf, i = "precursors_norm",
name = "proteins",
fcol = "Protein.Group",
fun = function(X) iq::maxLFQ(X)$estimate
#fun = MsCoreUtils::medianPolish, na.rm = TRUE
))
##
|
| | 0%
|
|======================================================================| 100%
## The following messages occurred during aggregation:
##
## Your quantitative data contain missing values. Please read the relevant
## section(s) in the aggregateFeatures manual page regarding the effects
## of missing values on data aggregation.
##
## Occurred during the aggregation of set(s): B000254_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA
## An instance of class QFeatures (type: bulk) with 13 sets:
##
## [1] B000254_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio02_DIA: SummarizedExperiment with 32848 rows and 1 columns
## [2] B000258_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio04_DIA: SummarizedExperiment with 33491 rows and 1 columns
## [3] B000262_Ap_6883_EXT-765_DIA_Yeast_UPS2_ratio08_DIA: SummarizedExperiment with 32406 rows and 1 columns
## ...
## [11] precursors_log: SummarizedExperiment with 34421 rows and 9 columns
## [12] precursors_norm: SummarizedExperiment with 34421 rows and 9 columns
## [13] proteins: SummarizedExperiment with 3121 rows and 9 columns
Data exploration aims to highlight the main sources of variation in the data prior to data modelling and can pinpoint to outlying or off-behaving samples.
qf[, , "precursors_norm"] |>
longForm(colvars = colnames(colData(qf))) |>
data.frame() |>
filter(!is.na(value)) |>
ggplot() +
aes(x = value,
colour = condition,
group = colname) +
geom_density() +
theme_minimal() +
labs(subtitle = "Normalised log2 precursor intensities")
## Warning: 'experiments' dropped; see 'drops()'
## harmonizing input:
## removing 36 sampleMap rows not in names(experiments)
qf[, , "precursors_norm"] |>
longForm(colvars = colnames(colData(qf))) |>
data.frame() |>
filter(!is.na(value)) |>
ggplot() +
aes(x = sampleId,
y = value,
colour = condition,
group = colname) +
xlab("sample") +
geom_boxplot() +
theme_minimal() +
labs(subtitle = "Normalised log2 precursor intensities")
## Warning: 'experiments' dropped; see 'drops()'
## harmonizing input:
## removing 36 sampleMap rows not in names(experiments)
qf[, , "proteins"] |>
longForm(colvars = colnames(colData(qf))) |>
data.frame() |>
filter(!is.na(value)) |>
ggplot() +
aes(x = value,
colour = condition,
group = colname) +
geom_density() +
theme_minimal() +
labs(subtitle = "Normalised log2 protein intensities")
## Warning: 'experiments' dropped; see 'drops()'
## harmonizing input:
## removing 36 sampleMap rows not in names(experiments)
qf[, , "proteins"] |>
longForm(colvars = colnames(colData(qf))) |>
data.frame() |>
filter(!is.na(value)) |>
ggplot() +
aes(x = sampleId,
y = value,
colour = condition,
group = colname) +
xlab("sample") +
geom_boxplot() +
theme_minimal() +
labs(subtitle = "Normalised log2 protein intensities")
## Warning: 'experiments' dropped; see 'drops()'
## harmonizing input:
## removing 36 sampleMap rows not in names(experiments)
qf[, , "precursors_norm"] |>
longForm(colvars = colnames(colData(qf)), rowvars = "Precursor.Charge") |>
as.data.frame() |>
filter(!is.na(value)) |>
filter(Precursor.Charge<=4) |>
ggplot(aes(x = sampleId)) +
geom_bar(aes(fill = factor(Precursor.Charge, levels = 4:1)),
colour = "black") +
labs(title = "Peptide types per sample",
x = "Sample",
fill = "Charge state") +
theme_bw()
## Warning: 'experiments' dropped; see 'drops()'
## harmonizing input:
## removing 36 sampleMap rows not in names(experiments)
qf[,,"precursors_norm"] |>
longForm(colvars = colnames(colData(qf)),
rowvars= c("Precursor.Id",
"Protein.Group",
"Stripped.Sequence",
"Precursor.Charge")) |>
data.frame() |>
filter(!is.na(value)) |>
mutate(cond_rep = paste0("ratio", condition, "_", rep)) |>
group_by(condition, cond_rep) |>
summarise(Precursors = length(unique(Precursor.Id)),
`Protein Groups` = length(unique(Protein.Group))) |>
pivot_longer(-(1:2),
names_to = "Feature",
values_to = "IDs") |>
ggplot(aes(x = cond_rep, y = IDs, fill = condition)) +
geom_col() +
#scale_fill_observable() +
facet_wrap(~Feature,
scales = "free_y") +
labs(title = "Precursor and protein group identificiations per sample",
x = "Sample",
y = "Identifications") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90))
## Warning: 'experiments' dropped; see 'drops()'
## harmonizing input:
## removing 36 sampleMap rows not in names(experiments)
## `summarise()` has regrouped the output.
## ℹ Summaries were computed grouped by condition and cond_rep.
## ℹ Output is grouped by condition.
## ℹ Use `summarise(.groups = "drop_last")` to silence this message.
## ℹ Use `summarise(.by = c(condition, cond_rep))` for per-operation grouping
## (`?dplyr::dplyr_by`) instead.
A common approach for data exploration is to perform dimension reduction, such as Multi Dimensional Scaling (MDS). We will first extract the set to explore along the sample annotations (used for plot colouring).
getWithColData(qf, "proteins") |>
as("SingleCellExperiment") |>
runMDS(exprs_values = 1) |>
plotMDS(colour_by = "condition", shape_by = "rep", point_size = 3)
## Warning: 'experiments' dropped; see 'drops()'
This plot reveals interesting information. First, we see that the samples are nicely separated according to their spike-in condition. Interestingly, the conditions are sorted by the concentration.
corMat <- qf[["proteins"]] |>
assay() |>
cor(method = "spearman", use = "pairwise.complete.obs")
colnames(corMat) <- qf$sampleId
rownames(corMat) <- qf$sampleId
corMat |>
ggcorrplot::ggcorrplot() +
scale_fill_viridis_c() +
labs(title = "Correlation matrix",
fill = "Correlation") +
theme(axis.text.x = element_text(angle = 90))
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
For this experiment, we can only model a single source of variation: spike-in concentration. All remaining sources of variability, e.g. pipetting errors, MS run to run variability, etc will be lumped in the residual variability (error term).
Spike-in condition effects: we model the source of variation induced by the experimental treatment of interest as a fixed effect, which we consider non-random, i.e. the treatment effect is assumed to be the same in repeated experiments, but it is unknown and has to be estimated.
When modelling a typical label-free experiment at the protein level, the model boils down to the following linear model:
\[ y_i = \mathbf{x}_i^T\boldsymbol{\beta} +\epsilon_i\]
with the \(\log_2\)-normalised and summarised protein intensities in sample;
\(\mathbf{x}_i\) a vector with the covariate pattern for the sample encoding the
intercept, spike-in condition, or other experimental factors;
\(\boldsymbol{\beta}\) the vector of parameters that model the association between
the covariates and the outcome; and \(\epsilon_i\) the residuals reflecting
variation that is not captured by the fixed effects.
Note that allows for a flexible parameterisation of the treatment
beyond a single covariate, i.e. including a 1 for the intercept, continuous and
categorical variables as well as their interactions. We assume the residuals to
be independent and identically distributed (i.i.d) according to a normal
distribution with zero mean and constant variance,
i.e. \(\epsilon_i \sim N(0, \sigma^2_\epsilon)\), that can differ from protein to
protein.
In R, this model is encoded using the following simple formula:
model <- ~ condition
We estimate the model with msqrob(). The function takes the QFeatures object,
extracts the quantitative values from the "proteins" set generated during
summarisation, and fits a simple linear model with condition as covariate,
which are automatically retrieved from colData(qf).
Note, that msqrob() can also take a SummarizedExperiment of
precursor, peptide or protein intensities as its input.
qf <- msqrob(
qf,
i = "proteins",
formula = model,
robust = TRUE)
We enable M-estimation (robust = TRUE) for improved robustness against outliers.
The fitting results are available in the msqrobModels column of the rowData. More specifically, the modelling output is stored in the rowData as a statModel object, one model per row (protein). We will see in a later section how to perform statistical inference on the estimated parameters.
models <- rowData(qf[["proteins"]])[["msqrobModels"]]
models[1:3]
## $D6VTK4
## Object of class "StatModel"
## The type is "rlm"
## There number of elements is 5
##
## $O13297
## Object of class "StatModel"
## The type is "rlm"
## There number of elements is 5
##
## $O13329
## Object of class "StatModel"
## The type is "rlm"
## There number of elements is 5
We can now convert the research question “do the spike-in conditions affect the protein intensities?” into a statistical hypothesis.
In other words, we need to translate this question in a null and alternative hypothesis on a single model parameter or on a linear combination of model parameters, which is also referred to with a contrast.
To aid defining contrasts, we will visualise the experimental design using the ExploreModelMatrix package.
vd <- ExploreModelMatrix::VisualizeDesign(
sampleData = colData(qf),
designFormula = model,
textSizeFitted = 4
)
vd$plotlist
## [[1]]
This plot shows that the average protein \(\log_2\) intensity for condition2 is modelled by (Intercept), for the condition4 group by (Intercept) + condtion4 and for the condition8 group by (Intercept) + condtion8.
Hence, to assess differential abundance of proteins between spike-in conditions, we will have to assess all pairwise combinations between the spike-in groups, which leads to 3 different contrasts, i.e. the log2 fold change between ratio 4 and ratio 2, \(\log_2 FC_{4-2} = (Intercept + condition4) - Intercept = condition4\), the log2 fold change between ratio 8 and ratio 2, \(\log_2 FC_{8-2} = (Intercept + condition8) - Intercept = condition8\), and the log2 fold change between ratio 8 and ratio 4, \(\log_2 FC_{8-4} = (Intercept + condition8) - (Intercept+condition4) = condition8 - condition4\).
We can generate the null-hypothesess for all pairwise comparisons manually.
However, here we will use the createPairwiseContrasts function in msqrob2.
(allHypotheses <- createPairwiseContrasts(
model, colData(qf), "condition"
)
)
## Warning: the 'nobars' function has moved to the reformulas package. Please update your imports, or ask an upstream package maintainer to do so.
## This warning is displayed once per session.
## [1] "condition4 = 0" "condition8 = 0"
## [3] "condition8 - condition4 = 0"
Based on these null hypothesis we now generate the contrast matrix with all
three contrasts using the makeContrast function. Note, that experienced users
can also define the contrast matrix, directly.
(L <- makeContrast(
allHypotheses,
parameterNames = colnames(vd$designmatrix)
))
## condition4 condition8 condition8 - condition4
## (Intercept) 0 0 0
## condition4 1 0 -1
## condition8 0 1 1
We can now test our null hypothesis using hypothesisTest() which
takes the QFeatures object with the fitted model and the contrast we
just built. msqrob2 automatically applies the hypothesis
testing to all features in the assay.
qf <- hypothesisTest(qf, i = "proteins", contrast = L)
The results tables for each contrast are stored in the row data of the
proteins assay in qf under the colomnames of the contrast matrix L.
qf[["proteins"]] |>
rowData() |>
names()
## [1] "Protein.Group" "Protein.Names"
## [3] "Protein.Ids" "Genes"
## [5] "Lib.PG.Q.Value" "Proteotypic"
## [7] "Decoy" "species"
## [9] "pepsPerProt" ".n"
## [11] "msqrobModels" "condition4"
## [13] "condition8" "condition8 - condition4"
Users can directly access the results tables via de colData. However, msqrob2
also provides the msqrobCollect function. This function will automatically
retrieve the results tables for all contrasts and combine them by default in one
table. This table contains two additional columns: contrast with the name of
the contrast and feature with the nme of the modelled feature, here protein
group.
inferences <-
msqrobCollect(qf[["proteins"]], L)
head(inferences)
## logFC se df t pval
## condition4.D6VTK4 -0.27045979 0.13185841 7.560838 -2.0511380 0.07641608
## condition4.O13297 0.05464939 0.07948963 7.620863 0.6875034 0.51214122
## condition4.O13329 0.18325945 0.15888252 4.634229 1.1534274 0.30473223
## condition4.O13539 0.02317347 0.18032985 8.505706 0.1285060 0.90073544
## condition4.O13563 0.15757269 0.13513144 8.581991 1.1660698 0.27496810
## condition4.O13585 -0.09970767 0.14552880 7.289842 -0.6851405 0.51444450
## adjPval contrast feature
## condition4.D6VTK4 0.8977611 condition4 D6VTK4
## condition4.O13297 0.9971488 condition4 O13297
## condition4.O13329 0.9971488 condition4 O13329
## condition4.O13539 0.9971488 condition4 O13539
## condition4.O13563 0.9971488 condition4 O13563
## condition4.O13585 0.9971488 condition4 O13585
Note, that missing values can occur because data modelling resulted in a fitError. We refer users to our section 2.9 of our msqrob2 book where we elaborate on how to deal with proteins that could not be fit.
The results also include an adjusted p-value for each protein to correct for multiple testing using the Benjamini-Hochberg False Discovery Rate (FDR) method, which represents an estimate of the minimum FDR at which the test result for protein is considered significant, i.e. the expected fraction of false positives that are reported in the shortest top list of the most significant proteins that include protein \(j\).
Note, that for this design we exceptionally also know the ground truth. So we will also add this information to the inference table:
inferences <- inferences |>
mutate(species = rep(rowData(qf[["proteins"]])$species, ncol(L)))
We will return tables of proteins for which the contrasts are significant at the 5% FDR level.
alpha <- 0.05 #1.
for (contrasti in colnames(L)) #2.
{
sigList <- inferences |>
filter(contrast == contrasti & adjPval < alpha) #3.
cat("**Contrast:**", contrasti, "= 0 (", nrow(sigList), " significant proteins)\n\n") #4.
print(kable(sigList) |>
kable_styling(full_width = FALSE) |>
scroll_box(height = "250px")
) #4.
cat("\n\n\n---\n\n") #4.
}
Contrast: condition4 = 0 ( 24 significant proteins)
| logFC | se | df | t | pval | adjPval | contrast | feature | species | |
|---|---|---|---|---|---|---|---|---|---|
| condition4.UPS|O76070ups|SYUG_HUMAN_UPS | 1.1688286 | 0.1067814 | 7.822585 | 10.945992 | 0.0000051 | 0.0017685 | condition4 | UPS|O76070ups|SYUG_HUMAN_UPS | ups |
| condition4.UPS|P00167ups|CYB5_HUMAN_UPS | 0.9550911 | 0.1600412 | 8.489309 | 5.967784 | 0.0002656 | 0.0343955 | condition4 | UPS|P00167ups|CYB5_HUMAN_UPS | ups |
| condition4.UPS|P00915ups|CAH1_HUMAN_UPS | 0.8576361 | 0.0824126 | 7.256450 | 10.406617 | 0.0000128 | 0.0031874 | condition4 | UPS|P00915ups|CAH1_HUMAN_UPS | ups |
| condition4.UPS|P00918ups|CAH2_HUMAN_UPS | 0.8983091 | 0.0817491 | 8.634229 | 10.988613 | 0.0000023 | 0.0011811 | condition4 | UPS|P00918ups|CAH2_HUMAN_UPS | ups |
| condition4.UPS|P01008ups|ANT3_HUMAN_UPS | 0.8983310 | 0.0714148 | 7.189268 | 12.579052 | 0.0000037 | 0.0016515 | condition4 | UPS|P01008ups|ANT3_HUMAN_UPS | ups |
| condition4.UPS|P01031ups|CO5_HUMAN_UPS | 0.9394264 | 0.1521149 | 8.256994 | 6.175770 | 0.0002340 | 0.0330532 | condition4 | UPS|P01031ups|CO5_HUMAN_UPS | ups |
| condition4.UPS|P02144ups|MYG_HUMAN_UPS | 0.9151465 | 0.0980487 | 8.073330 | 9.333587 | 0.0000133 | 0.0031874 | condition4 | UPS|P02144ups|MYG_HUMAN_UPS | ups |
| condition4.UPS|P02753ups|RETBP_HUMAN_UPS | 1.0894906 | 0.1129798 | 8.511909 | 9.643231 | 0.0000072 | 0.0021366 | condition4 | UPS|P02753ups|RETBP_HUMAN_UPS | ups |
| condition4.UPS|P02768ups|ALBU_HUMAN_UPS | 0.9545618 | 0.0707104 | 8.555250 | 13.499601 | 0.0000005 | 0.0004774 | condition4 | UPS|P02768ups|ALBU_HUMAN_UPS | ups |
| condition4.UPS|P04040ups|CATA_HUMAN_UPS | 0.8971884 | 0.0927431 | 7.670533 | 9.673908 | 0.0000145 | 0.0032173 | condition4 | UPS|P04040ups|CATA_HUMAN_UPS | ups |
| condition4.UPS|P06732ups|KCRM_HUMAN_UPS | 1.0574878 | 0.0724997 | 7.498220 | 14.586095 | 0.0000009 | 0.0005571 | condition4 | UPS|P06732ups|KCRM_HUMAN_UPS | ups |
| condition4.UPS|P08263ups|GSTA1_HUMAN_UPS | 0.8494670 | 0.1182945 | 7.660080 | 7.180952 | 0.0001167 | 0.0190923 | condition4 | UPS|P08263ups|GSTA1_HUMAN_UPS | ups |
| condition4.UPS|P12081ups|SYHC_HUMAN_UPS | 0.9807771 | 0.0899123 | 7.961163 | 10.908158 | 0.0000046 | 0.0017685 | condition4 | UPS|P12081ups|SYHC_HUMAN_UPS | ups |
| condition4.UPS|P15559ups|NQO1_HUMAN_UPS | 0.9585884 | 0.0991542 | 8.431726 | 9.667650 | 0.0000076 | 0.0021366 | condition4 | UPS|P15559ups|NQO1_HUMAN_UPS | ups |
| condition4.UPS|P41159ups|LEP_HUMAN_UPS | 1.0437776 | 0.1540155 | 8.219999 | 6.777094 | 0.0001244 | 0.0193306 | condition4 | UPS|P41159ups|LEP_HUMAN_UPS | ups |
| condition4.UPS|P55957ups|BID_HUMAN_UPS | 1.3946452 | 0.2030001 | 7.634229 | 6.870168 | 0.0001597 | 0.0236380 | condition4 | UPS|P55957ups|BID_HUMAN_UPS | ups |
| condition4.UPS|P62937ups|PPIA_HUMAN_UPS | 0.9706806 | 0.0723174 | 8.621128 | 13.422498 | 0.0000004 | 0.0004774 | condition4 | UPS|P62937ups|PPIA_HUMAN_UPS | ups |
| condition4.UPS|P62988ups|UBIQ_HUMAN_UPS | 0.8146918 | 0.0998367 | 8.520830 | 8.160242 | 0.0000262 | 0.0054243 | condition4 | UPS|P62988ups|UBIQ_HUMAN_UPS | ups |
| condition4.UPS|P63165ups|SUMO1_HUMAN_UPS | 1.0093601 | 0.0662283 | 8.266709 | 15.240620 | 0.0000002 | 0.0004774 | condition4 | UPS|P63165ups|SUMO1_HUMAN_UPS | ups |
| condition4.UPS|P63279ups|UBC9_HUMAN_UPS | 0.9538438 | 0.1230115 | 8.227903 | 7.754101 | 0.0000468 | 0.0085640 | condition4 | UPS|P63279ups|UBC9_HUMAN_UPS | ups |
| condition4.UPS|P68871ups|HBB_HUMAN_UPS | 0.8756257 | 0.1181265 | 8.546292 | 7.412611 | 0.0000533 | 0.0092058 | condition4 | UPS|P68871ups|HBB_HUMAN_UPS | ups |
| condition4.UPS|P69905ups|HBA_HUMAN_UPS | 0.9088946 | 0.1135603 | 8.618881 | 8.003633 | 0.0000284 | 0.0055175 | condition4 | UPS|P69905ups|HBA_HUMAN_UPS | ups |
| condition4.UPS|Q06830ups|PRDX1_HUMAN_UPS | 0.8637362 | 0.0698900 | 8.634229 | 12.358512 | 0.0000009 | 0.0005571 | condition4 | UPS|Q06830ups|PRDX1_HUMAN_UPS | ups |
| condition4.UPS|Q15843ups|NEDD8_HUMAN_UPS | 1.0796467 | 0.1718163 | 7.814438 | 6.283727 | 0.0002613 | 0.0343955 | condition4 | UPS|Q15843ups|NEDD8_HUMAN_UPS | ups |
Contrast: condition8 = 0 ( 30 significant proteins)
| logFC | se | df | t | pval | adjPval | contrast | feature | species | |
|---|---|---|---|---|---|---|---|---|---|
| condition8.P12687 | 0.8703198 | 0.1396314 | 8.634229 | 6.232982 | 0.0001820 | 0.0194927 | condition8 | P12687 | yeast |
| condition8.P40547 | 0.8061507 | 0.1171118 | 8.617958 | 6.883597 | 0.0000888 | 0.0102156 | condition8 | P40547 | yeast |
| condition8.Q99325 | -0.9917794 | 0.1605899 | 7.817782 | -6.175852 | 0.0002926 | 0.0302932 | condition8 | Q99325 | yeast |
| condition8.UPS|O76070ups|SYUG_HUMAN_UPS | 2.4486204 | 0.1018090 | 7.822585 | 24.051118 | 0.0000000 | 0.0000050 | condition8 | UPS|O76070ups|SYUG_HUMAN_UPS | ups |
| condition8.UPS|P00167ups|CYB5_HUMAN_UPS | 2.1401353 | 0.1591407 | 8.489309 | 13.448074 | 0.0000005 | 0.0000775 | condition8 | UPS|P00167ups|CYB5_HUMAN_UPS | ups |
| condition8.UPS|P00915ups|CAH1_HUMAN_UPS | 2.0504959 | 0.0910654 | 7.256450 | 22.516729 | 0.0000001 | 0.0000108 | condition8 | UPS|P00915ups|CAH1_HUMAN_UPS | ups |
| condition8.UPS|P00918ups|CAH2_HUMAN_UPS | 2.0513638 | 0.0817491 | 8.634229 | 25.093416 | 0.0000000 | 0.0000014 | condition8 | UPS|P00918ups|CAH2_HUMAN_UPS | ups |
| condition8.UPS|P01008ups|ANT3_HUMAN_UPS | 2.0560943 | 0.0788311 | 7.189268 | 26.082271 | 0.0000000 | 0.0000062 | condition8 | UPS|P01008ups|ANT3_HUMAN_UPS | ups |
| condition8.UPS|P01031ups|CO5_HUMAN_UPS | 1.8633810 | 0.1536206 | 8.256994 | 12.129759 | 0.0000015 | 0.0001949 | condition8 | UPS|P01031ups|CO5_HUMAN_UPS | ups |
| condition8.UPS|P02144ups|MYG_HUMAN_UPS | 2.0203635 | 0.1035323 | 8.073330 | 19.514337 | 0.0000000 | 0.0000092 | condition8 | UPS|P02144ups|MYG_HUMAN_UPS | ups |
| condition8.UPS|P02753ups|RETBP_HUMAN_UPS | 2.0920932 | 0.1117980 | 8.511909 | 18.713148 | 0.0000000 | 0.0000078 | condition8 | UPS|P02753ups|RETBP_HUMAN_UPS | ups |
| condition8.UPS|P02768ups|ALBU_HUMAN_UPS | 2.0923907 | 0.0711867 | 8.555250 | 29.392983 | 0.0000000 | 0.0000006 | condition8 | UPS|P02768ups|ALBU_HUMAN_UPS | ups |
| condition8.UPS|P04040ups|CATA_HUMAN_UPS | 2.0944595 | 0.0884488 | 7.670533 | 23.679911 | 0.0000000 | 0.0000062 | condition8 | UPS|P04040ups|CATA_HUMAN_UPS | ups |
| condition8.UPS|P06732ups|KCRM_HUMAN_UPS | 2.1478951 | 0.0736127 | 7.498220 | 29.178307 | 0.0000000 | 0.0000028 | condition8 | UPS|P06732ups|KCRM_HUMAN_UPS | ups |
| condition8.UPS|P08263ups|GSTA1_HUMAN_UPS | 1.9401553 | 0.1173190 | 7.660080 | 16.537440 | 0.0000003 | 0.0000469 | condition8 | UPS|P08263ups|GSTA1_HUMAN_UPS | ups |
| condition8.UPS|P10599ups|THIO_HUMAN_UPS | 2.0933915 | 0.2825016 | 7.631264 | 7.410193 | 0.0000961 | 0.0106607 | condition8 | UPS|P10599ups|THIO_HUMAN_UPS | ups |
| condition8.UPS|P12081ups|SYHC_HUMAN_UPS | 2.0926925 | 0.0890061 | 7.961163 | 23.511789 | 0.0000000 | 0.0000050 | condition8 | UPS|P12081ups|SYHC_HUMAN_UPS | ups |
| condition8.UPS|P15559ups|NQO1_HUMAN_UPS | 1.9550501 | 0.0974071 | 8.431726 | 20.070911 | 0.0000000 | 0.0000062 | condition8 | UPS|P15559ups|NQO1_HUMAN_UPS | ups |
| condition8.UPS|P16083ups|NQO2_HUMAN_UPS | 2.1475284 | 0.2061866 | 8.361276 | 10.415459 | 0.0000045 | 0.0005583 | condition8 | UPS|P16083ups|NQO2_HUMAN_UPS | ups |
| condition8.UPS|P41159ups|LEP_HUMAN_UPS | 1.8613224 | 0.1481948 | 8.219999 | 12.559974 | 0.0000012 | 0.0001617 | condition8 | UPS|P41159ups|LEP_HUMAN_UPS | ups |
| condition8.UPS|P55957ups|BID_HUMAN_UPS | 2.7816594 | 0.2030001 | 7.634229 | 13.702746 | 0.0000012 | 0.0001617 | condition8 | UPS|P55957ups|BID_HUMAN_UPS | ups |
| condition8.UPS|P61626ups|LYSC_HUMAN_UPS | 2.0245577 | 0.1868859 | 7.270806 | 10.833124 | 0.0000096 | 0.0011415 | condition8 | UPS|P61626ups|LYSC_HUMAN_UPS | ups |
| condition8.UPS|P62937ups|PPIA_HUMAN_UPS | 2.1495667 | 0.0722383 | 8.621128 | 29.756622 | 0.0000000 | 0.0000006 | condition8 | UPS|P62937ups|PPIA_HUMAN_UPS | ups |
| condition8.UPS|P62988ups|UBIQ_HUMAN_UPS | 1.8965209 | 0.0998367 | 8.520830 | 18.996225 | 0.0000000 | 0.0000074 | condition8 | UPS|P62988ups|UBIQ_HUMAN_UPS | ups |
| condition8.UPS|P63165ups|SUMO1_HUMAN_UPS | 2.0689042 | 0.0662283 | 8.266709 | 31.238982 | 0.0000000 | 0.0000006 | condition8 | UPS|P63165ups|SUMO1_HUMAN_UPS | ups |
| condition8.UPS|P63279ups|UBC9_HUMAN_UPS | 2.2002153 | 0.1203024 | 8.227903 | 18.289033 | 0.0000001 | 0.0000108 | condition8 | UPS|P63279ups|UBC9_HUMAN_UPS | ups |
| condition8.UPS|P68871ups|HBB_HUMAN_UPS | 2.0471710 | 0.1190149 | 8.546292 | 17.200961 | 0.0000001 | 0.0000108 | condition8 | UPS|P68871ups|HBB_HUMAN_UPS | ups |
| condition8.UPS|P69905ups|HBA_HUMAN_UPS | 2.0617557 | 0.1135603 | 8.618881 | 18.155609 | 0.0000000 | 0.0000080 | condition8 | UPS|P69905ups|HBA_HUMAN_UPS | ups |
| condition8.UPS|Q06830ups|PRDX1_HUMAN_UPS | 2.0010121 | 0.0698900 | 8.634229 | 28.630886 | 0.0000000 | 0.0000006 | condition8 | UPS|Q06830ups|PRDX1_HUMAN_UPS | ups |
| condition8.UPS|Q15843ups|NEDD8_HUMAN_UPS | 2.3419997 | 0.1576358 | 7.814438 | 14.857033 | 0.0000005 | 0.0000775 | condition8 | UPS|Q15843ups|NEDD8_HUMAN_UPS | ups |
Contrast: condition8 - condition4 = 0 ( 27 significant proteins)
| logFC | se | df | t | pval | adjPval | contrast | feature | species | |
|---|---|---|---|---|---|---|---|---|---|
| condition8 - condition4.P47150 | 0.7047037 | 0.0922144 | 5.634229 | 7.642010 | 0.0003534 | 0.0406570 | condition8 - condition4 | P47150 | yeast |
| condition8 - condition4.Q05521 | -0.7327455 | 0.1026483 | 8.181034 | -7.138406 | 0.0000879 | 0.0121999 | condition8 - condition4 | Q05521 | yeast |
| condition8 - condition4.Q99325 | -0.9935997 | 0.1632031 | 7.817782 | -6.088118 | 0.0003216 | 0.0395417 | condition8 - condition4 | Q99325 | yeast |
| condition8 - condition4.UPS|O76070ups|SYUG_HUMAN_UPS | 1.2797917 | 0.1042936 | 7.822585 | 12.271046 | 0.0000022 | 0.0006183 | condition8 - condition4 | UPS|O76070ups|SYUG_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P00167ups|CYB5_HUMAN_UPS | 1.1850442 | 0.1590028 | 8.489309 | 7.452979 | 0.0000530 | 0.0082364 | condition8 - condition4 | UPS|P00167ups|CYB5_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P00915ups|CAH1_HUMAN_UPS | 1.1928598 | 0.0882951 | 7.256450 | 13.509923 | 0.0000021 | 0.0006183 | condition8 - condition4 | UPS|P00915ups|CAH1_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P00918ups|CAH2_HUMAN_UPS | 1.1530547 | 0.0817491 | 8.634229 | 14.104803 | 0.0000003 | 0.0001821 | condition8 - condition4 | UPS|P00918ups|CAH2_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P01008ups|ANT3_HUMAN_UPS | 1.1577634 | 0.0760012 | 7.189268 | 15.233486 | 0.0000010 | 0.0004357 | condition8 - condition4 | UPS|P01008ups|ANT3_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P01031ups|CO5_HUMAN_UPS | 0.9239546 | 0.1553443 | 8.256994 | 5.947787 | 0.0003031 | 0.0392293 | condition8 - condition4 | UPS|P01031ups|CO5_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P02144ups|MYG_HUMAN_UPS | 1.1052170 | 0.1035323 | 8.073330 | 10.675097 | 0.0000049 | 0.0010765 | condition8 - condition4 | UPS|P02144ups|MYG_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P02753ups|RETBP_HUMAN_UPS | 1.0026026 | 0.1129798 | 8.511909 | 8.874173 | 0.0000138 | 0.0022575 | condition8 - condition4 | UPS|P02753ups|RETBP_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P02768ups|ALBU_HUMAN_UPS | 1.1378289 | 0.0711867 | 8.555250 | 15.983719 | 0.0000001 | 0.0000975 | condition8 - condition4 | UPS|P02768ups|ALBU_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P04040ups|CATA_HUMAN_UPS | 1.1972711 | 0.0894148 | 7.670533 | 13.390082 | 0.0000014 | 0.0004691 | condition8 - condition4 | UPS|P04040ups|CATA_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P06732ups|KCRM_HUMAN_UPS | 1.0904074 | 0.0667896 | 7.498220 | 16.326006 | 0.0000004 | 0.0002039 | condition8 - condition4 | UPS|P06732ups|KCRM_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P08263ups|GSTA1_HUMAN_UPS | 1.0906883 | 0.1081452 | 7.660080 | 10.085411 | 0.0000109 | 0.0018731 | condition8 - condition4 | UPS|P08263ups|GSTA1_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P12081ups|SYHC_HUMAN_UPS | 1.1119153 | 0.0854195 | 7.961163 | 13.017109 | 0.0000012 | 0.0004668 | condition8 - condition4 | UPS|P12081ups|SYHC_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P15559ups|NQO1_HUMAN_UPS | 0.9964617 | 0.0991542 | 8.431726 | 10.049614 | 0.0000056 | 0.0010886 | condition8 - condition4 | UPS|P15559ups|NQO1_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P16083ups|NQO2_HUMAN_UPS | 1.1912520 | 0.2043093 | 8.361276 | 5.830629 | 0.0003310 | 0.0395417 | condition8 - condition4 | UPS|P16083ups|NQO2_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P55957ups|BID_HUMAN_UPS | 1.3870142 | 0.1815689 | 7.634229 | 7.639053 | 0.0000779 | 0.0115288 | condition8 - condition4 | UPS|P55957ups|BID_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P62937ups|PPIA_HUMAN_UPS | 1.1788861 | 0.0723174 | 8.621128 | 16.301549 | 0.0000001 | 0.0000975 | condition8 - condition4 | UPS|P62937ups|PPIA_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P62988ups|UBIQ_HUMAN_UPS | 1.0818290 | 0.0988704 | 8.520830 | 10.941887 | 0.0000026 | 0.0006794 | condition8 - condition4 | UPS|P62988ups|UBIQ_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P63165ups|SUMO1_HUMAN_UPS | 1.0595440 | 0.0640311 | 8.266709 | 16.547325 | 0.0000001 | 0.0000975 | condition8 - condition4 | UPS|P63165ups|SUMO1_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P63279ups|UBC9_HUMAN_UPS | 1.2463715 | 0.1214954 | 8.227903 | 10.258593 | 0.0000057 | 0.0010886 | condition8 - condition4 | UPS|P63279ups|UBC9_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P68871ups|HBB_HUMAN_UPS | 1.1715453 | 0.1190149 | 8.546292 | 9.843685 | 0.0000060 | 0.0010886 | condition8 - condition4 | UPS|P68871ups|HBB_HUMAN_UPS | ups |
| condition8 - condition4.UPS|P69905ups|HBA_HUMAN_UPS | 1.1528610 | 0.1134146 | 8.618881 | 10.165019 | 0.0000043 | 0.0010354 | condition8 - condition4 | UPS|P69905ups|HBA_HUMAN_UPS | ups |
| condition8 - condition4.UPS|Q06830ups|PRDX1_HUMAN_UPS | 1.1372759 | 0.0698900 | 8.634229 | 16.272374 | 0.0000001 | 0.0000975 | condition8 - condition4 | UPS|Q06830ups|PRDX1_HUMAN_UPS | ups |
| condition8 - condition4.UPS|Q15843ups|NEDD8_HUMAN_UPS | 1.2623530 | 0.1718163 | 7.814438 | 7.347109 | 0.0000903 | 0.0121999 | condition8 - condition4 | UPS|Q15843ups|NEDD8_HUMAN_UPS | ups |
A volcano plot is a common visualisation that provides an overview of
the hypothesis testing results, plotting the \(-\log_{10}\) p-value1 Note,
that upon this transformation a value of 1 represents a p-value of 0.1, 2
a p-value of 0.01, 3 a p-value of 0.001, etc. as a function of
the estimated log fold change. Volcano plots can be used to highlight
the most interesting proteins that have large fold changes and/or are
highly significant. We can use the table above directly to build a
volcano plot using ggplot2 functionality. We also highlight which
proteins are UPS standards, known to be differentially abundant by
experimental design.
Experienced users can make the plot themselves. However, in msqrob2 we also
provide the plotVolcano function that generates volcanoplots based on msqrob2
inference tables generated with the hypothesisTest function.
Since the inference table contains multiple contrast we use the facet_wrap
function to make a separate volcano plot for each contrast.
volcanoplots <- inferences |>
plotVolcano() +
facet_wrap(~contrast)
volcanoplots
We construct heatmaps for the significant proteins for each contrast.
L with a
user-defined function that extractsproteins along with its colData andalphaheatmaps <- lapply(
colnames(L), #1.
function(contrast, se, alpha)
{
sig <- rowData(se)[[contrast]] |> #2.
filter(adjPval < alpha) |>
rownames()
quants <- t(scale(t(assay(se[sig,])))) #3
colnames(quants) <- paste0("con", se$condition,"_rep",se$rep) #4.
annotations <- columnAnnotation(condition = se$condition) #4.
set.seed(1234) ## annotation colours are randomly generated by default
return( #.5
Heatmap(
quants,
name = "log2 intensity",
top_annotation = annotations,
column_title = paste0(contrast, " = 0")
)
)
},
se = getWithColData(qf, "proteins"), #6.
alpha = alpha) #7.
## Warning: 'experiments' dropped; see 'drops()'
heatmaps
## [[1]]
##
## [[2]]
##
## [[3]]
A typical workflow for the analysis of a proteomics experiment will end.
Note, that the evaluations in this section are typically not possible on real experimental data. Indeed, we are using a spike-in dataset so we know the ground truth: all human UPS proteins are differentially abundant (DA) between the conditions and the yeast proteins are non-DA.
We use a nominal FDR level of 0.05
alpha <- 0.05
As this is a spike-in study with known ground truth, we can also plot the log2 fold change distributions against the expected values, in this case 0 for the yeast proteins and the difference of the log concentration for the spiked-in UPS standards. We first create a small table with the real values.
We use a similar operation to construct the name of the corresponding contrast.
(realLogFC <- data.frame(
logFC = colData(qf)$condition |>
levels() |> # 1.
as.double() |> # 2.
log2() |> # 3.
combn(m=2, FUN = diff), #4.
contrast = colData(qf)$condition |>
levels() |>
combn(m=2, FUN= function(x) paste0(x[2]," - ",x[1])) |>
recode("4 - 2" = "condition4",
"8 - 2" = "condition8",
"8 - 4" = "condition8 - condition4")
)
)
## logFC contrast
## 1 1 condition4
## 2 2 condition8
## 3 1 condition8 - condition4
We now evaluate if the estimated fold changes correspond to the real fold changes.
logFC <- inferences |>
filter(!is.na(logFC)) |>
ggplot(aes(x = species, y = logFC, color= species)) + #1.
geom_boxplot() + #2.
theme_bw() +
scale_color_manual(
values = c("grey20", "firebrick"), #3.
name = "",
labels = c("yeast", "ups")
) +
geom_hline(yintercept = 0, color="grey20") + # 4.
facet_wrap(~contrast) +
geom_hline(aes(yintercept = logFC), color="firebrick", data=realLogFC) #5.
logFC
All human UPS proteins are differentially abundant (DA) between the conditions and the yeast proteins are non-DA.
tpFpTable <- group_by(inferences, contrast) |>
filter(adjPval < alpha) |>
summarise("TP" = sum(species == "ups"),
"FP" = sum(species != "ups"),
"FDP" = mean(species != "ups"))
tpFpTable
## # A tibble: 3 × 4
## contrast TP FP FDP
## <chr> <int> <int> <dbl>
## 1 condition4 24 0 0
## 2 condition8 27 3 0.1
## 3 condition8 - condition4 24 3 0.111
We generate the TPR-FDP curves to assess the performance to prioritise differentially abundant proteins. Again, these curves are built using the ground truth information about which proteins are differentially abundant (spiked in) and which proteins are constant across samples. We create two functions to compute the TPR and the FDP.
computeFDP <- function(pval, tp) {
ord <- order(pval)
fdp <- cumsum(!tp[ord]) / 1:length(tp)
fdp[order(ord)]
}
computeTPR <- function(pval, tp, nTP = NULL) {
if (is.null(nTP)) nTP <- sum(tp)
ord <- order(pval)
tpr <- cumsum(tp[ord]) / nTP
tpr[order(ord)]
}
We apply these functions and compute the corresponding metric using the statistical inference results and the ground truth information.
performance <- inferences |> group_by(contrast) |>
na.exclude() |>
mutate(tpr = computeTPR(pval, species == "ups"),
fdp = computeFDP(pval, species == "ups")) |>
arrange(fdp)
We also highlight the observed FDP at a \(alpha = 5\%\) FDR threshold.
workPoints <- performance |>
group_by(contrast) |>
filter(adjPval < 0.05) |>
slice_max(pval)
ggplot(performance) +
aes(
y = fdp,
x = tpr,
) +
geom_line() +
geom_point(data = workPoints, size = 3) +
geom_hline(yintercept = 0.05, linetype = 2) +
facet_wrap( ~ contrast) +
coord_flip(ylim = c(0, 0.2)) +
theme_minimal()
We make a separate plot per contrast and compare the volcano plots for all normalisation x summarisation combinations.
inferences |>
plotVolcano() +
facet_wrap(~contrast) +
aes(color = species, shape = adjPval < alpha) +
scale_color_manual(
values = c("grey20", "firebrick"),
name = "species",
labels = c("yeast", "ups")
) +
labs(shape="FDR < 0.05") +
geom_vline(aes(xintercept = logFC), data = realLogFC , col ="firebrick") +
geom_hline(aes(yintercept = -log10(adjAlpha)), data = inferences |>
group_by(contrast) |>
summarize(adjAlpha = alpha *mean(adjPval < alpha, na.rm=TRUE), .groups="drop"))
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
Staes, An, Teresa Mendes Maia, Sara Dufour, Robbin Bouwmeester, Ralf Gabriels, Lennart Martens, Kris Gevaert, Francis Impens, and Simon Devos. 2024. “Benefit of in Silico Predicted Spectral Libraries in Data‑Independent Acquisition Data Analysis Workflows.” Journal of Proteome Research 23 (6): 2078–89. https://doi.org/10.1021/acs.jproteome.4c00048.
Note, that more examples can be found in our msqrob2 book.
sessionInfo()
## R version 4.6.0 alpha (2026-04-05 r89794)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] scater_1.39.4 scuttle_1.21.6
## [3] SingleCellExperiment_1.33.2 ComplexHeatmap_2.27.1
## [5] kableExtra_1.4.0 ExploreModelMatrix_1.23.1
## [7] stringr_1.6.0 msqrob2_1.19.1
## [9] ggplot2_4.0.2 tidyr_1.3.2
## [11] dplyr_1.2.1 QFeatures_1.21.2
## [13] MultiAssayExperiment_1.37.4 SummarizedExperiment_1.41.1
## [15] Biobase_2.71.0 GenomicRanges_1.63.2
## [17] Seqinfo_1.1.0 IRanges_2.45.0
## [19] S4Vectors_0.49.1 BiocGenerics_0.57.0
## [21] generics_0.1.4 MatrixGenerics_1.23.0
## [23] matrixStats_1.5.0 BiocStyle_2.39.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.18.0 jsonlite_2.0.0
## [4] shape_1.4.6.1 magrittr_2.0.5 magick_2.9.1
## [7] ggbeeswarm_0.7.3 farver_2.1.2 nloptr_2.2.1
## [10] rmarkdown_2.31 GlobalOptions_0.1.4 vctrs_0.7.3
## [13] Cairo_1.7-0 minqa_1.2.8 tinytex_0.59
## [16] BiocBaseUtils_1.13.0 htmltools_0.5.9 S4Arrays_1.11.1
## [19] BiocNeighbors_2.5.4 SparseArray_1.11.13 sass_0.4.10
## [22] bslib_0.10.0 htmlwidgets_1.6.4 plyr_1.8.9
## [25] cachem_1.1.0 igraph_2.2.3 mime_0.13
## [28] lifecycle_1.0.5 iterators_1.0.14 pkgconfig_2.0.3
## [31] rsvd_1.0.5 Matrix_1.7-5 R6_2.6.1
## [34] fastmap_1.2.0 rbibutils_2.4.1 shiny_1.13.0
## [37] clue_0.3-68 digest_0.6.39 colorspace_2.1-2
## [40] irlba_2.3.7 textshaping_1.0.5 beachmat_2.27.5
## [43] labeling_0.4.3 abind_1.4-8 compiler_4.6.0
## [46] bit64_4.6.0-1 withr_3.0.2 doParallel_1.0.17
## [49] S7_0.2.1 ggcorrplot_0.1.4.1 BiocParallel_1.45.0
## [52] viridis_0.6.5 iq_2.0.1 MASS_7.3-65
## [55] DelayedArray_0.37.1 rjson_0.2.23 tools_4.6.0
## [58] vipor_0.4.7 otel_0.2.0 beeswarm_0.4.0
## [61] httpuv_1.6.17 glue_1.8.0 nlme_3.1-169
## [64] promises_1.5.0 cluster_2.1.8.2 reshape2_1.4.5
## [67] gtable_0.3.6 data.table_1.18.2.1 utf8_1.2.6
## [70] BiocSingular_1.27.1 ScaledMatrix_1.19.0 xml2_1.5.2
## [73] XVector_0.51.0 ggrepel_0.9.8 foreach_1.5.2
## [76] pillar_1.11.1 limma_3.67.1 later_1.4.8
## [79] rintrojs_0.3.4 circlize_0.4.18 splines_4.6.0
## [82] lattice_0.22-9 bit_4.6.0 tidyselect_1.2.1
## [85] knitr_1.51 gridExtra_2.3 reformulas_0.4.4
## [88] bookdown_0.46 ProtGenerics_1.43.0 svglite_2.2.2
## [91] xfun_0.57 shinydashboard_0.7.3 statmod_1.5.1
## [94] DT_0.34.0 stringi_1.8.7 lazyeval_0.2.3
## [97] yaml_2.3.12 boot_1.3-32 evaluate_1.0.5
## [100] codetools_0.2-20 MsCoreUtils_1.23.9 tibble_3.3.1
## [103] BiocManager_1.30.27 cli_3.6.6 arrow_23.0.1.2
## [106] xtable_1.8-8 systemfonts_1.3.2 Rdpack_2.6.6
## [109] jquerylib_0.1.4 dichromat_2.0-0.1 Rcpp_1.1.1
## [112] png_0.1-9 parallel_4.6.0 assertthat_0.2.1
## [115] AnnotationFilter_1.35.0 lme4_2.0-1 viridisLite_0.4.3
## [118] scales_1.4.0 purrr_1.2.2 crayon_1.5.3
## [121] GetoptLong_1.1.1 rlang_1.2.0 cowplot_1.2.0
## [124] shinyjs_2.1.1