QFeatures 1.0.0
The QFeatures package provides infrastructure (that is, classes and
the methods to process and manipulate them) to manage and analyse
quantitative features from mass spectrometry experiments. It is based
on the MultiAssayExperiment class from the MultiAssayExperiment
package (Ramos et al. 2017), which stores a set of assays. Assays in a
QFeatures object have a specific relation, depicted in figure 1:
assays in a QFeatures object are the result of the aggregation of
quantitative features of other assays. In the case of a quantitative
proteomics experiment, these different assays would be PSMs, which are
aggregated into peptides, which are themselves aggregated into
proteins.
Figure 1: Conceptual representation of a QFeatures object and the aggregative relation between different assays
In the following sections, we are going to demonstrate how to create a
single-assay QFeatures object starting from a spreadsheet, how to
compute the next assays (peptides and proteins), and how these can be
manipulated and explored.
library("QFeatures")
QFeatures object
While QFeatures objects can be created manually (see ?QFeatures for
details), most users will probably possess quantitative data in a
spreadsheet or a dataframe. In such cases, the easiest is to use the
readQFeatures function to extract the quantitative data and metadata
columns. Below, we load the hlpsms dataframe that contains data for
3010 PSMs from the TMT-10plex hyperLOPIT spatial proteomics experiment
(Christoforou et al. 2016). The ecol argument specifies that columns
1 to 10 contain quantitation data, and the name argument specifies
that the assay should be named psms in the returned QFeatures object,
to reflect the nature of the data.
data(hlpsms)
hl <- readQFeatures(hlpsms, ecol = 1:10, name = "psms")
hl
## An instance of class QFeatures containing 1 assays:
## [1] psms: SummarizedExperiment with 3010 rows and 10 columns
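As a rough illustration of what the ecol argument selects, the base R sketch below splits a small, made-up table into a quantitative matrix and a table of feature metadata. The table tab and its column names are hypothetical, and this is only a conceptual picture of the split, not the code that readQFeatures actually runs.

```r
# Hypothetical table: two quantitation columns followed by two
# feature annotation columns (all names invented for illustration)
tab <- data.frame(X126 = c(0.1, 0.2),
                  X127 = c(0.3, 0.4),
                  Sequence = c("AAK", "CCR"),
                  Charge = c(2L, 3L),
                  stringsAsFactors = FALSE)

# Indices of the quantitation columns, in the spirit of ecol = 1:10
ecol <- 1:2

# Quantitative data: a numeric matrix of features by samples
quant <- as.matrix(tab[, ecol])

# Everything else is kept as per-feature metadata
feature_meta <- tab[, -ecol, drop = FALSE]
```

In the real object, the first part corresponds to what assay returns and the second to what rowData returns.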
Below, we see that we can extract an assay using its index or its
name. The individual assays are stored as SummarizedExperiment
objects, whose quantitative data and metadata can be further accessed
using the assay and rowData functions.
hl[[1]]
## class: SummarizedExperiment
## dim: 3010 10
## metadata(0):
## assays(1): ''
## rownames(3010): 1 2 ... 3009 3010
## rowData names(18): Sequence ProteinDescriptions ... RTmin markers
## colnames(10): X126 X127C ... X130N X131
## colData names(0):
hl[["psms"]]
## class: SummarizedExperiment
## dim: 3010 10
## metadata(0):
## assays(1): ''
## rownames(3010): 1 2 ... 3009 3010
## rowData names(18): Sequence ProteinDescriptions ... RTmin markers
## colnames(10): X126 X127C ... X130N X131
## colData names(0):
head(assay(hl[["psms"]]))
## X126 X127C X127N X128C X128N X129C
## 1 0.12283431 0.08045915 0.070804055 0.09386901 0.051815695 0.13034383
## 2 0.35268185 0.14162381 0.167523880 0.07843497 0.071087436 0.03214548
## 3 0.01546089 0.16142297 0.086938133 0.23120844 0.114664348 0.09610188
## 4 0.04702854 0.09288723 0.102012167 0.11125409 0.067969116 0.14155358
## 5 0.01044693 0.15866147 0.167315736 0.21017494 0.147946673 0.07088253
## 6 0.04955362 0.01215244 0.002477681 0.01297833 0.002988949 0.06253195
## X129N X130C X130N X131
## 1 0.17540095 0.040068658 0.11478839 0.11961594
## 2 0.06686260 0.031961793 0.02810434 0.02957384
## 3 0.15977819 0.010127118 0.08059400 0.04370403
## 4 0.18015910 0.035329902 0.12166589 0.10014038
## 5 0.17555789 0.007088253 0.02884754 0.02307803
## 6 0.01726511 0.172651119 0.37007905 0.29732174
head(rowData(hl[["psms"]]))
## DataFrame with 6 rows and 18 columns
## Sequence ProteinDescriptions NbProteins ProteinGroupAccessions
## <character> <character> <integer> <character>
## 1 SQGEIDk Tetratrico... 1 Q8BYY4
## 2 YEAQGDk Vacuolar p... 1 P46467
## 3 TTScDTk C-type man... 1 Q64449
## 4 aEELESR Liprin-alp... 1 P60469
## 5 aQEEAIk Isoform 2 ... 2 P13597-2
## 6 dGAVDGcR Structural... 1 Q6P5D8
## Modifications qValue PEP IonScore NbMissedCleavages
## <character> <numeric> <numeric> <integer> <integer>
## 1 K7(TMT6ple... 0.008 0.11800 27 0
## 2 K7(TMT6ple... 0.001 0.01070 27 0
## 3 C4(Carbami... 0.008 0.11800 11 0
## 4 N-Term(TMT... 0.002 0.04450 24 0
## 5 N-Term(Car... 0.001 0.00850 36 0
## 6 N-Term(TMT... 0.000 0.00322 26 0
## IsolationInterference IonInjectTimems Intensity Charge mzDa MHDa
## <integer> <integer> <numeric> <integer> <numeric> <numeric>
## 1 0 70 335000 2 503.274 1005.54
## 2 0 70 926000 2 520.267 1039.53
## 3 0 70 159000 2 521.258 1041.51
## 4 0 70 232000 2 531.785 1062.56
## 5 0 70 212000 2 537.804 1074.60
## 6 0 70 865000 2 539.761 1078.51
## DeltaMassPPM RTmin markers
## <numeric> <numeric> <character>
## 1 -0.38 24.02 unknown
## 2 0.61 18.85 unknown
## 3 1.11 10.17 unknown
## 4 0.35 29.18 unknown
## 5 1.70 25.56 Plasma mem...
## 6 -0.67 21.27 Nucleus - ...
For further details on how to manipulate such objects, refer to the MultiAssayExperiment (Ramos et al. 2017) and SummarizedExperiment (Morgan et al. 2019) packages.
As illustrated in figure 1, a central characteristic of QFeatures
objects is the aggregative relation between their assays. This
relation can be obtained with the aggregateFeatures function, which
aggregates quantitative features from one assay into a new one. In
the next code chunk, we aggregate PSM-level data into peptide-level
data by grouping all PSMs that were matched to the same peptide
sequence. Below, the aggregation function is set, as an example, to
the mean. The new assay is named peptides.
hl <- aggregateFeatures(hl, "psms", "Sequence", name = "peptides", fun = colMeans)
## Your row data contain missing values. Please read the relevant
## section(s) in the aggregateFeatures manual page regarding the effects
## of missing values on data aggregation.
hl
## An instance of class QFeatures containing 2 assays:
## [1] psms: SummarizedExperiment with 3010 rows and 10 columns
## [2] peptides: SummarizedExperiment with 2923 rows and 10 columns
hl[["peptides"]]
## class: SummarizedExperiment
## dim: 2923 10
## metadata(0):
## assays(2): assay aggcounts
## rownames(2923): AAAVSTEGk AAIDYQk ... ykVEEASDLSISk ykVPQTEEPTAk
## rowData names(7): Sequence ProteinDescriptions ... markers .n
## colnames(10): X126 X127C ... X130N X131
## colData names(0):
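To make the grouping step concrete, here is a minimal base R sketch of a group-wise colMeans on a toy matrix. The objects psm_quant and sequence are invented for illustration. The sketch ignores missing values and the bookkeeping that aggregateFeatures performs; it only mirrors the idea of averaging all rows that share a grouping value and counting how many rows went into each average.

```r
# Toy PSM-level matrix: 4 PSMs (rows) quantified in 2 samples (columns)
psm_quant <- matrix(c(1, 3, 10, 20,
                      2, 4, 30, 40),
                    ncol = 2,
                    dimnames = list(paste0("psm", 1:4), c("S1", "S2")))

# Hypothetical grouping variable, playing the role of "Sequence"
sequence <- c("PEPA", "PEPA", "PEPB", "PEPB")

# Row indices of the PSMs belonging to each peptide
idx <- split(seq_len(nrow(psm_quant)), sequence)

# Summarise each block of rows with colMeans; one row per peptide
pep_quant <- t(vapply(idx,
                      function(i) colMeans(psm_quant[i, , drop = FALSE]),
                      numeric(ncol(psm_quant))))

# Number of PSMs aggregated into each peptide
n_features <- lengths(idx)
```

The per-group count computed at the end is presumably comparable to the .n variable that appears in the rowData of the aggregated assay.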
Below, we repeat the aggregation operation by grouping peptides into proteins as defined by the ProteinGroupAccessions variable.
hl <- aggregateFeatures(hl, "peptides", "ProteinGroupAccessions", name = "proteins", fun = colMeans)
hl
## An instance of class QFeatures containing 3 assays:
## [1] psms: SummarizedExperiment with 3010 rows and 10 columns
## [2] peptides: SummarizedExperiment with 2923 rows and 10 columns
## [3] proteins: SummarizedExperiment with 1596 rows and 10 columns
hl[["proteins"]]
## class: SummarizedExperiment
## dim: 1596 10
## metadata(0):
## assays(2): assay aggcounts
## rownames(1596): A2A432 A2A6Q5-3 ... Q9Z2Z9 Q9Z315
## rowData names(3): ProteinGroupAccessions markers .n
## colnames(10): X126 X127C ... X130N X131
## colData names(0):
The samples assayed in a QFeatures object can be documented in the
colData slot. The hl data doesn’t currently possess any sample
metadata. These can be added as a new DataFrame with matching names
(i.e. the DataFrame rownames must be identical to the assays’
colnames) or can be added one variable at a time, as shown below.
colData(hl)
## DataFrame with 10 rows and 0 columns
hl$tag <- c("126", "127C", "127N", "128C", "128N", "129C", "129N",
            "130C", "130N", "131")
colData(hl)
## DataFrame with 10 rows and 1 column
## tag
## <character>
## X126 126
## X127C 127C
## X127N 127N
## X128C 128C
## X128N 128N
## X129C 129C
## X129N 129N
## X130C 130C
## X130N 130N
## X131 131
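Because sample annotations are matched to assay columns by name, it can help to derive them from the column names rather than typing them in order. The base R sketch below does this for the tag variable. The vector cols is a hand-written copy of the colnames shown above, and stripping the leading X is an assumption about how these names were generated.

```r
# Column names as printed for the psms assay (copied by hand)
cols <- c("X126", "X127C", "X127N", "X128C", "X128N",
          "X129C", "X129N", "X130C", "X130N", "X131")

# Remove the leading "X" to recover the TMT tag from each column name
tag <- sub("^X", "", cols)

# Sample table whose rownames match the assay colnames exactly
samples <- data.frame(tag = tag, row.names = cols,
                      stringsAsFactors = FALSE)

# Sanity check: annotation rows and assay columns are in the same order
stopifnot(identical(rownames(samples), cols))
```

With a QFeatures object at hand, the resulting vector could then be assigned with hl$tag <- tag.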
One particularity of the QFeatures infrastructure is that the
features of the constitutive assays are linked through an aggregative
relation. This relation is recorded when creating new assays with
aggregateFeatures and is exploited when subsetting QFeatures objects
by their feature names.
In the example below, we are interested in the Stat3B isoform of the Signal transducer and activator of transcription 3 (STAT3) with accession number P42227-2. This accession number corresponds to a feature name in the proteins assay. This protein row was computed from 8 peptide rows in the peptides assay, themselves resulting from the aggregation of 9 rows in the psms assay.
stat3 <- hl["P42227-2", , ]
stat3
## An instance of class QFeatures containing 3 assays:
## [1] psms: SummarizedExperiment with 9 rows and 10 columns
## [2] peptides: SummarizedExperiment with 8 rows and 10 columns
## [3] proteins: SummarizedExperiment with 1 rows and 10 columns
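The way a protein name propagates down to peptides and PSMs can be pictured with plain data frames. The sketch below is a conceptual illustration only: the tables psms and peptides and the accession P1 are made up, and QFeatures stores and follows these links internally rather than through the lookups shown here.

```r
# Toy PSM table; variable names mirror the rowData variables used above
psms <- data.frame(Sequence = c("AAK", "AAK", "CCR", "DDK"),
                   ProteinGroupAccessions = c("P1", "P1", "P1", "P2"),
                   stringsAsFactors = FALSE)

# Peptide table: one row per distinct peptide sequence
peptides <- unique(psms)

# Hypothetical protein of interest
target <- "P1"

# Step 1: peptides that were aggregated into the target protein
pep_keep <- peptides$Sequence[peptides$ProteinGroupAccessions == target]

# Step 2: PSMs that were aggregated into those peptides
psm_keep <- which(psms$Sequence %in% pep_keep)
```

Here one protein maps to two peptides and three PSMs, the same kind of fan-out as the 1, 8 and 9 rows reported for P42227-2.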
We can easily visualise this new QFeatures object using ggplot2 once converted into a data.frame.
stat3_df <- data.frame(longFormat(stat3))
stat3_df$assay <- factor(stat3_df$assay,
                         levels = c("psms", "peptides", "proteins"))
library("ggplot2")
ggplot(data = stat3_df,
       aes(x = colname,
           y = value,
           group = rowname)) +
    geom_line() + geom_point() +
    facet_grid(~ assay)
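The plot relies on the data being in long format, with one row per quantitative value. The base R sketch below builds that shape from a toy matrix m, which is invented for illustration. The column names rowname, colname and value follow those used in the ggplot2 call; the table produced by longFormat also carries further columns, such as assay, that are not reproduced here.

```r
# Toy quantitative matrix: 2 features (rows) in 3 samples (columns)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("f1", "f2"), c("S1", "S2", "S3")))

# One row per (feature, sample) pair, in column-major order
long <- data.frame(rowname = rownames(m)[as.vector(row(m))],
                   colname = colnames(m)[as.vector(col(m))],
                   value = as.vector(m),
                   stringsAsFactors = FALSE)
```

Each feature then becomes one line in the plot (group = rowname), drawn across the samples on the x axis.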