MSstatsTMTPTM Example: An example workflow and analysis of the MSstatsTMTPTM package

This vignette provides an example workflow for the MSstatsTMTPTM package. It also shows, through examples, how adjusting for global protein levels allows for better interpretation of PTM modeling results.

1. Workflow

1.1 Raw Data Format

The first step is to load the raw PTM and protein datasets. Each can be the output of one of the MSstatsTMT converter functions: PDtoMSstatsTMTFormat, SpectroMinetoMSstatsTMTFormat, or OpenMStoMSstatsTMTFormat. Both the PTM and protein datasets must include the following columns: ProteinName, PeptideSequence, Charge, PSM, Mixture, TechRepMixture, Run, Channel, Condition, BioReplicate, and Intensity.
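Before summarization, it can help to confirm that a data frame carries all of the required columns. The helper below is a minimal sketch: `check_required_columns` and the toy data frame are hypothetical and are not part of MSstatsTMT or MSstatsTMTPTM.

```r
# Columns that both the PTM and protein datasets must contain
required_cols <- c("ProteinName", "PeptideSequence", "Charge", "PSM",
                   "Mixture", "TechRepMixture", "Run", "Channel",
                   "Condition", "BioReplicate", "Intensity")

# Hypothetical helper: returns the names of any required columns that are missing
check_required_columns <- function(df, required = required_cols) {
  setdiff(required, colnames(df))
}

# Toy input that deliberately lacks the Intensity column
toy <- data.frame(ProteinName = "Protein_12_S703",
                  PeptideSequence = "Peptide_491",
                  Charge = 3,
                  PSM = "Peptide_491_3",
                  Mixture = 1,
                  TechRepMixture = 1,
                  Run = "1_1",
                  Channel = "128N",
                  Condition = "Condition_2",
                  BioReplicate = "Condition_2_1")

missing_cols <- check_required_columns(toy)
if (length(missing_cols) > 0) {
  message("Missing required columns: ", paste(missing_cols, collapse = ", "))
}
```

The same check can be run on both `raw.ptm` and `raw.protein` once they are loaded.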

1.1.1 Raw PTM Data

# read in raw data files
# raw.ptm <- read.csv(file="raw.ptm.csv", header=TRUE)
head(raw.ptm)
#>       ProteinName PeptideSequence Charge           PSM Mixture TechRepMixture
#> 1 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 2 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 3 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 4 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 5 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 6 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#>   Run Channel   Condition  BioReplicate Intensity
#> 1 1_1    128N Condition_2 Condition_2_1   48030.0
#> 2 1_1    129C Condition_4 Condition_4_2  100224.4
#> 3 1_1    131C Condition_3 Condition_3_2   66804.6
#> 4 1_1    130N Condition_1 Condition_1_2   46779.8
#> 5 1_1    128C Condition_6 Condition_6_1   77497.9
#> 6 1_1    126C Condition_4 Condition_4_1   81559.7

Note that the ProteinName column in the PTM dataset represents modification sites, so the location of the modification must be added to the ProteinName. For example, the first row shows Protein_12_S703 for ProteinName, with S703 being the modification site. This can be done as shown above, or by adding the PeptideSequence to the ProteinName, e.g. Protein_12_Peptide_491 for the first row.
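Both naming options can be sketched with `paste`. The input columns `Protein` and `Site` below are assumptions about how the identifiers might be stored before formatting; the second row uses made-up values for illustration.

```r
# Hypothetical input: protein and modification site stored in separate columns
ptm_ids <- data.frame(Protein = c("Protein_12", "Protein_99"),
                      Site = c("S703", "T10"),
                      PeptideSequence = c("Peptide_491", "Peptide_5"),
                      stringsAsFactors = FALSE)

# Option 1: append the modification site to the protein name
ptm_ids$ProteinName <- paste(ptm_ids$Protein, ptm_ids$Site, sep = "_")

# Option 2: append the peptide sequence to the protein name
ptm_ids$ProteinName_alt <- paste(ptm_ids$Protein, ptm_ids$PeptideSequence,
                                 sep = "_")

ptm_ids[, c("ProteinName", "ProteinName_alt")]
```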

1.1.2 Raw Protein Data

# read in raw data files
# raw.protein <- read.csv(file="raw.protein.csv", header=TRUE)
head(raw.protein)
#>   ProteinName PeptideSequence Charge             PSM Mixture TechRepMixture Run
#> 1  Protein_12    Peptide_9121      3  Peptide_9121_3       1              1 1_1
#> 2  Protein_12   Peptide_27963      5 Peptide_27963_5       1              1 1_1
#> 3  Protein_12   Peptide_28482      4 Peptide_28482_4       1              1 1_1
#> 4  Protein_12   Peptide_10940      2 Peptide_10940_2       2              1 2_1
#> 5  Protein_12    Peptide_4900      2  Peptide_4900_2       2              1 2_1
#> 6  Protein_12    Peptide_4900      3  Peptide_4900_3       2              1 2_1
#>   Channel   Condition  BioReplicate  Intensity
#> 1    126C Condition_4 Condition_4_1 10996116.9
#> 2    127C Condition_5 Condition_5_1    56965.1
#> 3    131N Condition_2 Condition_2_2   286121.7
#> 4    131N Condition_2 Condition_2_4   534806.0
#> 5    126C Condition_4 Condition_4_3  1134908.7
#> 6    126C Condition_4 Condition_4_3  1605773.2

The raw protein dataset looks similar to the PTM dataset, but its ProteinName column does not contain a modification site.

1.2 proteinSummarization

After loading the input data, the next step is to use the proteinSummarization function from MSstatsTMT. This provides the summarized dataset needed to model the protein/PTM abundance. The PTM and protein datasets should be summarized separately. The function summarizes the protein dataset to the protein level and the PTM dataset to the PTM level. The two levels differ because the PTM site was added to the ProteinName field of the PTM dataset. For details about the normalization and imputation options in proteinSummarization, see the MSstatsTMT package documentation.
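The effect of encoding the site in ProteinName can be illustrated with a toy grouping. This is only a sketch: it uses a plain median via `aggregate` as a stand-in, which is not the summarization method that proteinSummarization applies, and the intensities and the site `T10` are made up.

```r
# Toy PTM-style data: two sites on the same protein, two features per site
toy_ptm <- data.frame(
  ProteinName = c("Protein_12_S703", "Protein_12_S703",
                  "Protein_12_T10", "Protein_12_T10"),
  Run = "1_1",
  Channel = "128N",
  log2Intensity = c(13.0, 14.0, 15.0, 17.0)
)

# Stand-in summary: one value per ProteinName, Run, and Channel.
# Because the site is part of ProteinName, each site forms its own group.
summarized <- aggregate(log2Intensity ~ ProteinName + Run + Channel,
                        data = toy_ptm, FUN = median)
summarized
```

If the site suffix were dropped, all four features would fall into one Protein_12 group and the result would be a protein-level summary instead.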


# Load MSstatsTMT, which provides proteinSummarization
library(MSstatsTMT)

# Summarize the raw PTM data to the PTM (modification site) level
quant.msstats.ptm <- proteinSummarization(raw.ptm,
                                          method = "msstats",
                                          global_norm = TRUE,
                                          reference_norm = FALSE,
                                          MBimpute = TRUE)

# Summarize the raw protein data to the protein level
quant.msstats.protein <- proteinSummarization(raw.protein,
                                              method = "msstats",
                                              global_norm = TRUE,
                                              reference_norm = FALSE,
                                              MBimpute = TRUE)

head(quant.msstats.ptm)
#>   Run          Protein Abundance Channel  BioReplicate   Condition
#> 1 1_1 Protein_1076_Y67  13.65475    130N Condition_1_2 Condition_1
#> 2 1_1 Protein_1076_Y67  13.57146    127N Condition_1_1 Condition_1
#> 3 1_1 Protein_1076_Y67  13.56900    128N Condition_2_1 Condition_2
#> 4 1_1 Protein_1076_Y67  13.70567    131N Condition_2_2 Condition_2
#> 5 1_1 Protein_1076_Y67  13.24717    131C Condition_3_2 Condition_3
#> 6 1_1 Protein_1076_Y67  13.11874    129N Condition_3_1 Condition_3
#>   TechRepMixture Mixture
#> 1              1       1
#> 2              1       1
#> 3              1       1
#> 4              1       1
#> 5              1       1
#> 6              1       1
head(quant.msstats.protein)
#>   Run      Protein Abundance Channel  BioReplicate   Condition TechRepMixture
#> 1 1_1 Protein_1076  18.75131    127N Condition_1_1 Condition_1              1
#> 2 1_1 Protein_1076  18.80198    130N Condition_1_2 Condition_1              1
#> 3 1_1 Protein_1076  18.92222    131N Condition_2_2 Condition_2              1
#> 4 1_1 Protein_1076  19.02252    128N Condition_2_1 Condition_2              1
#> 5 1_1 Protein_1076  18.28685    131C Condition_3_2 Condition_3              1
#> 6 1_1 Protein_1076  18.40555    129N Condition_3_1 Condition_3              1
#>   Mixture
#> 1       1
#> 2       1
#> 3       1
#> 4       1
#> 5       1
#> 6       1

1.3 groupComparisonTMTPTM

After the two datasets are summarized, both the summarized PTM and protein datasets are passed to the modeling function groupComparisonTMTPTM. First, a full pairwise comparison is made between all conditions in the experiment.


# test for all the possible pairs of conditions
model.results.pairwise <- groupComparisonTMTPTM(data.ptm=quant.msstats.ptm,
                                       data.protein=quant.msstats.protein)

Optionally, a specific contrast matrix can be defined. Below is an example of a contrast matrix and how it is passed into the groupComparisonTMTPTM function.

# Specify comparisons: one row per comparison, one column per condition
comparison<-matrix(c(1,0,0,-1,0,0,
                     0,1,0,0,-1,0,
                     0,0,1,0,0,-1,
                     1,0,-1,0,0,0,
                     0,1,-1,0,0,0,
                     0,0,0,1,0,-1,
                     0,0,0,0,1,-1,
                     .25,.25,-.5,.25,.25,-.5,
                     1/3,1/3,1/3,-1/3,-1/3,-1/3),nrow=9, ncol=6, byrow=TRUE)

# Set the names of each row
row.names(comparison)<-c('1-4', '2-5', '3-6', '1-3',
                         '2-3', '4-6', '5-6', 'Partial', 'Third')
# Set the column names
colnames(comparison)<- c('Condition_1', 'Condition_2', 'Condition_3',
                         'Condition_4', 'Condition_5', 'Condition_6')

comparison
#>         Condition_1 Condition_2 Condition_3 Condition_4 Condition_5 Condition_6
#> 1-4       1.0000000   0.0000000   0.0000000  -1.0000000   0.0000000   0.0000000
#> 2-5       0.0000000   1.0000000   0.0000000   0.0000000  -1.0000000   0.0000000
#> 3-6       0.0000000   0.0000000   1.0000000   0.0000000   0.0000000  -1.0000000
#> 1-3       1.0000000   0.0000000  -1.0000000   0.0000000   0.0000000   0.0000000
#> 2-3       0.0000000   1.0000000  -1.0000000   0.0000000   0.0000000   0.0000000
#> 4-6       0.0000000   0.0000000   0.0000000   1.0000000   0.0000000  -1.0000000
#> 5-6       0.0000000   0.0000000   0.0000000   0.0000000   1.0000000  -1.0000000
#> Partial   0.2500000   0.2500000  -0.5000000   0.2500000   0.2500000  -0.5000000
#> Third     0.3333333   0.3333333   0.3333333  -0.3333333  -0.3333333  -0.3333333

# test for specified condition comparisons only
model.results.contrast <- groupComparisonTMTPTM(data.ptm=quant.msstats.ptm,
                                       data.protein=quant.msstats.protein,
                                       contrast.matrix = comparison)
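A contrast is a linear combination of condition means whose coefficients sum to zero, so a quick sanity check on a hand-written contrast matrix can catch sign errors before modeling. The sketch below uses a small hypothetical three-condition matrix; the names `toy_contrast`, `is_contrast`, and `names_ok` are illustrative only.

```r
# Hypothetical condition labels and a small contrast matrix over them
conditions <- c("Condition_1", "Condition_2", "Condition_3")
toy_contrast <- matrix(c(1,   -1,   0,
                         1,    0,  -1,
                         0.5,  0.5, -1),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(c("1-2", "1-3", "Avg12-3"), conditions))

# Each row should sum to zero, within floating-point tolerance
row_sums <- rowSums(toy_contrast)
is_contrast <- abs(row_sums) < 1e-8

# Column names should match the condition labels used in the data
names_ok <- setequal(colnames(toy_contrast), conditions)

if (!all(is_contrast)) {
  warning("Rows that do not sum to zero: ",
          paste(rownames(toy_contrast)[!is_contrast], collapse = ", "))
}
```

The same two checks can be applied to `comparison`, using the Condition values of the summarized data in place of `conditions`.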

names(model.results.contrast)
#> [1] "PTM.Model"      "Protein.Model"  "Adjusted.Model"
ptm_model <- model.results.contrast[[1]]
protein_model <- model.results.contrast[[2]]
adjusted_model <- model.results.contrast[[3]]

head(adjusted_model)
#> # A tibble: 6 x 8
#>   Protein           Label  log2FC     SE Tvalue    DF   pvalue adj.pvalue
#>   <fct>             <chr>   <dbl>  <dbl>  <dbl> <dbl>    <dbl>      <dbl>
#> 1 Protein_1076_Y67  1-4   -0.0328 0.155  -0.212  6.35 0.839      0.868   
#> 2 Protein_1145_T915 1-4   -1.27   0.248  -5.11  14.8  0.000132   0.000880
#> 3 Protein_1146_S328 1-4    0.0336 0.179   0.188 20.0  0.853      0.872   
#> 4 Protein_1160_S188 1-4   -0.246  0.146  -1.69  17.8  0.108      0.153   
#> 5 Protein_1220_Y321 1-4   -0.330  0.0867 -3.81  24.6  0.000830   0.00369 
#> 6 Protein_1235_S416 1-4   -0.242  0.168  -1.44  12.8  0.174      0.222

The modeling function returns a list of three data frames, one each for the PTM-level, protein-level, and adjusted PTM-level group comparison results.
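A common next step is to keep only the modification sites that remain significant after adjustment. The sketch below rebuilds the six adjusted-model rows printed above as a plain data frame so that it runs on its own; the 0.05 cutoff is an assumption, not a package default stated in this document.

```r
# First six rows of the adjusted model, copied from the output above
adjusted_head <- data.frame(
  Protein = c("Protein_1076_Y67", "Protein_1145_T915", "Protein_1146_S328",
              "Protein_1160_S188", "Protein_1220_Y321", "Protein_1235_S416"),
  Label = "1-4",
  log2FC = c(-0.0328, -1.27, 0.0336, -0.246, -0.330, -0.242),
  adj.pvalue = c(0.868, 0.000880, 0.872, 0.153, 0.00369, 0.222),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # assumed significance cutoff

# Keep sites in the '1-4' comparison with an adjusted p-value below the cutoff
significant <- subset(adjusted_head, Label == "1-4" & adj.pvalue < alpha)

# Order the surviving sites from most to least significant
significant <- significant[order(significant$adj.pvalue), ]
significant
```

With the full results, the same `subset` call can be applied to `adjusted_model` directly.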

1.4 Example Volcano Plot

The models returned by groupComparisonTMTPTM can be passed to the visualization function groupComparisonPlots from the MSstats package. Below is a volcano plot for the adjusted PTM model. Note: the input for groupComparisonPlots should be a single data.frame from the output of groupComparisonTMTPTM.

# groupComparisonPlots is provided by MSstats
library(MSstats)
groupComparisonPlots(data = adjusted_model,
                     type = 'VolcanoPlot',
                     ProteinName = FALSE,
                     which.Comparison = '1-4',
                     address = FALSE)
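A volcano plot places the log2 fold change on the x-axis and the negative log-transformed adjusted p-value on the y-axis. The sketch below computes those coordinates by hand for three rows of the adjusted model shown earlier; the base-10 logarithm and the 0.05 cutoff are assumptions made for illustration, and the `status` labels are hypothetical.

```r
# Three rows of the adjusted model for comparison '1-4', copied from the output above
volcano <- data.frame(
  Protein = c("Protein_1076_Y67", "Protein_1145_T915", "Protein_1220_Y321"),
  log2FC = c(-0.0328, -1.27, -0.330),
  adj.pvalue = c(0.868, 0.000880, 0.00369),
  stringsAsFactors = FALSE
)

sig <- 0.05  # assumed significance cutoff

# y-axis coordinate: larger values mean smaller adjusted p-values
volcano$neg_log10_p <- -log10(volcano$adj.pvalue)

# Classify each site by significance and direction of change
volcano$status <- ifelse(volcano$adj.pvalue >= sig, "not significant",
                         ifelse(volcano$log2FC > 0, "up", "down"))
volcano
```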

Devon Kohler (kohler.d@northeastern.edu)

2021-03-19
