Foreword

synapter is free and open-source software. If you use it, please support the project by citing it in publications:

Nicholas James Bond, Pavel Vyacheslavovich Shliaha, Kathryn S. Lilley, and Laurent Gatto. Improving qualitative and quantitative performance for MS\(^E\)-based label free proteomics. J. Proteome Res., 2013, 12 (6), pp 2340–2353

Questions and bugs

For bugs, typos, suggestions or other questions, please file an issue in our tracking system (https://github.com/lgatto/synapter/issues) providing as much information as possible, a reproducible example and the output of sessionInfo().

If you don’t have a GitHub account or wish to reach a broader audience for general questions about proteomics analysis using R, you may want to use the Bioconductor support site: https://support.bioconductor.org/.

1 Introduction

1.1 Background

The main functionality of synapter is to combine proteomics data acquired under different mass spectrometry settings or with different samples to (i) optimise the respective qualities of the two data sets or (ii) increase the number of identifications, thereby decreasing missing values. In addition, synapter offers functionality that is not available in the default pipeline, such as peptide FDR estimation and filtering on peptide match type and peptide uniqueness.

The example that motivated the development of this package was to combine data obtained on a Synapt G2 instrument:

  1. HDMS\(^E\) data, acquired with additional peptide separation using an ion mobility cell, leading to better identification (both in number and in quality), and
  2. standard MS\(^E\) data (acquired without ion mobility separation), providing better quantitation.

The former data are called identification peptides and the latter quantitation peptides, irrespective of the acquisition mode (HDMS\(^E\) or MS\(^E\)). This HDMS\(^E\)/MS\(^E\) design is used throughout this document to illustrate the synapter package.

HDMS\(^E\) mode offers superior identification and MS\(^E\) mode superior quantitation, so transferring identifications from HDMS\(^E\) to MS\(^E\) runs is a priori the most efficient setup. Identifications can, however, be transferred between any runs, independently of the acquisition mode. This allows users to reduce the number of missing values, one of the primary limitations of label-free proteomics. Users will therefore benefit from synapter's functionality even if they run their instruments in a single mode (HDMS\(^E\) or MS\(^E\) only).

However, as will be shown in section Data analysis, transferring identifications between multiple runs increases both the analysis time and the peptide FDR. synapter keeps these effects to an acceptable degree by letting users choose the runs from which identifications are transferred and merging them into the master HDMS\(^E\) file.

This data processing methodology is described in section HDMS\(^E\)/MS\(^E\) data analysis and the analysis pipeline is described in section Different pipelines.

To maximise the benefit of combining better identification and quantitation data, it is also possible to combine several previously merged identification data files into one master set. This functionality is described in section Using master peptide files.

Finally, section Analysis of complex experiments illustrates a complete pipeline that uses the synapter and MSnbase (Gatto and Lilley 2012) packages to perform label-free protein quantitation. It shows how to combine multiple synapter results to represent the complete experimental design under study. It also shows how to explore the data further, normalise it and perform robust statistical analysis inside the R environment.

The rationale underlying synapter's functionality is described in (Shliaha et al. 2013) and (Bond et al. 2013). The first reference describes the benefits of ion mobility separation for identification and its effects on quantitation, which led to the development of synapter. The package itself is described and demonstrated in the second reference (Bond et al. 2013).

synapter is written for R (R Core Team 2012), an open-source, cross-platform, freely available statistical computing environment and programming language (https://www.r-project.org/). Functionality available in the R environment can be extended through the use of packages. Thousands of developers have contributed packages that are distributed via the Comprehensive R Archive Network (CRAN) or through specific initiatives. One such initiative is the Bioconductor project (https://www.bioconductor.org/) (Gentleman et al. 2004), which focuses on the analysis and comprehension of high-throughput biological data.

synapter is such an R package, dedicated to the analysis of label-free proteomics data. To obtain detailed information about any function in the package, access its documentation by preceding its name with a question mark at the command line prompt. For example, to obtain information about the synapter package, one would type ?synapter.

1.2 Installation

synapter is available through the Bioconductor project. Details about the package and the installation procedure can be found on its page (https://bioconductor.org/packages/synapter/). Briefly, installation of the package and all its dependencies should be done using the dedicated Bioconductor infrastructure as shown below:

## install BiocManager first, if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
## install synapter and all its dependencies
BiocManager::install("synapter")

After installation, synapter will have to be explicitly loaded with

library("synapter")

so that all the package’s functionality is available to the user.

2 Data analysis using synapter

2.1 Preparing the input data

To prepare data for synapter, the .raw data must first be processed with Waters' ProteinLynx Global Server (PLGS) software. The PLGS results are then exported as csv spreadsheet files into user-specified folders. These csv files can then be used as input for synapter.
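Before starting an analysis, it can help to check that the exported csv files are where you expect them. The following is a minimal sketch in base R only. The helper name list_plgs_csv and the example folder path are hypothetical and are not part of synapter. The helper simply collects the csv files found in one user-specified PLGS export folder.

```r
## Hypothetical helper (not part of synapter): return the full paths of
## the csv files exported by PLGS into a user-specified folder.
list_plgs_csv <- function(folder) {
    if (!dir.exists(folder))
        stop("Folder does not exist: ", folder)
    ## match ".csv" case-insensitively and return full paths
    files <- list.files(folder, pattern = "\\.csv$",
                        full.names = TRUE, ignore.case = TRUE)
    if (length(files) == 0L)
        warning("No csv files found in ", folder)
    sort(files)
}

## usage, with a made-up folder name:
## csvFiles <- list_plgs_csv("PLGS_export/final_peptide")
```

The returned paths can then be inspected, for example with read.csv(), before being handed to synapter.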

We also strongly recommend that users familiarise themselves with the PLGS search algorithm for data-independent acquisitions (G. Z. Li et al. 2009).

First, the user has to specify the output folders for the files to be used in the synapter analysis, as demonstrated in the figures. Once the folders are specified, ignore the message that appears asking to restart PLGS.