
1 Foreword

pRolocGUI is under active development; current functionality is evolving and new features will be added. This software is free and open-source. You are invited to open issues in the GitHub pRolocGUI repository if you have any questions or suggestions, or have found any bugs or typos. To reach a broader audience with more general questions about proteomics analyses using R, consider writing to the Bioconductor Support Forum.

2 Introduction

This vignette describes the implemented functionality of the pRolocGUI package. The package is based on the MSnSet class definitions of MSnbase and on the functions defined in the pRoloc package. pRolocGUI is intended for, but not limited to, the interactive visualisation and analysis of quantitative spatial proteomics data. To achieve reactivity and interactivity, pRolocGUI relies on the shiny framework. We recommend some familiarity with the MSnSet class (see ?MSnSet for details) and the pRoloc vignette (see vignette("pRoloc-tutorial")) before using pRolocGUI.

There are 3 applications distributed with pRolocGUI, all of which are wrapped and launched by the pRolocVis function. The application to launch is chosen with the app argument of pRolocVis, which may be one of “main”, “classify” or “compare”.

2.1 Getting started

Once R is started, the first step to enable functionality of the package is to load it, as shown in the code chunk below. We also load the pRolocdata data package, which contains quantitative proteomics datasets.

library("pRolocGUI")
library("pRolocdata")

We begin by loading the dataset hyperLOPIT2015 from the pRolocdata data package. The data were produced using the hyperLOPIT technology on mouse E14TG2a embryonic stem cells (Christoforou et al 2016). For more background on spatial proteomics data analyses, please see Gatto et al 2010, Gatto et al 2014 and the pRoloc tutorial vignette.

data(hyperLOPIT2015) 

To load one of the applications using the pRolocVis function and view the data, you must specify at least one key argument, object, which is the data to display and must be of class MSnSet (or an MSnSetList of length 2 for the compare application). Please see vignette("pRoloc-tutorial") or vignette("MSnbase-io") for importing and loading data. The argument app tells pRolocVis which application to load: one of "main" (default), "classify" or "compare". The optional argument fcol (fcol1 and fcol2 for the compare application) specifies the feature meta-data label(s) (fData column name(s)) to be plotted. The default is "markers" (i.e. the labelled data) for the main (PCA) and compare applications. For the classify application, one must specify the prediction column, i.e. the feature meta-data label of the column containing the classification results generated by a supervised machine learning analysis (see below).

For example, to load the main pRolocVis application:

pRolocVis(object = hyperLOPIT2015, fcol = "markers") 
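The calls for all three applications follow the same pattern. The sketch below is illustrative and rests on assumptions not stated above: that the classification results come from pRoloc's SVM classifier (svmOptimisation/svmClassification), which stores its predictions in an fData column named "svm", and that the pRolocdata datasets tan2009r1 and tan2009r2, each assumed to carry a "markers" column, serve as the pair for the compare application. The application calls are wrapped in interactive() because each one blocks the R session until the app is stopped.

```r
library("pRolocGUI")
library("pRolocdata")
library("pRoloc")   # svmOptimisation(), svmClassification()
library("MSnbase")  # MSnSetList()

data(hyperLOPIT2015)

## fcol must name an existing feature meta-data (fData) column
stopifnot("markers" %in% fvarLabels(hyperLOPIT2015))

if (interactive()) {
  ## main application (the default)
  pRolocVis(object = hyperLOPIT2015, app = "main", fcol = "markers")

  ## classify application: fcol points at the prediction column.
  ## Assumption: predictions from pRoloc's SVM classifier, stored in "svm";
  ## a small 'times' keeps the parameter optimisation quick for illustration.
  params <- svmOptimisation(hyperLOPIT2015, fcol = "markers", times = 10)
  res <- svmClassification(hyperLOPIT2015, params, fcol = "markers")
  pRolocVis(object = res, app = "classify", fcol = "svm")

  ## compare application: an MSnSetList of length 2.
  ## Assumption: tan2009r1/tan2009r2 chosen purely as an example pair.
  data(tan2009r1)
  data(tan2009r2)
  pair <- MSnSetList(list(tan2009r1, tan2009r2))
  pRolocVis(object = pair, app = "compare",
            fcol1 = "markers", fcol2 = "markers")
}
```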

Launching any of the pRolocVis applications will open a new tab in a separate pop-up window, and then the application can be opened in your default Internet browser if desired, by clicking the ‘open in browser’ button in the top panel of the window.

To stop an application, press Esc or Ctrl-C in the console (or use the “STOP” button in RStudio) and close the browser tab in which pRolocVis is running.

2.2 Which app should I use?

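There are 3 different applications, each designed to address a different user requirement: "main" for interactively exploring a single MSnSet through a searchable PCA plot, "classify" for visualising the results of a supervised classification stored in a prediction column of the fData, and "compare" for comparing two datasets supplied as an MSnSetList of length 2.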

3 The main application

The main (default) application is characterised by an interactive and searchable Principal Components Analysis (PCA) plot. PCA is an ordination method that transforms a high-dimensional dataset into a smaller set of uncorrelated variables (principal components), such that the first principal component has the largest possible variance, accounting for as much of the variability in the data as possible. Each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. PCA is therefore particularly useful for visualising multidimensional data in two dimensions, where all the proteins can be plotted on the same figure.
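To make the variance ordering and orthogonality properties concrete, here is a minimal base R sketch using prcomp on a simulated matrix. The matrix is an assumption standing in for exprs(hyperLOPIT2015) (rows are proteins, columns are fractions); it is not real data. For an actual MSnSet, pRoloc's plot2D function draws the corresponding static PCA plot.

```r
## Simulated stand-in for a protein x fraction quantitation matrix
set.seed(1)
x <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)
x[, 2] <- x[, 1] + rnorm(200, sd = 0.1)  # correlated columns, so PC1 dominates

pca <- prcomp(x, center = TRUE, scale. = FALSE)

## Proportion of variance per component, in decreasing order
vars <- pca$sdev^2
round(vars / sum(vars), 3)

## Loadings are orthonormal: t(R) %*% R is the identity matrix
R <- pca$rotation
all.equal(crossprod(R), diag(ncol(R)), check.attributes = FALSE)

## Scores on different components are uncorrelated
round(cor(pca$x[, 1:3]), 10)

## The first two score columns are what a 2-D PCA plot displays
scores <- pca$x[, 1:2]
plot(scores, xlab = "PC1", ylab = "PC2")
```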

The application is subdivided into three tabs: (1) PCA, (2) Profiles and (3) Table Selection. A searchable data table containing the experimental feature meta-data is permanently displayed at the bottom of the screen for convenience. You can switch between the tabs by clicking on them at the top of the screen.

To run the main application using pRolocVis:

pRolocVis(object = hyperLOPIT2015, fcol = "markers") 

The PCA Tab

Viewing

The PCA tab is characterised by its main panel, which shows a PCA plot for the selected MSnSet. By default, the first two principal components are plotted. The sidebar panel controls which features are highlighted on the PCA plot. Under the ‘Labels’ menu, input can be selected by clicking data class names on and off, or by typing and searching in the white input box. Selected items can be deleted by clicking on the name of the class and pressing the delete key on your keyboard, and the PCA plot is then updated accordingly. Below the select box are a ‘transparency’ slider bar, which controls the opacity of the highlighted data classes, and two action buttons, ‘Zoom/reset plot’ and ‘Clear selection’, which are described below.

Figure: Selecting sub-cellular classes

Searching

Below the PCA plot is a searchable data table containing the feature meta-data (fData). For LOPIT experiments, such as the one used in this example, this may contain protein accession numbers, protein entry names, protein descriptions, the number of quantified peptides per protein, and columns containing sub-cellular localisation information. The data table is limited to displaying 12 columns of information; by default these are the first 6 and last 6 columns of the fData. To display specific fData columns in the data table, use the fdataInds argument (see ?pRolocVis for more details).

One can search for proteins of interest by typing in the white search box above the table, on the right. Searching is done by partial pattern matching against the table elements, and any full or partial text matches are highlighted in the data table. To select or deselect a protein of interest, click on its entry in the table, or double-click directly on the protein on the interactive PCA plot. When a protein is selected, its row in the table turns grey and the protein is highlighted on the PCA plot by a dark grey circle. If the ‘Show labels’ box is checked in the left sidebar panel, the names of the selected proteins are also shown on the PCA plot. Selected proteins can be cleared at any time by clicking the ‘Clear selection’ button in the left sidebar panel.
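The two rules above, partial pattern matching across table cells and the default choice of 12 displayed columns, can be sketched in base R. This is a conceptual illustration, not pRolocGUI's internal code: the data frame is a made-up stand-in for fData(), the helpers search_table() and default_cols() are hypothetical, and the search is case-insensitive here by choice.

```r
## Made-up stand-in for fData(): column names and values are placeholders
fd <- data.frame(
  accession   = c("ACC0001", "ACC0002", "ACC0003"),
  entry.name  = c("PROTA_MOUSE", "PROTB_MOUSE", "PROTC_MOUSE"),
  description = c("Nuclear protein A", "Plasma membrane protein B",
                  "Mitochondrial protein C"),
  markers     = c("Nucleus", "Plasma membrane", "unknown"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: keep rows where any cell contains 'pattern'
## as a literal substring (case-insensitive partial match)
search_table <- function(df, pattern) {
  hits <- lapply(df, function(col)
    grepl(tolower(pattern), tolower(as.character(col)), fixed = TRUE))
  df[Reduce(`|`, hits), , drop = FALSE]
}

## Hypothetical helper: indices of the columns shown by default,
## i.e. all columns if there are at most 12, else the first 6 and last 6
default_cols <- function(n, max_cols = 12) {
  if (n <= max_cols) seq_len(n) else c(1:6, (n - 5):n)
}

search_table(fd, "membrane")  # matches only the second protein
default_cols(20)              # columns 1 to 6 and 15 to 20
```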

Figure: Searching for proteins of interest

Zooming

To examine one or more proteins in more detail, you can zoom in on specific points: hover the mouse over the plot, click and drag to draw a (square) brush, then click the ‘Zoom/reset plot’ button in the left sidebar panel to zoom to the brushed area. This process can be repeated until the desired level of zoom is reached. Clicking the ‘Zoom/reset plot’ button once again resets the plot to its original size.
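Conceptually, zooming restricts the plot's axis limits to the brushed rectangle, and resetting restores the full data range. The base R sketch below illustrates that idea under stated assumptions: the scores are simulated, the helper zoom_limits() is hypothetical, and the brush is a plain list mimicking the xmin/xmax/ymin/ymax fields of a shiny brush. Treating "no brush drawn" as "reset" is one plausible reading of how a single button can both zoom and reset; it is not a description of pRolocGUI's code.

```r
## Simulated PCA scores (for illustration only)
set.seed(3)
xy <- cbind(PC1 = rnorm(100), PC2 = rnorm(100))

## Hypothetical helper: axis limits for the next redraw.
## With a brush, zoom to it; with no brush (NULL), reset to the full range.
zoom_limits <- function(xy, brush = NULL) {
  if (is.null(brush)) {
    list(xlim = range(xy[, 1]), ylim = range(xy[, 2]))
  } else {
    list(xlim = c(brush$xmin, brush$xmax), ylim = c(brush$ymin, brush$ymax))
  }
}

## A brush drawn around the centre of the plot
brush <- list(xmin = -1, xmax = 1, ymin = -0.5, ymax = 0.5)
lim <- zoom_limits(xy, brush)

## Points that remain visible after zooming
inside <- xy[, 1] >= lim$xlim[1] & xy[, 1] <= lim$xlim[2] &
          xy[, 2] >= lim$ylim[1] & xy[, 2] <= lim$ylim[2]

plot(xy, xlim = lim$xlim, ylim = lim$ylim)  # zoomed view
full <- zoom_limits(xy)                     # reset to the original extent
```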

Figure: Brushing on the plot
Figure: Zooming proteins of interest

Profiles

Clicking the Profiles tab at the top of the page displays a protein profiles plot showing the quantitation data stored in the exprs data slot of the MSnSet. For the hyperLOPIT2015 dataset, these are the relative abundances of each protein across the 20 fractions (2 x 10-plex replicates). As in the PCA tab, the profiles plot is updated according to the input selected in the sidebar panel on the left.

The profiles tab can be useful for looking specifically for discrimination between (potentially overlapping) sub-cellular niches. It does so in an easy and direct manner: all proteins belonging to the same sub-cellular niche/data cluster (as specified by fcol) are displayed together, so protein distribution patterns can be examined on a group-versus-group basis. Proteins of interest can be searched for in the data table and, once clicked, the profiles of the selected proteins are shown as black lines.
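The group-versus-group view can be reproduced outside the app. The sketch below uses simulated toy data in place of exprs() and fData() (the class labels and protein names are made up) and base R graphics: it plots the profiles of two classes in different colours, overlays one selected protein in black, and computes each class's mean profile for a numeric comparison. For real MSnSets, pRoloc's plotDist function provides static profile plots (see ?plotDist).

```r
## Simulated stand-in for exprs(): 15 proteins x 20 fractions
set.seed(2)
n_frac <- 20
classes <- rep(c("ER", "Mitochondrion", "unknown"), each = 5)  # toy labels
x <- matrix(runif(length(classes) * n_frac), nrow = length(classes),
            dimnames = list(paste0("prot", seq_along(classes)),
                            paste0("F", seq_len(n_frac))))
x <- x / rowSums(x)  # each profile sums to 1 (relative abundance)

## Mean profile per class: one row per class, one column per fraction
class_means <- t(sapply(split(seq_len(nrow(x)), classes),
                        function(i) colMeans(x[i, , drop = FALSE])))

## Profiles of two classes (one line per protein), coloured by class
sel <- classes %in% c("ER", "Mitochondrion")
matplot(t(x[sel, ]), type = "l", lty = 1,
        col = ifelse(classes[sel] == "ER", "steelblue", "orange"),
        xlab = "Fraction", ylab = "Relative abundance")

## Overlay a selected protein of interest in black
lines(x["prot3", ], col = "black", lwd = 2)
```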

Figure: The profiles tab
Figure: The profiles tab, selecting proteins of interest

Features

There is also functionality to use the FeaturesOfInterest/FoICollection infrastructure distributed by the MSnbase package (for examples of how to create FeaturesOfInterest, see the pRoloc tutorial).
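As a minimal sketch, a FeaturesOfInterest instance and an FoICollection can be built with MSnbase as shown below. The choice of the first five features is arbitrary and purely illustrative. How the collection is handed to the app is an assumption: the final call supposes that pRolocVis accepts it through a foi argument, so please check ?pRolocVis for the argument name in your installed version.

```r
library("MSnbase")
library("pRolocdata")
data(hyperLOPIT2015)

## Illustrative selection: the first five features of the dataset
fn <- featureNames(hyperLOPIT2015)[1:5]

## Passing 'object' checks that all feature names exist in that MSnSet
foi1 <- FeaturesOfInterest(description = "Five example proteins",
                           fnames = fn,
                           object = hyperLOPIT2015)

## Group one or more FeaturesOfInterest in a collection
coll <- FoICollection()
coll <- addFeaturesOfInterest(foi1, coll)
coll

if (interactive()) {
  library("pRolocGUI")
  ## Assumption: the main app takes the collection via a 'foi' argument
  pRolocVis(object = hyperLOPIT2015, fcol = "markers", foi = coll)
}
```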