Protein workflow

Petra Palenikova & Rick Scavetta

2021-05-19

Introduction

Complexome profiling, or complexomics, is a mass spectrometry-based method used in biology to study macromolecular complexes in their native form. Protein complexes are assessed by evaluating their migration profiles across fractions. First, a lysed protein sample is separated into fractions, typically by blue native polyacrylamide gel electrophoresis (BN-PAGE) or density gradient centrifugation. The individual fractions are then analysed by protein mass spectrometry. This makes it possible to determine each protein's migration profile across the fractions and to assess protein co-migration.

It is often desirable to compare co-migration between multiple biological samples. Typically, this would require analysing each sample separately, using multiple lanes of a blue native gel or multiple density gradients. However, this approach can introduce technical biases that make it difficult to compare migration profiles qualitatively, or protein amounts quantitatively, between biological samples. To mitigate these biases, biological samples can be labelled in a way that mass spectrometry can distinguish (e.g. SILAC, TMT, iTRAQ) and analysed simultaneously. Stable Isotope Labelling with Amino acids in Cell culture (SILAC) is a method in which cells are grown in the presence of amino acids containing either “heavy” isotopes of carbon and nitrogen, which have low natural abundance, or the most abundant “light” isotopes. This labelling allows reciprocally labelled samples to be mixed and multiplexed at a very early step of the experiment, making SILAC a useful tool when the experimental design requires comparing two biological samples.

Here we present a package for analysing data produced by SILAC complexomics experiments. The package does not interpret raw mass spectrometry data: its protein workflow takes normalised protein data as input. It performs cluster analysis and provides tools for visualizing the results. The analysis is intended for SILAC-labelled samples, so the input file should contain both “heavy” and “light” proteins.

Method description

0) Re-formatting of input file

The input file format was designed to be easily readable by humans. Downstream analysis requires a slightly different format, so the input must be converted before the analysis is performed.
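To illustrate what this kind of conversion involves, here is a minimal base-R sketch that turns a hypothetical wide table (one row per protein and label state, one column per fraction) into a long table with one row per protein, label state and fraction. The column names `protein`, `label`, `fractionN` and `value` are assumptions for illustration, not the package's actual file layout; in the package itself this step is handled by `protImportForAnalysis`.

```r
# Hypothetical wide ("human-readable") layout: one row per protein and
# label state, one column per fraction. Names are illustrative only.
wide <- data.frame(
  protein   = c("P1", "P1", "P2", "P2"),
  label     = c("labeled", "unlabeled", "labeled", "unlabeled"),
  fraction1 = c(0.1, 0.2, 0.0, 0.1),
  fraction2 = c(0.9, 0.8, 0.3, 0.4),
  fraction3 = c(0.4, 0.5, 1.0, 0.9),
  stringsAsFactors = FALSE
)

fractionCols <- grep("^fraction[0-9]+$", names(wide), value = TRUE)

# stats::reshape stacks the fraction columns into one "value" column and
# records the fraction number in a new "fraction" column
long <- reshape(
  wide,
  direction = "long",
  varying   = fractionCols,
  v.names   = "value",
  timevar   = "fraction",
  times     = seq_along(fractionCols),
  idvar     = c("protein", "label")
)

# Sort, then drop the composite row names created by reshape()
long <- long[order(long$protein, long$label, long$fraction), ]
rownames(long) <- NULL
long
```

The long layout has one measurement per row, which makes filtering by protein, label state or fraction a simple row subset.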

1) Hierarchical clustering

Clustering identifies similarities between protein migration profiles in an unbiased way. Co-migration of known protein complexes can be checked simply by filtering the data; clustering, however, provides additional information by identifying unknown proteins whose migration profiles resemble those of the proteins of interest.

This package contains functions to perform hierarchical clustering using Pearson correlation (centered or uncentered) as the basis of the distance measure and one of three linkage methods (single, average or complete).
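For reference, the two correlation variants and the three linkage criteria can be written out as follows, for migration profiles x = (x_1, ..., x_n) and y over n fractions, and for clusters A and B. Converting a correlation r into a distance as d = 1 - r is a common convention and an assumption here; the package may apply a different transformation.

```latex
\begin{aligned}
r_{\text{centered}}(x, y) &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
  {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \\
r_{\text{uncentered}}(x, y) &= \frac{\sum_{i=1}^{n} x_i y_i}
  {\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} \\
d(x, y) &= 1 - r(x, y) \\
D_{\text{single}}(A, B) &= \min_{a \in A,\, b \in B} d(a, b) \\
D_{\text{complete}}(A, B) &= \max_{a \in A,\, b \in B} d(a, b) \\
D_{\text{average}}(A, B) &= \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b)
\end{aligned}
```

Both variants are unaffected by multiplying a profile by a positive constant; only the centered one is also unaffected by adding a constant to every fraction.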

2) Export files and visualizations

We provide several functions to export the results of intermediate analysis steps. Plotting functionality includes:

  • proteinPlot - line plot for a selected protein
  • groupHeatMap - heatmap for a selected group of proteins
  • oneGroupTwoLabelsCoMigration - scatter plot for a selected group of proteins
  • twoGroupsWithinLabelCoMigration - scatter plot for 2 selected groups of proteins
  • makeBarPlotClusterSummary - bar plot showing number of proteins per cluster

Example protein workflow

Read in data, convert to correct format

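library(ComPrAn)
# Path to the example normalised protein data shipped with the package
inputFile <- system.file("extData", "dataNormProts.txt", package = "ComPrAn")

# Read the input and convert it to the format required for analysis
forAnalysis <- protImportForAnalysis(inputFile)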

Visualization of normalised protein data

Have a look at a selected protein (line plot)
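As a rough base-R illustration of such a line plot (not the package's `proteinPlot` implementation), the sketch below simulates a small long-format table with hypothetical columns `protein`, `label`, `fraction` and `value`, extracts one protein's profile in each label state and draws one line per label.

```r
# Simulated long-format data; column names are hypothetical
set.seed(1)
df <- expand.grid(protein = c("P1", "P2"),
                  label = c("labeled", "unlabeled"),
                  fraction = 1:10,
                  stringsAsFactors = FALSE)
df$value <- runif(nrow(df))

selected <- "P1"
sub <- df[df$protein == selected, ]

# Fraction-by-label matrix: one row per fraction, one column per label state
profiles <- tapply(sub$value, list(sub$fraction, sub$label), mean)

cols <- c("red", "blue")
matplot(as.numeric(rownames(profiles)), profiles, type = "l", lty = 1,
        col = cols, xlab = "Fraction", ylab = "Normalised intensity",
        main = selected)
legend("topright", legend = colnames(profiles), col = cols, lty = 1)
```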

Make a heatmap for a selected group of proteins
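A comparable base-R sketch for a heatmap of a protein group, again on simulated data with hypothetical column names and a hypothetical group (this is not the package's `groupHeatMap`): the profiles of the group members are arranged into a protein-by-fraction matrix and drawn with `stats::heatmap`, with row and column reordering switched off so fractions stay in migration order.

```r
set.seed(2)
df <- expand.grid(protein = paste0("P", 1:5), fraction = 1:12,
                  stringsAsFactors = FALSE)
df$value <- runif(nrow(df))

group <- c("P1", "P3", "P4")  # hypothetical group of interest
sub <- df[df$protein %in% group, ]

# Rows = proteins in the group, columns = fractions
mat <- tapply(sub$value, list(sub$protein, sub$fraction), mean)

# Rowv/Colv = NA keep the original order (no dendrograms);
# scale = "none" plots the values as given
heatmap(mat, Rowv = NA, Colv = NA, scale = "none",
        xlab = "Fraction", ylab = "Protein")
```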

Co-migration plot of single protein group between label states
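One way to sketch such a comparison in base R (an illustration under assumed column names, not the package's `oneGroupTwoLabelsCoMigration`): for a simulated protein group, plot each member's values per fraction in both label states, and summarise how similarly the group migrates with the Pearson correlation between the two per-fraction mean profiles.

```r
set.seed(5)
fractions <- 1:12
peak <- dnorm(fractions, mean = 6, sd = 1.5)  # shared simulated migration profile
df <- expand.grid(protein = paste0("P", 1:4),
                  label = c("labeled", "unlabeled"),
                  fraction = fractions, stringsAsFactors = FALSE)
df$value <- peak[df$fraction] + rnorm(nrow(df), sd = 0.01)

# Mean profile of the group per fraction, one column per label state
meanProfiles <- tapply(df$value, list(df$fraction, df$label), mean)

cols <- c(labeled = "red", unlabeled = "blue")
plot(df$fraction, df$value, col = cols[df$label], pch = 16,
     xlab = "Fraction", ylab = "Normalised intensity")
matlines(fractions, meanProfiles, col = cols[colnames(meanProfiles)], lty = 1)
legend("topright", legend = names(cols), col = cols, pch = 16)

# Close to 1 when the group migrates the same way in both samples
r <- cor(meanProfiles[, "labeled"], meanProfiles[, "unlabeled"])
r
```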

Co-migration plot of two protein groups within label state
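Similarly, a hedged base-R sketch for two protein groups within one label state (simulated data and hypothetical names; not the package's `twoGroupsWithinLabelCoMigration`): one group is simulated to peak around fraction 4 and the other around fraction 9, both are drawn in different colours, and the correlation of their mean profiles indicates whether they co-migrate.

```r
set.seed(6)
fractions <- 1:12
profileOf <- function(center) dnorm(fractions, mean = center, sd = 1.5)

# Three simulated proteins per group, all within a single label state
df <- rbind(
  data.frame(group = "group1", protein = rep(paste0("A", 1:3), each = 12),
             fraction = fractions, value = profileOf(4),
             stringsAsFactors = FALSE),
  data.frame(group = "group2", protein = rep(paste0("B", 1:3), each = 12),
             fraction = fractions, value = profileOf(9),
             stringsAsFactors = FALSE)
)
df$value <- df$value + rnorm(nrow(df), sd = 0.01)

# Mean profile per fraction, one column per group
meanProfiles <- tapply(df$value, list(df$fraction, df$group), mean)

cols <- c(group1 = "darkgreen", group2 = "orange")
plot(df$fraction, df$value, col = cols[df$group], pch = 16,
     xlab = "Fraction", ylab = "Normalised intensity")
matlines(fractions, meanProfiles, col = cols[colnames(meanProfiles)], lty = 1)
legend("topright", legend = names(cols), col = cols, pch = 16)

# Low when the two groups peak in different fractions
r <- cor(meanProfiles[, "group1"], meanProfiles[, "group2"])
r
```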

Cluster analysis

Create components necessary for clustering: distance matrices for the labeled and unlabeled samples and a protein table for each sample
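The sketch below shows, in base R, how such components could be built: a protein-by-fraction table per label state and a distance matrix from either centered or uncentered Pearson correlation. It uses simulated data, hypothetical column names and the distance d = 1 - r, which is a common convention and an assumption here; it does not reproduce the package's own functions.

```r
set.seed(3)
fractions <- 1:15
df <- expand.grid(protein = paste0("P", 1:6),
                  label = c("labeled", "unlabeled"),
                  fraction = fractions, stringsAsFactors = FALSE)
df$value <- runif(nrow(df))

# Protein table: rows = proteins, columns = fractions
toProteinTable <- function(d) tapply(d$value, list(d$protein, d$fraction), mean)

# Centered Pearson distance: cor() correlates columns, so transpose first
centeredDist <- function(m) as.dist(1 - cor(t(m)))

# Uncentered Pearson distance: cosine similarity of the raw profiles
uncenteredDist <- function(m) {
  unit <- m / sqrt(rowSums(m^2))  # scale each protein profile to length 1
  as.dist(1 - tcrossprod(unit))
}

# One protein table and one distance matrix per label state
proteinTables <- lapply(split(df, df$label), toProteinTable)
distances <- lapply(proteinTables, centeredDist)  # or uncenteredDist
```

Keeping the two distance functions separate makes it easy to swap the correlation variant without touching the rest of the pipeline.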

Assign clusters to data frames
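A base-R sketch of this step, assuming a correlation-based distance matrix: `hclust` builds the tree with the chosen linkage (`"single"`, `"average"` or `"complete"`) and `cutree` cuts it at a given height. The simulated profiles, the cutoff of 0.5 and the column names are assumptions for illustration, not the package's defaults.

```r
fractions <- 1:15
peakA <- dnorm(fractions, mean = 4, sd = 1.5)   # profile peaking near fraction 4
peakB <- dnorm(fractions, mean = 11, sd = 1.5)  # profile peaking near fraction 11

# Six simulated proteins: three follow peakA, three follow peakB
set.seed(4)
m <- rbind(peakA, peakA, peakA, peakB, peakB, peakB) +
  matrix(rnorm(6 * length(fractions), sd = 0.01), nrow = 6)
rownames(m) <- paste0("P", 1:6)

# Centered Pearson distance between protein profiles
d <- as.dist(1 - cor(t(m)))

tree <- hclust(d, method = "average")  # or "single" / "complete"
clusters <- cutree(tree, h = 0.5)      # hypothetical cutoff on the distance

clusterTable <- data.frame(protein = names(clusters),
                           cluster = unname(clusters),
                           stringsAsFactors = FALSE)
clusterTable
```

With these simulated profiles, proteins sharing a peak end up in the same cluster, while the two peak groups are separated.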

Make bar plots summarizing numbers of proteins per cluster for labeled and unlabeled samples
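Given cluster assignments for each sample, counting proteins per cluster and drawing the bars takes only `table` and `barplot` in base R. The assignments below are made-up placeholders for illustration, not package output, and this is not the package's `makeBarPlotClusterSummary`.

```r
# Hypothetical cluster assignments (protein -> cluster) for each sample
clustersLabeled   <- c(P1 = 1L, P2 = 1L, P3 = 2L, P4 = 3L, P5 = 3L, P6 = 3L)
clustersUnlabeled <- c(P1 = 1L, P2 = 2L, P3 = 2L, P4 = 2L, P5 = 3L, P6 = 3L)

# Number of proteins per cluster
countsLabeled   <- table(clustersLabeled)
countsUnlabeled <- table(clustersUnlabeled)

op <- par(mfrow = c(1, 2))  # two panels side by side
barplot(countsLabeled, main = "labeled",
        xlab = "Cluster", ylab = "Number of proteins")
barplot(countsUnlabeled, main = "unlabeled",
        xlab = "Cluster", ylab = "Number of proteins")
par(op)  # restore previous graphics settings
```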

Create table containing proteins and their assigned clusters
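A base-R sketch of such a table: the per-sample assignments are merged by protein so that each row shows a protein's cluster in the labeled and in the unlabeled sample (`NA` where a protein was clustered in only one sample), then written to a tab-separated file. The assignments, column names and temporary output path are hypothetical.

```r
# Hypothetical cluster assignments for each sample
clustersLabeled   <- c(P1 = 1L, P2 = 1L, P3 = 2L, P4 = 3L)
clustersUnlabeled <- c(P1 = 2L, P2 = 2L, P3 = 1L, P5 = 1L)

labeledTab <- data.frame(protein = names(clustersLabeled),
                         labeledCluster = unname(clustersLabeled),
                         stringsAsFactors = FALSE)
unlabeledTab <- data.frame(protein = names(clustersUnlabeled),
                           unlabeledCluster = unname(clustersUnlabeled),
                           stringsAsFactors = FALSE)

# Full outer join on protein: keep proteins present in either sample
exportTable <- merge(labeledTab, unlabeledTab, by = "protein", all = TRUE)

# Write to a temporary file; replace with a real path as needed
outFile <- tempfile(fileext = ".tsv")
write.table(exportTable, file = outFile, sep = "\t",
            row.names = FALSE, quote = FALSE)
exportTable
```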
