Visualisation of proteomics data using R and Bioconductor

Laurent Gatto, Lisa Breckels and Sebastian Gibb

Introduction

References

Relevant packages

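Bioconductor version 3.0 currently offers 65 packages tagged with the Proteomics biocViews term and 44 tagged with MassSpectrometry. Both lists are given below, and some packages appear in both. Other, non-Bioconductor packages are described in the RforProteomics vignette.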

Proteomics packages (Bioconductor 3.0):

| Package | Title | Version |
|---------|-------|---------|
| ASEB | Predict Acetylated Lysine Sites | 1.10.0 |
| bioassayR | R library for Bioactivity analysis | 1.4.2 |
| BRAIN | Baffling Recursive Algorithm for Isotope distributioN calculations | 1.12.0 |
| CellNOptR | Training of boolean logic models of signalling networks using prior knowledge networks and perturbation data. | 1.12.0 |
| ChemmineR | Cheminformatics Toolkit for R | 2.18.0 |
| cisPath | Visualization and management of the protein-protein interaction networks. | 1.6.2 |
| cleaver | Cleavage of Polypeptide Sequences | 1.4.0 |
| clippda | A package for the clinical proteomic profiling data analysis | 1.16.0 |
| CNORdt | Add-on to CellNOptR: Discretized time treatments | 1.8.0 |
| CNORfeeder | Integration of CellNOptR to add missing links | 1.6.0 |
| CNORode | ODE add-on to CellNOptR | 1.8.0 |
| customProDB | Generate customized protein database from NGS data, with a focus on RNA-Seq data, for proteomics search. | 1.6.0 |
| deltaGseg | deltaGseg | 1.6.0 |
| eiR | Accelerated similarity searching of small molecules | 1.6.0 |
| fmcsR | Mismatch Tolerant Maximum Common Substructure Searching | 1.8.0 |
| GraphPAC | Identification of Mutational Clusters in Proteins via a Graph Theoretical Approach. | 1.8.0 |
| hpar | Human Protein Atlas in R | 1.8.0 |
| iPAC | Identification of Protein Amino acid Clustering | 1.10.0 |
| IPPD | Isotopic peak pattern deconvolution for Protein Mass Spectrometry by template matching | 1.14.0 |
| isobar | Analysis and quantitation of isobarically tagged MSMS proteomics data | 1.12.0 |
| LPEadj | A correction of the local pooled error (LPE) method to replace the asymptotic variance adjustment with an unbiased adjustment based on sample size. | 1.26.0 |
| MassSpecWavelet | Mass spectrum processing by wavelet-based algorithms | 1.32.0 |
| MSGFgui | A shiny GUI for MSGFplus | 1.0.1 |
| MSGFplus | An interface between R and MS-GF+ | 1.0.3 |
| msmsEDA | Exploratory Data Analysis of LC-MS/MS data by spectral counts | 1.4.0 |
| msmsTests | LC-MS/MS Differential Expression Tests | 1.4.0 |
| MSnbase | MSnbase: Base Functions and Classes for MS-based Proteomics | 1.14.0 |
| MSnID | Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications. | 1.0.0 |
| MSstats | Protein Significance Analysis in DDA, SRM and DIA for Label-free or Label-based Proteomics Experiments | 2.4.0 |
| mzID | An mzIdentML parser for R | 1.4.1 |
| mzR | parser for netCDF, mzXML, mzData and mzML and mzIdentML files (mass spectrometry data) | 2.0.0 |
| PAA | PAA (Protein Array Analyzer) | 1.0.0 |
| PAnnBuilder | Protein annotation data package builder | 1.30.1 |
| pathview | a tool set for pathway based data integration and visualization | 1.6.0 |
| Pbase | Manipulating and exploring protein and proteomics data | 0.4.0 |
| PCpheno | Phenotypes and cellular organizational units | 1.28.0 |
| pepXMLTab | Parsing pepXML files and filter based on peptide FDR. | 1.0.0 |
| plgem | Detect differential expression in microarray and proteomics datasets with the Power Law Global Error Model (PLGEM) | 1.38.0 |
| PLPE | Local Pooled Error Test for Differential Expression with Paired High-throughput Data | 1.26.0 |
| ppiStats | Protein-Protein Interaction Statistical Package | 1.32.0 |
| proBAMr | Generating SAM file for PSMs in shotgun proteomics data. | 1.0.0 |
| PROcess | Ciphergen SELDI-TOF Processing | 1.42.0 |
| procoil | Prediction of Oligomerization of Coiled Coil Proteins | 1.16.0 |
| ProCoNA | Protein co-expression network analysis (ProCoNA). | 1.4.0 |
| pRoloc | A unifying bioinformatics framework for spatial proteomics | 1.6.0 |
| pRolocGUI | Interactive visualisation of spatial proteomics data | 1.0.0 |
| prot2D | Statistical Tools for volume data from 2D Gel Electrophoresis | 1.4.0 |
| proteoQC | An R package for proteomics data quality control | 1.2.0 |
| Pviz | Peptide Annotation and Data Visualization using Gviz | 1.0.0 |
| qcmetrics | A Framework for Quality Control | 1.4.0 |
| rain | Rhythmicity Analysis Incorporating Non-parametric Methods | 1.0.0 |
| RCASPAR | A package for survival time prediction based on a piecewise baseline hazard Cox regression model. | 1.12.0 |
| Rchemcpp | Similarity measures for chemical compounds | 2.4.0 |
| Rcpi | Toolkit for Compound-Protein Interaction in Drug Discovery | 1.2.0 |
| RpsiXML | R interface to PSI-MI 2.5 files | 2.8.0 |
| rpx | R Interface to the ProteomeXchange Repository | 1.2.0 |
| rTANDEM | Interfaces the tandem protein identification algorithm in R | 1.6.0 |
| sapFinder | A package for variant peptides detection and visualization in shotgun proteomics. | 1.4.0 |
| ScISI | In Silico Interactome | 1.38.0 |
| shinyTANDEM | Provides a GUI for rTANDEM | 1.4.0 |
| SLGI | Synthetic Lethal Genetic Interaction | 1.26.0 |
| SpacePAC | Identification of Mutational Clusters in 3D Protein Space via Simulation. | 1.4.0 |
| specL | specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics | 1.0.0 |
| spliceSites | Manages align gap positions from RNA-seq data | 1.4.0 |
| synapter | Label-free data analysis pipeline for optimal identification and quantitation | 1.8.1 |

MassSpectrometry packages (Bioconductor 3.0):

| Package | Title | Version |
|---------|-------|---------|
| apComplex | Estimate protein complex membership using AP-MS protein data | 2.32.0 |
| BRAIN | Baffling Recursive Algorithm for Isotope distributioN calculations | 1.12.0 |
| CAMERA | Collection of annotation related methods for mass spectrometry data | 1.22.0 |
| cosmiq | cosmiq - COmbining Single Masses Into Quantities | 1.0.0 |
| flagme | Analysis of Metabolomics GC/MS Data | 1.22.0 |
| gaga | GaGa hierarchical model for high-throughput data analysis | 2.12.0 |
| iontree | Data management and analysis of ion trees from ion-trap mass spectrometry | 1.12.0 |
| isobar | Analysis and quantitation of isobarically tagged MSMS proteomics data | 1.12.0 |
| MAIT | Statistical Analysis of Metabolomic Data | 1.0.0 |
| MassArray | Analytical Tools for MassArray Data | 1.18.0 |
| MassSpecWavelet | Mass spectrum processing by wavelet-based algorithms | 1.32.0 |
| Metab | Metab: An R Package for a High-Throughput Analysis of Metabolomics Data Generated by GC-MS. | 1.0.0 |
| metabomxtr | A package to run mixture models for truncated metabolomics data with normal or lognormal distributions. | 1.0.0 |
| metaMS | MS-based metabolomics annotation pipeline | 1.2.0 |
| MSGFgui | A shiny GUI for MSGFplus | 1.0.1 |
| MSGFplus | An interface between R and MS-GF+ | 1.0.3 |
| msmsEDA | Exploratory Data Analysis of LC-MS/MS data by spectral counts | 1.4.0 |
| msmsTests | LC-MS/MS Differential Expression Tests | 1.4.0 |
| MSnbase | MSnbase: Base Functions and Classes for MS-based Proteomics | 1.14.0 |
| MSnID | Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications. | 1.0.0 |
| MSstats | Protein Significance Analysis in DDA, SRM and DIA for Label-free or Label-based Proteomics Experiments | 2.4.0 |
| mzID | An mzIdentML parser for R | 1.4.1 |
| mzR | parser for netCDF, mzXML, mzData and mzML and mzIdentML files (mass spectrometry data) | 2.0.0 |
| PAPi | Predict metabolic pathway activity based on metabolomics data | 1.6.0 |
| Pbase | Manipulating and exploring protein and proteomics data | 0.4.0 |
| pepXMLTab | Parsing pepXML files and filter based on peptide FDR. | 1.0.0 |
| plgem | Detect differential expression in microarray and proteomics datasets with the Power Law Global Error Model (PLGEM) | 1.38.0 |
| proBAMr | Generating SAM file for PSMs in shotgun proteomics data. | 1.0.0 |
| PROcess | Ciphergen SELDI-TOF Processing | 1.42.0 |
| pRoloc | A unifying bioinformatics framework for spatial proteomics | 1.6.0 |
| proteoQC | An R package for proteomics data quality control | 1.2.0 |
| qcmetrics | A Framework for Quality Control | 1.4.0 |
| Rdisop | Decomposition of Isotopic Patterns | 1.26.0 |
| Risa | Converting experimental metadata from ISA-tab into Bioconductor data structures | 1.8.0 |
| RMassBank | Workflow to process tandem MS files and build MassBank records | 1.8.1 |
| rols | An R interface to the Ontology Lookup Service | 1.8.0 |
| rpx | R Interface to the ProteomeXchange Repository | 1.2.0 |
| rTANDEM | Interfaces the tandem protein identification algorithm in R | 1.6.0 |
| sapFinder | A package for variant peptides detection and visualization in shotgun proteomics. | 1.4.0 |
| shinyTANDEM | Provides a GUI for rTANDEM | 1.4.0 |
| specL | specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics | 1.0.0 |
| synapter | Label-free data analysis pipeline for optimal identification and quantitation | 1.8.1 |
| TargetSearch | A package for the analysis of GC-MS metabolite profiling data. | 1.22.0 |
| xcms | LC/MS and GC/MS Data Analysis | 1.42.0 |

Anscombe's quartet

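| x1 | x2 | x3 | x4 | y1 | y2 | y3 | y4 |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 10 | 10 | 10 | 8 | 8.04 | 9.14 | 7.46 | 6.58 |
| 8 | 8 | 8 | 8 | 6.95 | 8.14 | 6.77 | 5.76 |
| 13 | 13 | 13 | 8 | 7.58 | 8.74 | 12.74 | 7.71 |
| 9 | 9 | 9 | 8 | 8.81 | 8.77 | 7.11 | 8.84 |
| 11 | 11 | 11 | 8 | 8.33 | 9.26 | 7.81 | 8.47 |
| 14 | 14 | 14 | 8 | 9.96 | 8.10 | 8.84 | 7.04 |
| 6 | 6 | 6 | 8 | 7.24 | 6.13 | 6.08 | 5.25 |
| 4 | 4 | 4 | 19 | 4.26 | 3.10 | 5.39 | 12.50 |
| 12 | 12 | 12 | 8 | 10.84 | 9.13 | 8.15 | 5.56 |
| 7 | 7 | 7 | 8 | 4.82 | 7.26 | 6.42 | 7.91 |
| 5 | 5 | 5 | 8 | 5.68 | 4.74 | 5.73 | 6.89 |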
## one column of summary statistics per x/y pair
tab <- matrix(NA, 5, 4)
colnames(tab) <- 1:4
rownames(tab) <- c("var(x)", "mean(x)",
                   "var(y)", "mean(y)",
                   "cor(x,y)")

## columns 1-4 of anscombe hold x1-x4, columns 5-8 hold y1-y4
for (i in 1:4)
    tab[, i] <- c(var(anscombe[, i]),
                  mean(anscombe[, i]),
                  var(anscombe[, i+4]),
                  mean(anscombe[, i+4]),
                  cor(anscombe[, i], anscombe[, i+4]))
tab
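|          | 1          | 2          | 3          | 4          |
|----------|-----------:|-----------:|-----------:|-----------:|
| var(x)   | 11.0000000 | 11.0000000 | 11.0000000 | 11.0000000 |
| mean(x)  | 9.0000000  | 9.0000000  | 9.0000000  | 9.0000000  |
| var(y)   | 4.1272691  | 4.1276291  | 4.1226200  | 4.1232491  |
| mean(y)  | 7.5009091  | 7.5009091  | 7.5000000  | 7.5009091  |
| cor(x,y) | 0.8164205  | 0.8162365  | 0.8162867  | 0.8165214  |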
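
The four data sets share more than these summary statistics: fitting a least-squares line to each pair also gives nearly the same intercept and slope. A short sketch using the built-in `anscombe` data and `lm()`; the `coefs` name is only illustrative:

```r
## Fit y_i ~ x_i for each Anscombe pair and collect the
## intercept and slope of each fit in a 2 x 4 matrix
data(anscombe)
coefs <- sapply(1:4, function(i) {
    x <- anscombe[, i]       # x1 to x4
    y <- anscombe[, i + 4]   # y1 to y4
    coef(lm(y ~ x))
})
rownames(coefs) <- c("intercept", "slope")
colnames(coefs) <- 1:4
round(coefs, 2)
```

All four fits come out at roughly y = 3 + 0.5x, which is why only the plots below reveal how different the data sets are.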
ff <- y ~ x

par(mfrow = c(2, 2), mar = c(4, 4, 1, 1))
for (i in 1:4) {
    ## turn y ~ x into yi ~ xi for the i-th data set
    ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
    plot(ff, data = anscombe, pch = 19, xlim = c(3, 19), ylim = c(3, 13))
    abline(lm(ff, data = anscombe))   # add the least-squares fit
}

[Figure: the four Anscombe data sets, each with its least-squares regression line]

The MA plot example

The following code chunk connects to the PXD000001 data set on the ProteomeXchange repository and downloads its mzTab file. After filtering out features with missing values, we extract the relevant data, log2 fold-changes (M) and mean log10 intensities (A), into data.frames.

library("rpx")
px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
## Downloading 1 file
library("MSnbase")
qnt <- readMzTabData(mztab, what = "PEP")
## Detected a metadata section
## Detected a peptide section
sampleNames(qnt) <- reporterNames(TMT6)
qnt <- filterNA(qnt)
## peptides could optionally be aggregated into proteins here with combineFeatures()

## accessions of the spiked-in proteins; all other features are labelled "Background"
spikes <- c("P02769", "P00924", "P62894", "P00489")
protclasses <- as.character(fData(qnt)$accession)
protclasses[!protclasses %in% spikes] <- "Background"


madata42 <- data.frame(A = rowMeans(log(exprs(qnt[, c(4, 2)]), 10)),
                       M = log(exprs(qnt)[, 4], 2) - log(exprs(qnt)[, 2], 2),
                       data = rep("4vs2", nrow(qnt)),
                       protein = fData(qnt)$accession,
                       class = protclasses)

madata62 <- data.frame(A = rowMeans(log(exprs(qnt[, c(6, 2)]), 10)),
                       M = log(exprs(qnt)[, 6], 2) - log(exprs(qnt)[, 2], 2),
                       data = rep("6vs2", nrow(qnt)),
                       protein = fData(qnt)$accession,
                       class = protclasses)


madata <- rbind(madata42, madata62)
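
The `madata42` and `madata62` blocks differ only in the sample compared with the reference (sample 2). A hypothetical helper, `maFrame()` (not an MSnbase function), factors out that repetition using the same definitions of M and A. It works on a plain intensity matrix, so it is shown here on toy values:

```r
## Hypothetical helper: long-format MA data.frame for sample j versus
## reference sample ref, from a features x samples intensity matrix.
##   M = log2(x_j) - log2(x_ref)
##   A = mean of log10(x_j) and log10(x_ref)
maFrame <- function(x, j, ref, protein, class) {
    data.frame(A = rowMeans(log10(x[, c(j, ref), drop = FALSE])),
               M = log2(x[, j]) - log2(x[, ref]),
               data = paste0(j, "vs", ref),   # recycled to every row
               protein = protein,
               class = class)
}

## Toy intensities: 3 features (rows) x 2 samples (columns)
x <- matrix(c(100, 1000, 10,
               50,  500, 10), ncol = 2)
ma <- maFrame(x, j = 1, ref = 2,
              protein = c("P1", "P2", "P3"),
              class = rep("Background", 3))
ma
```

With the objects above, `maFrame(exprs(qnt), 4, 2, fData(qnt)$accession, protclasses)` should give the same columns as `madata42`, and further comparisons could be stacked with `rbind()` as before.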

The traditional plotting system

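par(mfrow = c(1, 2))
## colour by class; converting to a factor maps each class to a
## palette colour (a character vector would be read as colour names)
plot(M ~ A, data = madata42, main = "4vs2",
     xlab = "A", ylab = "M", col = factor(madata42$class))
plot(M ~ A, data = madata62, main = "6vs2",
     xlab = "A", ylab = "M", col = factor(madata62$class))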

[Figure: MA plots for 4vs2 and 6vs2, traditional plotting system]
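
Unlike lattice and ggplot2, the traditional plotting system does not add a legend by itself. A minimal sketch of adding one by hand, on hypothetical toy data standing in for `madata42` and with an explicit colour per class; `pdf(NULL)` only opens a null device so that the sketch runs non-interactively:

```r
## Toy stand-in for madata42 (hypothetical values, illustration only)
set.seed(1)
toy <- data.frame(A = runif(20, 3, 6),
                  M = rnorm(20),
                  class = rep(c("Background", "P02769"), each = 10))

cls <- factor(toy$class)       # levels: "Background", "P02769"
pal <- c("grey60", "red")      # one colour per level, in level order

pdf(NULL)                      # null device: nothing is written to disk
plot(M ~ A, data = toy, col = pal[cls], pch = 19)
legend("topright", legend = levels(cls), col = pal, pch = 19, bty = "n")
invisible(dev.off())
```

Indexing `pal` with the factor uses its integer codes, so the legend and the points are guaranteed to use the same class-to-colour mapping.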

lattice

library("lattice")
latma <- xyplot(M ~ A | data, data = madata,
                groups = madata$class,
                auto.key = TRUE)
print(latma)

[Figure: MA plots for 4vs2 and 6vs2, lattice]

ggplot2

library("ggplot2")
ggma <- ggplot(madata, aes(x = A, y = M, colour = class)) +
    geom_point() +
    facet_grid(. ~ data)
print(ggma)

[Figure: MA plots for 4vs2 and 6vs2, ggplot2]

Customization

library("RColorBrewer")
bcols <- brewer.pal(4, "Set1")
cls <- c("Background" = "#12121230",
         "P02769" = bcols[1],
         "P00924" = bcols[2],
         "P62894" = bcols[3],
         "P00489" = bcols[4])
ggma2 <- ggplot(madata, aes(x = A, y = M, colour = class)) +
    geom_point(shape = 19) +
    facet_grid(. ~ data) +
    scale_colour_manual(values = cls) +
    ## draw the semi-transparent Background colour opaque in the legend
    guides(colour = guide_legend(override.aes = list(alpha = 1)))
print(ggma2)

[Figure: customised ggplot2 MA plots, spike-in proteins coloured and background semi-transparent]
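
The background colour in `cls`, `"#12121230"`, is an #RRGGBBAA hex string whose last two digits set the opacity. As a sketch, base R's `adjustcolor()` builds the same value from the opaque colour, and `col2rgb()` decodes it back into channel values:

```r
## "#12121230": red = green = blue = 0x12 (18) and alpha = 0x30 (48 of 255),
## i.e. a dark grey drawn at roughly 19% opacity
bg <- adjustcolor("#121212", alpha.f = 48 / 255)
bg
col2rgb(bg, alpha = TRUE)   # 18, 18, 18 and alpha 48
```

Lowering `alpha.f` would push the background peptides further back while the `brewer.pal()` colours used for the spike-ins stay fully opaque.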

The MAplot method for MSnSet instances

MAplot(qnt, cex = .8)

[Figure: output of MAplot(qnt, cex = .8)]

An interactive shiny app for MA plots

This app is based on Mike Love's shinyMA application, adapted for proteomics data. A screenshot is shown below. To start the application:

shinyMA()

[Figure: shinyMA screenshot]

The application is also available online at https://lgatto.shinyapps.io/shinyMA/.

Visualising mass spectrometry data

Direct access to the raw data

library("lattice")
library("mzR")
mzf <- pxget(px1, 6)
## Downloading 1 file
ms <- openMSfile(mzf)

hd <- header(ms)
ms1 <- which(hd$msLevel == 1)

rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35
library("MSnbase")
(M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd))
## 1
## Object of class "MSmap"
##  Map [75, 401]
##   [1]  Retention time: 30:1 - 34:58 
##   [2]  M/Z: 521 - 523 (res 0.005)
ff <- colorRampPalette(c("yellow", "steelblue"))
trellis.par.set(regions=list(col=ff(100)))
plot(M, aspect = 1, allTicks = FALSE)

[Figure: heatmap of the MS1 map, retention time 30 to 35 min, M/Z 521 to 523]
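
The scan selection used in this section (MS1 scans inside a retention time window, and later the MS2 scans acquired between the first two of them) can be checked without downloading any data. A minimal sketch on a small hypothetical header table; like the original code, it assumes that `retentionTime` is in seconds and that MS2 scans are recorded between consecutive MS1 scans, as in data-dependent acquisition:

```r
## Hypothetical toy scan header with only the two columns used here,
## msLevel and retentionTime (seconds), mimicking mzR's header() output
hd <- data.frame(msLevel       = c(1, 2, 2, 1, 2, 2, 2, 1),
                 retentionTime = c(1790, 1805, 1806, 1808,
                                   1809, 1810, 1811, 1830))

ms1 <- which(hd$msLevel == 1)        # row indices of the MS1 scans
rt  <- hd$retentionTime[ms1] / 60    # their retention times, in minutes
rtsel <- rt > 30 & rt < 35           # logical vector parallel to ms1

## first two MS1 scans inside the window and the MS2 scans in between
i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
ms2 <- (i + 1):(j - 1)
ms2
```

If the window contains fewer than two MS1 scans, `j` is `NA` and the last step fails, so a real pipeline would want to check `sum(rtsel) >= 2` first.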

M@map[msMap(M) == 0] <- NA
plot3D(M, FALSE)

[Figure: 3D representation of the same MS map]

library("rgl")
plot3D(M, TRUE)
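## layout: chromatogram on top (panel 1), the MS1 spectrum and a zoom
## on it below (panels 2 and 3), then MS2 spectra (panels 4 to 13)
lout <- matrix(NA, ncol = 10, nrow = 8)
lout[1:2, ] <- 1
for (ii in 3:4)
    lout[ii, ] <- c(2, 2, 2, 2, 2, 2, 3, 3, 3, 3)
lout[5, ] <- rep(4:8, each = 2)
lout[6, ] <- rep(4:8, each = 2)
lout[7, ] <- rep(9:13, each = 2)
lout[8, ] <- rep(9:13, each = 2)

## i and j: the first two MS1 scans in the retention time window;
## ms2: the scans acquired between them
i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
ms2 <- (i+1):(j-1)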

layout(lout)

par(mar=c(4,2,1,1))
chromatogram(ms)
abline(v = hd[i, "retentionTime"], col = "red")


par(mar = c(3, 2, 1, 0))
plot(peaks(ms, i), type = "l", xlim = c(400, 1000))
legend("topright", bty = "n",
       legend = paste0(
           "Acquisition ", hd[i, "acquisitionNum"],  "\n",
           "Retention time ", formatRt(hd[i, "retentionTime"])))
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"],
       col = c("#FF000080",
           rep("#12121280", 9)))

par(mar = c(3, 0.5, 1, 1))
plot(peaks(ms, i), type = "l", xlim = c(521, 522.5),
     yaxt = "n")
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"], col = "#FF000080")

##par(mar = omar)
par(mar = c(2, 2, 0, 1))
for (ii in ms2) {
    p <- peaks(ms, ii)
    plot(p, xlab = "", ylab = "", type = "h", cex.axis = .6)
    legend("topright", legend = paste0("Prec M/Z\n",
                           round(hd[ii, "precursorMZ"], 2)),
           bty = "n", cex = .8)
}

[Figure: chromatogram, full and zoomed MS1 spectrum, and the MS2 spectra acquired after it]
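
The `lout` matrix drives `layout()`: each number is a panel, filled in plotting order, and a number repeated over several cells makes that panel span them. A sketch that rebuilds the same 8 x 10 matrix block by block and previews the arrangement with `layout.show()`; the `twoRows()` helper is hypothetical:

```r
## Hypothetical helper: repeat a 10-element row vector over two rows
twoRows <- function(v) matrix(rep(v, 2), nrow = 2, byrow = TRUE)

lout <- rbind(twoRows(rep(1, 10)),                    # panel 1: chromatogram
              twoRows(rep(c(2, 3), times = c(6, 4))), # panels 2, 3: MS1 spectrum and zoom
              twoRows(rep(4:8, each = 2)),            # panels 4-8: MS2 spectra
              twoRows(rep(9:13, each = 2)))           # panels 9-13: MS2 spectra
lout

## Preview the panel arrangement on a null device, without any data
pdf(NULL)
nf <- layout(lout)   # layout() returns the number of panels
layout.show(nf)
invisible(dev.off())
```

Because panels are consumed in order, the ten spectra drawn by the `for (ii in ms2)` loop fill panels 4 to 13 left to right, top to bottom.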