Package: RforProteomics
Authors: Laurent Gatto, Lisa Breckels and Sebastian Gibb
Last compiled: Tue Mar 24 00:44:39 2015
Last modified: 2015-03-24 00:39:58

Introduction

This vignette illustrates existing R and Bioconductor infrastructure for the visualisation of mass spectrometry and proteomics data. The code details the visualisations presented in

Gatto L, Breckels LM, Naake T, Gibb S. Visualisation of proteomics data using R and Bioconductor. Proteomics. 2015 Feb 18. doi: 10.1002/pmic.201400392. PubMed PMID: 25690415.

References

Relevant packages

There are currently 68 Proteomics and 49 MassSpectrometry packages in Bioconductor version 3.1. Other non-Bioconductor packages are described in the RforProteomics vignette.

Proteomics packages:

Package Title Version
ASEB ASEB Predict Acetylated Lysine Sites 1.11.0
bioassayR bioassayR R library for Bioactivity analysis 1.5.10
BRAIN BRAIN Baffling Recursive Algorithm for Isotope distributioN calculations 1.13.0
Cardinal Cardinal A mass spectrometry imaging toolbox for statistical analysis 0.99.5
CellNOptR CellNOptR Training of boolean logic models of signalling networks using prior knowledge networks and perturbation data. 1.13.0
ChemmineR ChemmineR Cheminformatics Toolkit for R 2.19.0
cisPath cisPath Visualization and management of the protein-protein interaction networks. 1.7.4
cleaver cleaver Cleavage of Polypeptide Sequences 1.5.3
clippda clippda A package for the clinical proteomic profiling data analysis 1.17.0
CNORdt CNORdt Add-on to CellNOptR: Discretized time treatments 1.9.0
CNORfeeder CNORfeeder Integration of CellNOptR to add missing links 1.7.0
CNORode CNORode ODE add-on to CellNOptR 1.9.1
customProDB customProDB Generate customized protein database from NGS data, with a focus on RNA-Seq data, for proteomics search. 1.7.0
deltaGseg deltaGseg deltaGseg 1.7.0
eiR eiR Accelerated similarity searching of small molecules 1.7.2
fmcsR fmcsR Mismatch Tolerant Maximum Common Substructure Searching 1.9.0
GraphPAC GraphPAC Identification of Mutational Clusters in Proteins via a Graph Theoretical Approach. 1.9.0
hpar hpar Human Protein Atlas in R 1.9.1
iPAC iPAC Identification of Protein Amino acid Clustering 1.11.0
IPPD IPPD Isotopic peak pattern deconvolution for Protein Mass Spectrometry by template matching 1.15.0
isobar isobar Analysis and quantitation of isobarically tagged MSMS proteomics data 1.13.2
LPEadj LPEadj A correction of the local pooled error (LPE) method to replace the asymptotic variance adjustment with an unbiased adjustment based on sample size. 1.27.0
MassSpecWavelet MassSpecWavelet Mass spectrum processing by wavelet-based algorithms 1.33.0
MSGFgui MSGFgui A shiny GUI for MSGFplus 1.1.2
MSGFplus MSGFplus An interface between R and MS-GF+ 1.1.3
msmsEDA msmsEDA Exploratory Data Analysis of LC-MS/MS data by spectral counts 1.5.0
msmsTests msmsTests LC-MS/MS Differential Expression Tests 1.5.0
MSnbase MSnbase Base Functions and Classes for MS-based Proteomics 1.15.12
MSnID MSnID Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications. 1.1.3
MSstats MSstats Protein Significance Analysis in DDA, SRM and DIA for Label-free or Label-based Proteomics Experiments 2.5.0
mzID mzID An mzIdentML parser for R 1.5.2
mzR mzR parser for netCDF, mzXML, mzData and mzML and mzIdentML files (mass spectrometry data) 2.1.15
PAA PAA PAA (Protein Array Analyzer) 1.1.1
PAnnBuilder PAnnBuilder Protein annotation data package builder 1.31.1
pathview pathview a tool set for pathway based data integration and visualization 1.7.0
Pbase Pbase Manipulating and exploring protein and proteomics data 0.6.12
PCpheno PCpheno Phenotypes and cellular organizational units 1.29.0
pepXMLTab pepXMLTab Parsing pepXML files and filter based on peptide FDR. 1.1.0
plgem plgem Detect differential expression in microarray and proteomics datasets with the Power Law Global Error Model (PLGEM) 1.39.1
PLPE PLPE Local Pooled Error Test for Differential Expression with Paired High-throughput Data 1.27.0
ppiStats ppiStats Protein-Protein Interaction Statistical Package 1.33.0
proBAMr proBAMr Generating SAM file for PSMs in shotgun proteomics data. 1.1.2
PROcess PROcess Ciphergen SELDI-TOF Processing 1.43.0
procoil procoil Prediction of Oligomerization of Coiled Coil Proteins 1.17.0
ProCoNA ProCoNA Protein co-expression network analysis (ProCoNA). 1.5.2
pRoloc pRoloc A unifying bioinformatics framework for spatial proteomics 1.7.5
pRolocGUI pRolocGUI Interactive visualisation of spatial proteomics data 1.1.4
prot2D prot2D Statistical Tools for volume data from 2D Gel Electrophoresis 1.5.0
proteoQC proteoQC An R package for proteomics data quality control 1.3.2
ProtGenerics ProtGenerics S4 generic functions for Bioconductor proteomics infrastructure 0.99.3
Pviz Pviz Peptide Annotation and Data Visualization using Gviz 1.1.1
qcmetrics qcmetrics A Framework for Quality Control 1.5.1
QuartPAC QuartPAC Identification of mutational clusters in protein quaternary structures. 0.99.3
rain rain Rhythmicity Analysis Incorporating Non-parametric Methods 1.1.1
RCASPAR RCASPAR A package for survival time prediction based on a piecewise baseline hazard Cox regression model. 1.13.0
Rchemcpp Rchemcpp Similarity measures for chemical compounds 2.5.0
Rcpi Rcpi Toolkit for Compound-Protein Interaction in Drug Discovery 1.3.0
RpsiXML RpsiXML R interface to PSI-MI 2.5 files 2.9.0
rpx rpx R Interface to the ProteomeXchange Repository 1.3.0
rTANDEM rTANDEM Interfaces the tandem protein identification algorithm in R 1.7.0
sapFinder sapFinder A package for variant peptides detection and visualization in shotgun proteomics. 1.5.10
ScISI ScISI In Silico Interactome 1.39.0
shinyTANDEM shinyTANDEM Provides a GUI for rTANDEM 1.5.0
SLGI SLGI Synthetic Lethal Genetic Interaction 1.27.0
SpacePAC SpacePAC Identification of Mutational Clusters in 3D Protein Space via Simulation. 1.5.0
specL specL specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics 1.1.14
spliceSites spliceSites Manages align gap positions from RNA-seq data 1.5.0
synapter synapter Label-free data analysis pipeline for optimal identification and quantitation 1.9.4

MassSpectrometry packages:

Package Title Version
apComplex apComplex Estimate protein complex membership using AP-MS protein data 2.33.0
BRAIN BRAIN Baffling Recursive Algorithm for Isotope distributioN calculations 1.13.0
CAMERA CAMERA Collection of annotation related methods for mass spectrometry data 1.23.2
Cardinal Cardinal A mass spectrometry imaging toolbox for statistical analysis 0.99.5
cosmiq cosmiq cosmiq - COmbining Single Masses Into Quantities 1.1.0
cytofkit cytofkit cytofkit: an integrated analysis pipeline for mass cytometry data 0.99.18
flagme flagme Analysis of Metabolomics GC/MS Data 1.23.1
gaga gaga GaGa hierarchical model for high-throughput data analysis 2.13.0
iontree iontree Data management and analysis of ion trees from ion-trap mass spectrometry 1.13.0
isobar isobar Analysis and quantitation of isobarically tagged MSMS proteomics data 1.13.2
MAIT MAIT Statistical Analysis of Metabolomic Data 1.1.0
MassArray MassArray Analytical Tools for MassArray Data 1.19.0
MassSpecWavelet MassSpecWavelet Mass spectrum processing by wavelet-based algorithms 1.33.0
Metab Metab Metab: An R Package for a High-Throughput Analysis of Metabolomics Data Generated by GC-MS. 1.1.0
metabomxtr metabomxtr A package to run mixture models for truncated metabolomics data with normal or lognormal distributions. 1.1.0
metaMS metaMS MS-based metabolomics annotation pipeline 1.3.5
MSGFgui MSGFgui A shiny GUI for MSGFplus 1.1.2
MSGFplus MSGFplus An interface between R and MS-GF+ 1.1.3
msmsEDA msmsEDA Exploratory Data Analysis of LC-MS/MS data by spectral counts 1.5.0
msmsTests msmsTests LC-MS/MS Differential Expression Tests 1.5.0
MSnbase MSnbase Base Functions and Classes for MS-based Proteomics 1.15.12
MSnID MSnID Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications. 1.1.3
MSstats MSstats Protein Significance Analysis in DDA, SRM and DIA for Label-free or Label-based Proteomics Experiments 2.5.0
mzID mzID An mzIdentML parser for R 1.5.2
mzR mzR parser for netCDF, mzXML, mzData and mzML and mzIdentML files (mass spectrometry data) 2.1.15
PAPi PAPi Predict metabolic pathway activity based on metabolomics data 1.7.0
Pbase Pbase Manipulating and exploring protein and proteomics data 0.6.12
pepXMLTab pepXMLTab Parsing pepXML files and filter based on peptide FDR. 1.1.0
plgem plgem Detect differential expression in microarray and proteomics datasets with the Power Law Global Error Model (PLGEM) 1.39.1
proBAMr proBAMr Generating SAM file for PSMs in shotgun proteomics data. 1.1.2
PROcess PROcess Ciphergen SELDI-TOF Processing 1.43.0
pRoloc pRoloc A unifying bioinformatics framework for spatial proteomics 1.7.5
proteoQC proteoQC An R package for proteomics data quality control 1.3.2
ProtGenerics ProtGenerics S4 generic functions for Bioconductor proteomics infrastructure 0.99.3
qcmetrics qcmetrics A Framework for Quality Control 1.5.1
Rdisop Rdisop Decomposition of Isotopic Patterns 1.27.0
Risa Risa Converting experimental metadata from ISA-tab into Bioconductor data structures 1.9.1
RMassBank RMassBank Workflow to process tandem MS files and build MassBank records 1.9.1
rols rols An R interface to the Ontology Lookup Service 1.9.0
rpx rpx R Interface to the ProteomeXchange Repository 1.3.0
rTANDEM rTANDEM Interfaces the tandem protein identification algorithm in R 1.7.0
sapFinder sapFinder A package for variant peptides detection and visualization in shotgun proteomics. 1.5.10
shinyTANDEM shinyTANDEM Provides a GUI for rTANDEM 1.5.0
sidap sidap sidap: an integrated analysis pipeline for mass cytometry data 0.99.9
SIMAT SIMAT GC-SIM-MS data processing and alaysis tool 0.99.3
specL specL specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics 1.1.14
synapter synapter Label-free data analysis pipeline for optimal identification and quantitation 1.9.4
TargetSearch TargetSearch A package for the analysis of GC-MS metabolite profiling data. 1.23.0
xcms xcms LC/MS and GC/MS Data Analysis 1.43.3

Anscombe’s quartet

x1 x2 x3 x4 y1 y2 y3 y4
10 10 10 8 8.04 9.14 7.46 6.58
8 8 8 8 6.95 8.14 6.77 5.76
13 13 13 8 7.58 8.74 12.74 7.71
9 9 9 8 8.81 8.77 7.11 8.84
11 11 11 8 8.33 9.26 7.81 8.47
14 14 14 8 9.96 8.10 8.84 7.04
6 6 6 8 7.24 6.13 6.08 5.25
4 4 4 19 4.26 3.10 5.39 12.50
12 12 12 8 10.84 9.13 8.15 5.56
7 7 7 8 4.82 7.26 6.42 7.91
5 5 5 8 5.68 4.74 5.73 6.89
tab <- matrix(NA, 5, 4)
colnames(tab) <- 1:4
rownames(tab) <- c("var(x)", "mean(x)",
                   "var(y)", "mean(y)",
                   "cor(x,y)")

for (i in 1:4)
    tab[, i] <- c(var(anscombe[, i]),
                  mean(anscombe[, i]),
                  var(anscombe[, i+4]),
                  mean(anscombe[, i+4]),
                  cor(anscombe[, i], anscombe[, i+4]))
1 2 3 4
var(x) 11.0000000 11.0000000 11.0000000 11.0000000
mean(x) 9.0000000 9.0000000 9.0000000 9.0000000
var(y) 4.1272691 4.1276291 4.1226200 4.1232491
mean(y) 7.5009091 7.5009091 7.5000000 7.5009091
cor(x,y) 0.8164205 0.8162365 0.8162867 0.8165214

While the residuals of the linear regressions clearly indicate fundamental differences between these datasets, the simplest and most straightforward way to highlight those differences is visualisation.

ff <- y ~ x

mods <- setNames(as.list(1:4), paste0("lm", 1:4))

par(mfrow = c(2, 2), mar = c(4, 4, 1, 1))
for (i in 1:4) {
    ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
    plot(ff, data = anscombe, pch = 19, xlim = c(3, 19), ylim = c(3, 13))
    mods[[i]] <- lm(ff, data = anscombe)
    abline(mods[[i]])
}

plot of chunk anscombefig

lm1 lm2 lm3 lm4
0.0390000 1.1390909 -0.5397273 -0.421
-0.0508182 1.1390909 -0.2302727 -1.241
-1.9212727 -0.7609091 3.2410909 0.709
1.3090909 1.2690909 -0.3900000 1.839
-0.1710909 0.7590909 -0.6894545 1.469
-0.0413636 -1.9009091 -1.1586364 0.039
1.2393636 0.1290909 0.0791818 -1.751
-0.7404545 -1.9009091 0.3886364 0.000
1.8388182 0.1290909 -0.8491818 -1.441
-1.6807273 0.7590909 -0.0805455 0.909
0.1794545 -0.7609091 0.2289091 -0.111
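
The residuals shown above can be collected from the fitted models in one call. The following is a minimal, self-contained sketch: it refits the four regressions with reformulate rather than reusing the mods list from the chunk above, and the name res is only illustrative.

```r
# Refit the four Anscombe regressions (y1 ~ x1, ..., y4 ~ x4) using the
# built-in anscombe data set.
mods <- lapply(1:4, function(i) {
    ff <- reformulate(paste0("x", i), response = paste0("y", i))
    lm(ff, data = anscombe)
})
names(mods) <- paste0("lm", 1:4)

# One column of residuals per model, one row per observation.
res <- sapply(mods, residuals)
round(res, 3)
```

Because each model includes an intercept, every column of res sums to zero (up to floating point error), even though the residual patterns differ markedly between the four datasets.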

The MA plot example

The following code chunk connects to the PXD000001 data set on the ProteomeXchange repository and fetches the mzTab file. After missing values filtering, we extract relevant data (log2 fold-changes and log10 mean expression intensities) into data.frames.

library("rpx")
px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
## Downloading 1 file
## PXD000001_mztab.txt already present.
library("MSnbase")
qnt <- readMzTabData(mztab, what = "PEP")
## Detected a metadata section
## Detected a peptide section
sampleNames(qnt) <- reporterNames(TMT6)
qnt <- filterNA(qnt)
## maybe use combineFeatures

spikes <- c("P02769", "P00924", "P62894", "P00489")
protclasses <- as.character(fData(qnt)$accession)
protclasses[!protclasses %in% spikes] <- "Background"


madata42 <- data.frame(A = rowMeans(log(exprs(qnt[, c(4, 2)]), 10)),
                       M = log(exprs(qnt)[, 4], 2) - log(exprs(qnt)[, 2], 2),
                       data = rep("4vs2", nrow(qnt)),
                       protein = fData(qnt)$accession,
                       class = protclasses)

madata62 <- data.frame(A = rowMeans(log(exprs(qnt[, c(6, 2)]), 10)),
                       M = log(exprs(qnt)[, 6], 2) - log(exprs(qnt)[, 2], 2),
                       data = rep("6vs2", nrow(qnt)),
                       protein = fData(qnt)$accession,
                       class = protclasses)


madata <- rbind(madata42, madata62)
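
To make the definitions of M and A used above explicit, here is a minimal sketch on simulated data. The intensities and the names ref, trt and ma are hypothetical and only stand in for two reporter channels; they are not taken from PXD000001.

```r
set.seed(1)

# Hypothetical reporter intensities for 200 peptides in two channels.
ref <- rlnorm(200, meanlog = 10, sdlog = 1)
trt <- ref * 2^rnorm(200, mean = 0, sd = 0.3)

# A: mean log10 intensity across the two channels
# M: log2 fold-change between the two channels
ma <- data.frame(A = rowMeans(log10(cbind(trt, ref))),
                 M = log2(trt) - log2(ref))

plot(M ~ A, data = ma, pch = 19, col = "#12121250")
abline(h = 0, lty = "dashed")
```

Features that do not change between the two channels scatter around M = 0 across the whole range of A.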

The traditional plotting system

par(mfrow = c(1, 2))
## as.factor() ensures that the class labels are mapped to colours
plot(M ~ A, data = madata42, main = "4vs2",
     xlab = "A", ylab = "M", col = as.factor(madata42$class))
plot(M ~ A, data = madata62, main = "6vs2",
     xlab = "A", ylab = "M", col = as.factor(madata62$class))

plot of chunk mafig1

lattice

library("lattice")
latma <- xyplot(M ~ A | data, data = madata,
                groups = madata$class,
                auto.key = TRUE)
print(latma)

plot of chunk mafig2

ggplot2

library("ggplot2")
ggma <- ggplot(madata, aes(x = A, y = M, colour = class)) +
    geom_point() +
    facet_grid(. ~ data)
print(ggma)

plot of chunk mafig3

Customization

library("RColorBrewer")
bcols <- brewer.pal(4, "Set1")
cls <- c("Background" = "#12121230",
         "P02769" = bcols[1],
         "P00924" = bcols[2],
         "P62894" = bcols[3],
         "P00489" = bcols[4])
ggma2 <- ggplot(aes(x = A, y = M, colour = class),
                data = madata) + geom_point(shape = 19) +
                    facet_grid(. ~ data) + scale_colour_manual(values = cls) +
                        guides(colour = guide_legend(override.aes = list(alpha = 1)))
print(ggma2)

plot of chunk macust

The MAplot method for MSnSet instances

MAplot(qnt, cex = .8)

plot of chunk mafigmsnset

An interactive shiny app for MA plots

This app is based on Mike Love’s shinyMA application, adapted for proteomics data. A screenshot is displayed below. To start the application:

shinyMA()

shinyMA screenshot

The application is also available online at https://lgatto.shinyapps.io/shinyMA/.

See the excellent shiny web page for tutorials.

Volcano plots

Below, using the msmsTests package, we load an example MSnSet with spectral counting data (from the msmsEDA package) and run a statistical test to obtain (adjusted) p-values and fold-changes.

library("msmsEDA")
library("msmsTests")
data(msms.dataset)
## Pre-process expression matrix
e <- pp.msms.data(msms.dataset)
## Models and normalizing condition
null.f <- "y~batch"
alt.f <- "y~treat+batch"
div <- apply(exprs(e),2,sum)
## Test
res <- msms.glm.qlll(e,alt.f,null.f,div=div)
lst <- test.results(res,e,pData(e)$treat,"U600","U200",div,
                    alpha=0.05,minSpC=2,minLFC=log2(1.8),
                    method="BH")

Here, we produce the volcano plot by hand, with the plot function. In the second plot, we restrict the range of the x axis and add grid lines.

plot(lst$tres$LogFC, -log10(lst$tres$p.value))

plot of chunk volc1

plot(lst$tres$LogFC, -log10(lst$tres$p.value),
     xlim = c(-3, 3))
grid()

plot of chunk volc1
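
A hand-made volcano plot can also highlight the features that pass both a significance and a fold-change threshold. The sketch below uses simulated log fold-changes and p-values; the thresholds (adjusted p-value below 0.05, absolute log2 fold-change above 1) and all variable names are illustrative assumptions, not output of msmsTests.

```r
set.seed(42)
n <- 500

# Simulated log2 fold-changes and p-values (hypothetical data).
logFC <- rnorm(n, sd = 1.2)
pval <- runif(n)^3
adjp <- p.adjust(pval, method = "BH")

# Features passing both thresholds.
sig <- adjp < 0.05 & abs(logFC) > 1

plot(logFC, -log10(pval), pch = 19,
     col = ifelse(sig, "red", "grey60"),
     xlab = "log2 fold-change", ylab = "-log10(p-value)")
abline(v = c(-1, 1), lty = "dashed")
grid()
```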

Below, we use the res.volcanoplot function from the msmsTests package. This function uses the sample annotation stored with the quantitative data in the MSnSet object to colour the samples according to their phenotypes.

## Plot
res.volcanoplot(lst$tres,
                max.pval=0.05,
                min.LFC=1,
                maxx=3,
                maxy=NULL,
                ylbls=4)

plot of chunk volc2

A PCA plot

Using the counts.pca function from the msmsEDA package:

library("msmsEDA")
data(msms.dataset)
msnset <- pp.msms.data(msms.dataset)
lst <- counts.pca(msnset, wait=FALSE)

plot of chunk msmsedapca plot of chunk msmsedapca

It is also possible to generate the PCA data using the prcomp function. Below, we extract the coordinates of PC1 and PC2 from the counts.pca result and plot them using the plot function.

pcadata <- lst$pca$x[, 1:2]
head(pcadata)
##                  PC1       PC2
## U2.2502.1 -120.26080 -53.55270
## U2.2502.2  -99.90618 -53.89979
## U2.2502.3 -127.35928 -49.29906
## U2.2502.4 -166.04611 -39.27557
## U6.2502.1 -127.18423  37.11614
## U6.2502.2 -117.97016  47.03702
plot(pcadata[, 1], pcadata[, 2],
     xlab = "PCA1", ylab = "PCA2")
grid()

plot of chunk pca
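
For comparison, here is a minimal sketch of calling prcomp directly. It assumes a features-by-samples matrix, simulated here with hypothetical counts, and makes no claim about the exact pre-processing that counts.pca applies internally.

```r
set.seed(1)

# Hypothetical spectral counts: 200 proteins (rows) by 6 samples (columns).
counts <- matrix(rpois(200 * 6, lambda = 20), nrow = 200,
                 dimnames = list(paste0("prot", 1:200),
                                 paste0("S", 1:6)))

# prcomp expects observations in rows, hence the transposition.
pca <- prcomp(t(counts), center = TRUE, scale. = FALSE)
pcadata <- pca$x[, 1:2]

# Percentage of variance explained by each component.
pctvar <- 100 * pca$sdev^2 / sum(pca$sdev^2)

plot(pcadata[, 1], pcadata[, 2],
     xlab = sprintf("PC1 (%.1f%%)", pctvar[1]),
     ylab = sprintf("PC2 (%.1f%%)", pctvar[2]))
grid()
```

Labelling the axes with the proportion of explained variance helps to judge how much of the structure in the data the two-dimensional plot actually captures.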

Plotting with R

kable(plotfuns)
plot type      traditional        lattice                          ggplot2
scatterplots   plot               xyplot                           geom_point
histograms     hist               histogram                        geom_histogram
density plots  plot(density())    densityplot                      geom_density
boxplots       boxplot            bwplot                           geom_boxplot
violin plots   vioplot::vioplot   bwplot(…, panel = panel.violin)  geom_violin
line plots     plot, matplot      xyplot, parallelplot             geom_line
bar plots      barplot            barchart                         geom_bar
pie charts     pie                                                 geom_bar with polar coordinates
dot plots      dotchart           dotplot                          geom_point
strip plots    stripchart         stripplot                        geom_point
dendrograms    plot(hclust())     latticeExtra package             ggdendro package
heatmaps       image, heatmap     levelplot                        geom_tile

Below, we use a dataset from the pRolocdata package to illustrate the plotting functions.

library("pRolocdata")
data(tan2009r1)

Scatter plots

See the MA and volcano plot examples.

The default plot type is p, for points. Other important types are l for lines and h for histogram (see below).
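
As a small illustration of these plot types, the following sketch draws the same hypothetical data (a sine curve) with types p, l and h side by side.

```r
# Hypothetical data: 25 points along a sine curve.
x <- seq(0, 2 * pi, length.out = 25)
y <- sin(x)

oldpar <- par(mfrow = c(1, 3))
for (tp in c("p", "l", "h"))
    plot(x, y, type = tp, main = paste0("type = \"", tp, "\""))
par(oldpar)
```

Type h draws a vertical line from zero to each value, which is also how centroided mass spectra are commonly displayed.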

Histograms and density plots

We extract the (normalised) intensities of the first sample

x <- exprs(tan2009r1)[, 1]

and plot the distribution with a histogram and a density plot next to each other on the same figure (using the mfrow plotting parameter, set with par)

par(mfrow = c(1, 2))
hist(x)
plot(density(x))

plot of chunk histplot
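
The two representations can also be overlaid in a single panel, provided the histogram is drawn on the density scale. This is a minimal sketch on simulated values; the vector x below is hypothetical and merely stands in for the intensities of one sample.

```r
set.seed(1)

# Hypothetical normalised intensities.
x <- rnorm(500, mean = 0.25, sd = 0.1)

# freq = FALSE puts the histogram on the density scale, so that the
# kernel density estimate can be drawn on top of it.
h <- hist(x, breaks = 30, freq = FALSE, main = "")
d <- density(x)
lines(d, col = "red", lwd = 2)
rug(x)
```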

Box plots and violin plots

We first extract the 888 proteins by 4 samples data matrix and plot the sample distributions next to each other using boxplot and beanplot (from the beanplot package).

library("beanplot")
x <- exprs(tan2009r1)
par(mfrow = c(2, 1))
boxplot(x)
beanplot(x[, 1], x[, 2], x[, 3], x[, 4], log = "")

plot of chunk bxplot

Line plots

Below, we produce line plots that describe the protein quantitative profiles for two sets of proteins, namely ER and mitochondrial proteins, using matplot.

We need to transpose the matrix (with t) and set the type to both (b), to display points and lines, the colours to red and steel blue, the point characters to 1 (an empty point) and the line type to 1 (a solid line).

er <- fData(tan2009r1)$markers == "ER"
mt <- fData(tan2009r1)$markers == "mitochondrion"

par(mfrow = c(2, 1))
matplot(t(x[er, ]), type = "b", col = "red", pch = 1, lty = 1)
matplot(t(x[mt, ]), type = "b", col = "steelblue", pch = 1, lty = 1)

plot of chunk matplotex

In the last section, about spatial proteomics, we use the specialised plotDist function from the pRoloc package to generate such figures.
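
The need for the transposition in the matplot calls above can be seen on a small example: matplot draws one line per column, so proteins have to be moved from rows to columns. The matrix prof below is simulated and purely illustrative.

```r
set.seed(1)

# Hypothetical profiles: 5 proteins (rows) along 4 fractions (columns),
# normalised so that each protein's intensities sum to 1.
prof <- matrix(runif(5 * 4), nrow = 5,
               dimnames = list(paste0("prot", 1:5), paste0("X", 1:4)))
prof <- prof / rowSums(prof)

# After transposition, each column is a protein and each row a fraction.
matplot(t(prof), type = "b", col = "steelblue", pch = 1, lty = 1,
        xlab = "Fraction", ylab = "Normalised intensity")
```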

Bar and dot charts

To illustrate bar and dot charts, we count the number of proteins in the respective class using table.

x <- table(fData(tan2009r1)$markers)
x
## 
##  Cytoskeleton            ER         Golgi      Lysosome mitochondrion 
##             7            28            13             8            29 
##       Nucleus    Peroxisome            PM    Proteasome  Ribosome 40S 
##            21             4            34            15            20 
##  Ribosome 60S       unknown 
##            32           677
par(mfrow = c(1, 2))
barplot(x)
dotchart(x)
## Warning in dotchart(x): 'x' is neither a vector nor a matrix: using
## as.numeric(x)

plot of chunk mrkplot
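
The dotchart warning above arises because a table is neither a plain vector nor a matrix. One possible way to avoid it, sketched below, is to pass the counts as a numeric vector and the class names as labels. The counts are copied from the output above; dropping the dominant unknown class is an optional choice made here to keep the remaining classes readable.

```r
# Marker counts as printed above.
cnt <- c(Cytoskeleton = 7, ER = 28, Golgi = 13, Lysosome = 8,
         mitochondrion = 29, Nucleus = 21, Peroxisome = 4, PM = 34,
         Proteasome = 15, "Ribosome 40S" = 20, "Ribosome 60S" = 32,
         unknown = 677)
x <- as.table(cnt)

# Optionally drop the unknown class, which dwarfs all the others.
xm <- x[names(x) != "unknown"]

oldpar <- par(mfrow = c(1, 2))
barplot(xm, las = 2)
dotchart(as.numeric(xm), labels = names(xm))
par(oldpar)
```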

Heatmaps

The easiest way to produce a complete heatmap is with the heatmap function:

heatmap(exprs(tan2009r1))

plot of chunk hmap

To produce a heatmap without the dendrograms, one can use the image function on a matrix or the specialised version for MSnSet objects from the MSnbase package.

par(mfrow = c(1, 2))
x <- matrix(1:9, ncol = 3)
image(x)
image(tan2009r1)

plot of chunk image

See also gplots’s heatmap.2 function and the Heatplus Bioconductor package for more advanced heatmaps and the corrplot package for correlation matrices.
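
When calling image on a plain matrix, note that rows are drawn along the x axis and columns along the y axis, from bottom to top, so the picture appears rotated relative to the printed matrix. The following minimal sketch shows one way to display a matrix in the same orientation as it is printed; the names m and flip are illustrative.

```r
m <- matrix(1:9, ncol = 3)

# Transpose and reverse the column order so that m[1, 1] ends up in the
# top-left corner of the image.
flip <- t(m)[, nrow(m):1]

oldpar <- par(mfrow = c(1, 2))
image(m, main = "image(m)")
image(flip, main = "printed orientation", axes = FALSE)
par(oldpar)
```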

Dendrograms

The easiest way to produce and plot a dendrogram is:

d <- dist(t(exprs(tan2009r1))) ## distance between samples
hc <- hclust(d) ## hierarchical clustering
plot(hc) ## visualisation

plot of chunk dendro

See also dendextend and this post to illustrate latticeExtra and ggdendro.

Venn diagrams

Visualising mass spectrometry data

Direct access to the raw data

library("mzR")
mzf <- pxget(px1, 6)
## Downloading 1 file
## TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML already present.
ms <- openMSfile(mzf)

hd <- header(ms)
ms1 <- which(hd$msLevel == 1)

rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35
library("MSnbase")
(M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd))
## 1
## Object of class "MSmap"
##  Map [75, 401]
##   [1]  Retention time: 30:1 - 34:58 
##   [2]  M/Z: 521 - 523 (res 0.005)
library("lattice")
ff <- colorRampPalette(c("yellow", "steelblue"))
trellis.par.set(regions=list(col=ff(100)))
plot(M, aspect = 1, allTicks = FALSE)

plot of chunk mapsheat

M@map[msMap(M) == 0] <- NA
plot3D(M, rgl = FALSE)

plot of chunk maps3D

To produce a version that can be reoriented interactively on the screen, use the rgl package:

library("rgl")
plot3D(M, rgl = TRUE)
lout <- matrix(NA, ncol = 10, nrow = 8)
lout[1:2, ] <- 1
for (ii in 3:4)
    lout[ii, ] <- c(2, 2, 2, 2, 2, 2, 3, 3, 3, 3)
lout[5, ] <- rep(4:8, each = 2)
lout[6, ] <- rep(4:8, each = 2)
lout[7, ] <- rep(9:13, each = 2)
lout[8, ] <- rep(9:13, each = 2)

i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
ms2 <- (i+1):(j-1)

layout(lout)

par(mar=c(4,2,1,1))
chromatogram(ms)
abline(v = hd[i, "retentionTime"], col = "red")


par(mar = c(3, 2, 1, 0))
plot(peaks(ms, i), type = "l", xlim = c(400, 1000))
legend("topright", bty = "n",
       legend = paste0(
           "Acquisition ", hd[i, "acquisitionNum"],  "\n",
           "Retention time ", formatRt(hd[i, "retentionTime"])))
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"],
       col = c("#FF000080",
           rep("#12121280", 9)))

par(mar = c(3, 0.5, 1, 1))
plot(peaks(ms, i), type = "l", xlim = c(521, 522.5),
     yaxt = "n")
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"], col = "#FF000080")

##par(mar = omar)
par(mar = c(2, 2, 0, 1))
for (ii in ms2) {
    p <- peaks(ms, ii)
    plot(p, xlab = "", ylab = "", type = "h", cex.axis = .6)
    legend("topright", legend = paste0("Prec M/Z\n",
                           round(hd[ii, "precursorMZ"], 2)),
           bty = "n", cex = .8)
}

plot of chunk msdetails

M2 <- MSmap(ms, i:j, 100, 1000, 1, hd)
## 1
plot3D(M2)

plot of chunk maps3D2

MS barcoding

par(mar=c(4,1,1,1))
image(t(matrix(hd$msLevel, 1, nrow(hd))),
      xlab="Retention time",
      xaxt="n", yaxt="n", col=c("black","steelblue"))
k <- round(range(hd$retentionTime) / 60)
nk <- 5
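## length.out keeps tick positions and labels the same length,
## even when the first retention time does not round to 0
axis(side=1, at=seq(0,1,length.out=nk+1),
     labels=seq(k[1],k[2],length.out=nk+1))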

plot of chunk barcode

Animation

The following animation scrolls over 5 minutes of retention time for an M/Z range between 521 and 523.

library("animation")
an1 <- function() {
    for (i in seq(0, 5, 0.2)) {
        rtsel <- hd$retentionTime[ms1] / 60 > (30 + i) &
            hd$retentionTime[ms1] / 60 < (35 + i)
        M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd)
        M@map[msMap(M) == 0] <- NA
        print(plot3D(M, rgl = FALSE))
    }
}

saveGIF(an1(), movie.name = "msanim1.gif")

MS animation 1
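
The sliding window used in the animation above can be examined on its own, without any raw data. The sketch below applies the same selection logic to a hypothetical vector of retention times in seconds (one scan every 1.5 seconds over one hour); rt, starts and win are illustrative names.

```r
# Hypothetical retention times in seconds.
rt <- seq(0, 3600, by = 1.5)

# Window start offsets, in minutes, as in the animation loop.
starts <- seq(0, 5, by = 0.2)

# For each offset, the indices of the scans falling in a 5 minute
# window that starts at 30 + offset minutes.
win <- lapply(starts, function(i)
    which(rt / 60 > (30 + i) & rt / 60 < (35 + i)))

length(win)         # number of animation frames
range(lengths(win)) # number of scans per frame
```

Each element of win corresponds to one frame of the animation. Consecutive windows overlap heavily, which is what produces the impression of smooth scrolling.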

The code chunk below scrolls over a slice of M/Z values while keeping the retention time range constant between 30 and 35 minutes.

an2 <- function() {
    for (i in seq(0, 2.5, 0.1)) {
        rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35
        mz1 <- 520 + i
        mz2 <- 522 + i
        M <- MSmap(ms, ms1[rtsel], mz1, mz2, .005, hd)
        M@map[msMap(M) == 0] <- NA
        print(plot3D(M, rgl = FALSE))
    }
}

saveGIF(an2(), movie.name = "msanim2.gif")

MS animation 2

The MSnbase infrastructure

library("MSnbase")
data(itraqdata)
itraqdata2 <- pickPeaks(itraqdata, verbose = FALSE)
plot(itraqdata[[25]], full=TRUE, reporters = iTRAQ4)

plot of chunk msnbviz

par(oma = c(0, 0, 0, 0))
par(mar = c(4, 4, 1, 1))
plot(itraqdata2[[25]], itraqdata2[[28]], sequences = rep("IMIDLDGTENK", 2))

plot of chunk msnbviz

The protViz package

library("protViz")
data(msms)

fi <- fragmentIon("TAFDEAIAELDTLNEESYK")
fi.cyz <- as.data.frame(cbind(c=fi[[1]]$c, y=fi[[1]]$y, z=fi[[1]]$z))
     
p <- peakplot("TAFDEAIAELDTLNEESYK",
              spec = msms[[1]],
              fi = fi.cyz,
              itol = 0.6,
              ion.axes = FALSE)

plot of chunk protviz

The peakplot function returns the annotation of the MSMS spectrum that is plotted:

str(p)
## List of 7
##  $ mZ.Da.error : num [1:57] 215.3 144.27 -2.8 -17.06 2.03 ...
##  $ mZ.ppm.error: num [1:57] 1808046 758830 -8306 -37724 3501 ...
##  $ idx         : num [1:57] 1 1 1 3 16 24 41 52 67 88 ...
##  $ label       : chr [1:57] "c1" "c2" "c3" "c4" ...
##  $ score       : num -1
##  $ sequence    : chr "TAFDEAIAELDTLNEESYK"
##  $ fragmentIon :'data.frame':    19 obs. of  3 variables:
##   ..$ c: num [1:19] 119 190 337 452 581 ...
##   ..$ y: num [1:19] 147 310 397 526 655 ...
##   ..$ z: num [1:19] 130 293 380 509 638 ...

Preprocessing of MALDI-MS spectra

The following code chunks demonstrate the usage of the mass spectrometry preprocessing and plotting routines in the MALDIquant package. MALDIquant uses the traditional graphics system and therefore overloads the traditional functions plot, lines and points for its own data types. These data types represent spectra and peak lists as S4 classes. Please see the MALDIquant vignette and the corresponding website for more details.

After loading some example data a simple plot draws the raw spectrum.

library("MALDIquant")

data("fiedler2009subset", package="MALDIquant")

plot(fiedler2009subset[[14]])

plot of chunk mqraw

After some preprocessing, namely variance stabilization and smoothing, we use lines to draw the baseline estimate on the processed spectrum.

transformedSpectra <- transformIntensity(fiedler2009subset, method = "sqrt")
smoothedSpectra <- smoothIntensity(transformedSpectra, method = "SavitzkyGolay")

plot(smoothedSpectra[[14]])
lines(estimateBaseline(smoothedSpectra[[14]]), lwd = 2, col = "red")

plot of chunk mqestimatebaseline
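
To give an intuition for the two preprocessing steps, here is a base R sketch on a simulated spectrum. It is not MALDIquant code: the square root transformation is the same idea as method = "sqrt", but a simple moving average (stats::filter) is swapped in for the Savitzky-Golay filter, and all names and values are hypothetical.

```r
set.seed(1)
mz <- seq(1000, 2000, by = 1)

# Hypothetical spectrum: two Gaussian peaks on a decaying baseline,
# with Poisson (intensity-dependent) noise.
lambda <- 200 * exp(-(mz - 1300)^2 / 50) +
    80 * exp(-(mz - 1700)^2 / 50) +
    30 * exp(-(mz - 1000) / 400)
intensity <- rpois(length(mz), lambda)

# Variance stabilisation: square root transformation.
sq <- sqrt(intensity)

# Smoothing: centred moving average over k points (a stand-in for
# Savitzky-Golay); the first and last (k - 1) / 2 values are NA.
k <- 11
sm <- as.numeric(stats::filter(sq, rep(1 / k, k), sides = 2))

plot(mz, sq, type = "l", col = "grey60",
     xlab = "M/Z", ylab = "sqrt(intensity)")
lines(mz, sm, col = "red", lwd = 2)
```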

After removing the baseline, we can use plot again to draw the baseline-corrected spectrum.

rbSpectra <- removeBaseline(smoothedSpectra)
plot(rbSpectra[[14]])

plot of chunk mqremovebaseline

detectPeaks returns a MassPeaks object that offers the same traditional graphics functions. The next code chunk demonstrates how to mark the detected peaks in a spectrum.

cbSpectra <- calibrateIntensity(rbSpectra, method = "TIC")
peaks <- detectPeaks(cbSpectra, SNR = 5)

plot(cbSpectra[[14]])
points(peaks[[14]], col = "red", pch = 4, lwd = 2)

plot of chunk mqpeaks

Additionally, the dedicated labelPeaks function draws the M/Z values above the corresponding peaks. Next, we label the top 5 peaks in the spectrum.

top5 <- intensity(peaks[[14]]) %in% sort(intensity(peaks[[14]]),
                                         decreasing = TRUE)[1:5]
labelPeaks(peaks[[14]], index = top5, avoidOverlap = TRUE)

plot of chunk mqlabelpeaks
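
The selection of the top 5 peaks above relies on %in%, which returns a logical vector and can flag more than five peaks when intensities are tied. An alternative is to select by position with order. The sketch below compares both on a hypothetical intensity vector; passing integer positions to the index argument of labelPeaks is assumed to work in the same way as the logical vector used above.

```r
# Hypothetical peak intensities, with a tie at the 5th largest value.
int <- c(3, 10, 7, 1, 9, 8, 2, 6, 6)

# Logical selection, as in the chunk above: tied values are all kept.
byIn <- int %in% sort(int, decreasing = TRUE)[1:5]
sum(byIn)

# Positional selection: always exactly five peaks.
top5 <- order(int, decreasing = TRUE)[1:5]
int[top5]
```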

Often, multiple spectra have to be recalibrated to be comparable. To do so, MALDIquant warps the spectra according to so-called reference or landmark peaks. For debugging, the determineWarpingFunctions function offers some warping plots. Here we show only the last 4 plots:

par(mfrow = c(2, 2))
warpingFunctions <-
    determineWarpingFunctions(peaks,
                              tolerance = 0.001,
                              plot = TRUE,
                              plotInteractive = TRUE)

plot of chunk mqwarp

par(mfrow = c(1, 1))
warpedSpectra <- warpMassSpectra(cbSpectra, warpingFunctions)
warpedPeaks <- warpMassPeaks(peaks, warpingFunctions)

In the next code chunk we visualise the need for, and the effect of, the recalibration.

sel <- c(2, 10, 14, 16)
xlim <- c(4180, 4240)
ylim <- c(0, 1.9e-3)
lty <- c(1, 4, 2, 6)

par(mfrow = c(1, 2))
plot(cbSpectra[[1]], xlim = xlim, ylim = ylim, type = "n")

for (i in seq(along = sel)) {
  lines(peaks[[sel[i]]], lty = lty[i], col = i)
  lines(cbSpectra[[sel[i]]], lty = lty[i], col = i)
}

plot(cbSpectra[[1]], xlim = xlim, ylim = ylim, type = "n")

for (i in seq(along = sel)) {
  lines(warpedPeaks[[sel[i]]], lty = lty[i], col = i)
  lines(warpedSpectra[[sel[i]]], lty = lty[i], col = i)
}

plot of chunk mqwarped

par(mfrow = c(1, 1))

The code chunks above generate plots that are very similar to figure 7 in the corresponding paper, “Visualisation of proteomics data using R and Bioconductor”. The code to reproduce the figure exactly is available at: https://github.com/sgibb/MALDIquantExamples/blob/master/R/createFigure1_color.R

Genomic and protein sequences

These visualisations originate from the Pbase-data and mapping vignettes of the Pbase package.

Imaging mass spectrometry

The following code chunk downloads a MALDI imaging dataset from a mouse kidney shared by Adrien Nyakas and Stefan Schurch and generates a plot with the mean spectrum and three slices of interesting M/Z regions.

library("MALDIquant")
library("MALDIquantForeign")

spectra <- importBrukerFlex("http://files.figshare.com/1106682/MouseKidney_IMS_testdata.zip", verbose = FALSE)

spectra <- smoothIntensity(spectra, "SavitzkyGolay",  halfWindowSize = 8)
spectra <- removeBaseline(spectra, method = "TopHat", halfWindowSize = 16)
spectra <- calibrateIntensity(spectra, method = "TIC")
avgSpectrum <- averageMassSpectra(spectra)
avgPeaks <- detectPeaks(avgSpectrum, SNR = 5)

avgPeaks <- avgPeaks[intensity(avgPeaks) > 0.0015]

oldPar <- par(no.readonly = TRUE)
layout(matrix(c(1,1,1,2,3,4), nrow = 2, byrow = TRUE))
plot(avgSpectrum, main = "mean spectrum",
     xlim = c(3000, 6000), ylim = c(0, 0.007))
lines(avgPeaks, col = "red")
labelPeaks(avgPeaks, cex = 1)

par(mar = c(0.5, 0.5, 1.5, 0.5))
for (i in seq(along = avgPeaks)) {
  range <- mass(avgPeaks)[i] + c(-1, 1)
  plotImsSlice(spectra, range = range,
               main = paste(round(range, 2), collapse = " - "))
}
par(oldPar)

ims-shiny screenshot

An interactive shiny app for imaging mass spectrometry

There is also an interactive MALDIquant IMS shiny app for demonstration purposes. A screen shot is displayed below. To start the application:

library("shiny")
runGitHub("sgibb/ims-shiny")

ims-shiny screenshot

Spatial proteomics

library("pRoloc")
library("pRolocdata")

data(tan2009r1)

## these params use class weights
fn <- dir(system.file("extdata", package = "pRoloc"),
          full.names = TRUE, pattern = "params2.rda")
load(fn)

setStockcol(NULL)
setStockcol(paste0(getStockcol(), 90))

w <- table(fData(tan2009r1)[, "pd.markers"])
(w <- 1/w[names(w) != "unknown"])
## 
##  Cytoskeleton            ER         Golgi      Lysosome mitochondrion 
##    0.14285714    0.05000000    0.16666667    0.12500000    0.07142857 
##       Nucleus    Peroxisome            PM    Proteasome  Ribosome 40S 
##    0.05000000    0.25000000    0.06666667    0.09090909    0.07142857 
##  Ribosome 60S 
##    0.04000000
tan2009r1 <- svmClassification(tan2009r1, params2,
                               class.weights = w,
                               fcol = "pd.markers")
ptsze <- exp(fData(tan2009r1)$svm.scores) - 1
lout <- matrix(c(1:4, rep(5, 4)), ncol = 4, nrow = 2)
layout(lout)
cls <- getStockcol()
par(mar = c(4, 4, 1, 1))
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "mitochondrion"), ],
         markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "mitochondrion")],
         mcol = cls[5])
legend("topright", legend = "mitochondrion", bty = "n")
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "ER/Golgi"), ],
         markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "ER")],
         mcol = cls[2])
legend("topright", legend = "ER", bty = "n")
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "ER/Golgi"), ],
         markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "Golgi")],
         mcol = cls[3])
legend("topright", legend = "Golgi", bty = "n")
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "PM"), ],
         markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "PM")],
         mcol = cls[8])
legend("topright", legend = "PM", bty = "n")
plot2D(tan2009r1, fcol = "svm", cex = ptsze, method = "kpca")
addLegend(tan2009r1, where = "bottomleft", fcol = "svm", bty = "n")

plot of chunk spatplot

See the pRoloc-tutorial vignette (pdf) from the pRoloc package for details about spatial proteomics data analysis and visualisation.
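
Two small computations in the code above may be easier to follow in isolation: the inverse-frequency class weights passed to svmClassification, and the transformation of the classification scores into point sizes. The following base R sketch uses hypothetical data; the marker vector is made up (its ER, Golgi and PM counts are chosen so that the weights match those printed above, while the number of unknown proteins is arbitrary), and the scores are invented values assumed to lie between 0 and 1.

```r
## Hypothetical marker annotation (illustrative only): 20 ER, 6 Golgi and
## 15 PM markers, plus an arbitrary number of unlabelled proteins.
markers <- c(rep("ER", 20), rep("Golgi", 6), rep("PM", 15), rep("unknown", 50))

## Inverse class frequencies, dropping the unlabelled proteins: small
## classes receive larger weights than large ones.
counts <- table(markers)
w <- 1 / counts[names(counts) != "unknown"]
print(w)

## Hypothetical classification scores (assumed to lie in [0, 1]), mapped
## to point sizes: a score of 0 gives size 0, higher scores larger points.
scores <- c(0, 0.25, 0.5, 1)
ptsze <- exp(scores) - 1
print(round(ptsze, 2))
```

Weighting by inverse class size is one common way to stop large classes from dominating a classifier trained on unbalanced marker sets; scaling the points by score makes confident assignments stand out in the 2D plot.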

Session information

print(sessionInfo(), locale = FALSE)
## R Under development (unstable) (2015-01-22 r67580)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] protViz_0.2.9         beanplot_1.2          ggplot2_1.0.1        
##  [4] lattice_0.20-30       msmsTests_1.5.0       msmsEDA_1.5.0        
##  [7] pRolocdata_1.5.7      pRoloc_1.7.7          MLInterfaces_1.47.0  
## [10] cluster_2.0.1         annotate_1.45.2       XML_3.98-1.1         
## [13] AnnotationDbi_1.29.17 GenomeInfoDb_1.3.14   IRanges_2.1.43       
## [16] S4Vectors_0.5.22      MALDIquantForeign_0.9 MALDIquant_1.11      
## [19] RColorBrewer_1.1-2    xtable_1.7-4          rpx_1.3.0            
## [22] knitr_1.9             BiocInstaller_1.17.6  RforProteomics_1.5.8 
## [25] MSnbase_1.15.13       ProtGenerics_0.99.3   BiocParallel_1.1.18  
## [28] mzR_2.1.14            Rcpp_0.11.5           Biobase_2.27.2       
## [31] BiocGenerics_0.13.7   BiocStyle_1.5.3      
## 
## loaded via a namespace (and not attached):
##  [1] affy_1.45.2                  affyio_1.35.0               
##  [3] base64enc_0.1-2              biocViews_1.35.17           
##  [5] biomaRt_2.23.5               bitops_1.0-6                
##  [7] BradleyTerry2_1.0-6          brglm_0.5-9                 
##  [9] car_2.0-25                   caret_6.0-41                
## [11] Category_2.33.0              caTools_1.17.1              
## [13] class_7.3-12                 codetools_0.2-11            
## [15] colorspace_1.2-6             DBI_0.3.1                   
## [17] digest_0.6.8                 doParallel_1.0.8            
## [19] downloader_0.3               e1071_1.6-4                 
## [21] edgeR_3.9.13                 evaluate_0.5.5              
## [23] FNN_1.1                      foreach_1.4.2               
## [25] formatR_1.0                  futile.logger_1.3.7         
## [27] futile.options_1.0.0         gdata_2.13.3                
## [29] genefilter_1.49.2            gplots_2.16.0               
## [31] graph_1.45.2                 grid_3.2.0                  
## [33] gridSVG_1.4-3                GSEABase_1.29.1             
## [35] gtable_0.1.2                 gtools_3.4.1                
## [37] highr_0.4                    htmltools_0.2.6             
## [39] httpuv_1.3.2                 impute_1.41.0               
## [41] interactiveDisplay_1.5.1     interactiveDisplayBase_1.5.1
## [43] iterators_1.0.7              kernlab_0.9-20              
## [45] KernSmooth_2.23-14           labeling_0.3                
## [47] lambda.r_1.1.6               limma_3.23.11               
## [49] lme4_1.1-7                   lpSolve_5.6.10              
## [51] MASS_7.3-39                  Matrix_1.1-5                
## [53] mclust_4.4                   mgcv_1.8-5                  
## [55] mime_0.2                     minqa_1.2.4                 
## [57] munsell_0.4.2                mvtnorm_1.0-2               
## [59] mzID_1.5.2                   nlme_3.1-120                
## [61] nloptr_1.0.4                 nnet_7.3-9                  
## [63] pbkrtest_0.4-2               pcaMethods_1.57.2           
## [65] pls_2.4-3                    plyr_1.8.1                  
## [67] preprocessCore_1.29.0        proto_0.3-10                
## [69] proxy_0.4-14                 quantreg_5.11               
## [71] qvalue_1.43.0                R6_2.0.1                    
## [73] randomForest_4.6-10          RBGL_1.43.0                 
## [75] RCurl_1.95-4.5               rda_1.0.2-2                 
## [77] readBrukerFlexData_1.8.2     readMzXmlData_2.8           
## [79] reshape2_1.4.1               RJSONIO_1.3-0               
## [81] R.methodsS3_1.7.0            R.oo_1.19.0                 
## [83] rpart_4.1-9                  RSQLite_1.0.0               
## [85] RUnit_0.4.28                 R.utils_2.0.0               
## [87] sampling_2.6                 scales_0.2.4                
## [89] sfsmisc_1.0-27               shiny_0.11.1                
## [91] SparseM_1.6                  splines_3.2.0               
## [93] stringr_0.6.2                survival_2.38-1             
## [95] tools_3.2.0                  vsn_3.35.0                  
## [97] zlibbioc_1.13.2