Foreword

topdownr is free and open-source software.

Questions and bugs

For bugs, typos, suggestions or other questions, please file an issue in our tracking system (https://github.com/sgibb/topdownr/issues) providing as much information as possible, a reproducible example and the output of sessionInfo().

If you don’t have a GitHub account or wish to reach a broader audience for general questions about proteomics analysis using R, you may want to use the Bioconductor support site: https://support.bioconductor.org/.

1 Introduction

1.1 The topdownr Data Generation Workflow

Figure: Data Generation Workflow

2 Installation of Additional Software

2.1 Setup the Thermo Software

To create methods, the user first has to install and modify the Orbitrap Fusion LUMOS workstation:

  1. Request OrbitrapFusionLumosWorkstation.exe from Thermo Scientific.
  2. Install the workstation by running OrbitrapFusionLumosWorkstation.exe.
  3. Run C:\Thermo\Instruments\TNG\OrbitrapFusionLumos\2.1\System\programs\TNGInstrumentConfigControl.exe, set Optional Ion Source to ETD and Internal Calibration, click Apply and OK (you can ignore the “restart your instrument” message).
  4. Replace C:\Thermo\Instruments\TNG\OrbitrapFusionLumos\2.1\System\programs\Thermo.TNG.Calcium.MethodXML.dll by Thermo.TNG.Calcium.MethodXML.dll.

2.2 Setup XMLMethodChanger

XMLMethodChanger is needed to convert the XML methods into .meth files. It can be found at https://github.com/thermofisherlsms/meth-modifications. The user has to download and compile it (or request it from Thermo Scientific as well).

2.3 Setup Operating System

In order to use XMLMethodChanger, the operating system has to use . (dot) as the decimal mark and , (comma) as the digit group separator (one thousand point two should be formatted as 1,000.2).

In Windows 7 the settings are located at Windows Control Panel > Region and Language > Formats. Choose English (USA) here or use the Additional settings button to change it manually.
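As a quick illustration of the number format described above, the following base R sketch prints one thousand point two with a comma as the digit group separator and a dot as the decimal mark. It only demonstrates the target format; it does not read or change the Windows region settings that XMLMethodChanger relies on.

```r
## Format 1000.2 with "," as digit group separator and "." as decimal mark.
## This shows the format XMLMethodChanger expects from the operating system;
## it does not inspect the Windows region settings themselves.
expected <- format(1000.2, big.mark = ",", decimal.mark = ".")
print(expected)
## [1] "1,000.2"
```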

2.4 Setup ScanHeadsman

After data acquisition, topdownr needs the header information from the .raw files. The ScanHeadsman software is used to extract it. It can be downloaded from https://bitbucket.org/caetera/scanheadsman.

It requires Microsoft .NET 4.5 or later (often preinstalled on a typical modern Windows system; otherwise available from Microsoft’s Download Center, e.g. https://www.microsoft.com/en-us/download/details.aspx?id=30653). Additionally, you need Thermo’s MS File Reader, which can be downloaded free of charge (registration required) from the Thermo FlexNet website: https://thermo.flexnetoperations.com/

ScanHeadsman was created by Vladimir Gorshkov (vgor@bmb.sdu.dk).

3 Creating Methods

Importantly, XMLMethodChanger does not create methods de novo, but modifies pre-existing methods (supplied with XMLMethodChanger) using modifications described in XML files. Thus the whole process of creating user-specified methods consists of two parts:

  1. Construction of XML files with all possible combinations of fragmentation parameters (see topdownr::writeMethodXmls below).
  2. Submitting the constructed XML files together with a template .meth file to XMLMethodChanger.

We choose to use targeted MS2 scans (TMS2) as a way to store the fragmentation parameters. Each TMS2 is stored in a separate experiment. Experiments do not overlap.

Figure: Method Editor - Experiment 21

4 Data preparation with topdownr

Shown below is the process of creating XML files and using them to modify the TMS2IndependentTemplateForTD.meth template file.

library("topdownr")

## MS1 parameters (you could also use and modify
## the output of `defaultMs1Settings()`)
parMS1 <- list(
    FirstMass = 400,
    LastMass = 1600,
    Microscans = 10
)

## MS2 parameters (you could also use and modify
## the output of `defaultMs2Settings()`)
parMS2  <- list(
    OrbitrapResolution = "R120K",
    IsolationWindow = 1,
    MaxITTimeInMS = 200,
    ETDSupplementalActivation = c("ETciD", "EThcD"),
    ActivationType = "ETD",
    Microscans = 40,
    ETDSupplementalActivationEnergy = seq(0, 35, 7),
    ETDReactionTime = c(0, 2.5, 5, 10, 15, 30, 50),
    ETDReagentTarget = c(1e6, 5e6, 1e7),
    AgcTarget = c(1e5, 5e5, 1e6)
)

## Create the XML files for mz == 707
writeMethodXmls(ms1Settings = parMS1,
                ms2Settings = parMS2,
                replications = 1,
                groupBy = "ETDReagentTarget",
                mz = cbind(mass=707.3, z=1),
                massLabeling = TRUE,
                nMs2perMs1 = 1000,
                duration = 0.5,
                randomise = FALSE,
                pattern = "method707_%s.xml")

## Create the XML files for mz == 893
writeMethodXmls(ms1Settings = parMS1,
                ms2Settings = parMS2,
                replications = 1,
                groupBy = "ETDReagentTarget",
                mz = cbind(mass=893.1, z=1),
                massLabeling = TRUE,
                nMs2perMs1 = 1000,
                duration = 0.5,
                randomise = FALSE,
                pattern = "method893_%s.xml")

## Create the XML files for mz == 1211
writeMethodXmls(ms1Settings = parMS1,
                ms2Settings = parMS2,
                replications = 1,
                groupBy = "ETDReagentTarget",
                mz = cbind(mass=1211.7, z=1),
                massLabeling = TRUE,
                nMs2perMs1 = 1000,
                duration = 0.5,
                randomise = FALSE,
                pattern = "method1211_%s.xml")

## Run XMLMethodChanger
runXmlMethodChanger(
    modificationXml=list.files(pattern="^method.*\\.xml$"),
    templateMeth="TMS2IndependentTemplateForTD.meth",
    executable="path\\to\\XmlMethodChanger.exe"
)
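To get a feeling for how many fragmentation conditions the MS2 settings above describe, the following base R sketch enumerates the full factorial of the multi-valued parameters with expand.grid(). This is only a counting aid under the assumption that every value is combined with every other value; it is not how writeMethodXmls works internally, which may order, group or prune combinations differently.

```r
## Full factorial of the multi-valued MS2 parameters from the example above.
## Assumption: every value is combined with every other value; the actual
## set of experiments written by writeMethodXmls() may differ.
ms2Grid <- expand.grid(
    ETDSupplementalActivation = c("ETciD", "EThcD"),
    ETDSupplementalActivationEnergy = seq(0, 35, 7),
    ETDReactionTime = c(0, 2.5, 5, 10, 15, 30, 50),
    ETDReagentTarget = c(1e6, 5e6, 1e7),
    AgcTarget = c(1e5, 5e5, 1e6),
    stringsAsFactors = FALSE
)

## Total number of parameter combinations: 2 * 6 * 7 * 3 * 3
nrow(ms2Grid)
## [1] 756

## Combinations per ETDReagentTarget value (the `groupBy` variable above)
perGroup <- table(ms2Grid$ETDReagentTarget)
perGroup
```

Each of the three ETDReagentTarget values covers 252 combinations in this full factorial.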

5 Data Acquisition

After setting up direct infusion, make sure that the MS1 spectrum produces the expected protein mass after deconvolution by Xtract. Shown below is a deconvoluted MS1 spectrum for myoglobin. The dominant mass corresponds to myoglobin with Met removed.

Figure: Xtract myoglobin

6 Data Preparation

Prior to the R analysis of protein fragmentation data, we have to convert the .raw files.

6.1 Extracting Header Information

Some of the information (SpectrumId, Ion Injection Time (ms), Orbitrap Resolution, targeted Mz, ETD reaction time, CID activation and HCD activation) is stored in the scan headers, while the rest (ETD reagent target and AGC target) is only available in the method table.

You can run ScanHeadsman from the command line (ScanHeadsman.exe --noMS --methods:CSV) or use the function provided by topdownr:

runScanHeadsman(
    path="path\\to\\raw-files",
    executable="path\\to\\ScanHeadsman.exe"
)

ScanHeadsman will generate a .txt (scan header table) and a .csv (method table) file for each .raw file.

6.2 Convert .raw files into mzML

The spectra have to be charge-state deconvoluted with the Xtract node in Proteome Discoverer 2.1. The software returns the deconvoluted spectra in mzML format.

Figure: Proteome Discoverer

Once a .csv, .txt, and .mzML file has been produced for each .raw file, we can start the analysis using topdownr. Please see the analysis vignette (vignette("analysis", package="topdownr")) for an example.
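Before starting the analysis it can be useful to check that no file is missing. The following base R sketch counts the files per extension in a directory and flags a mismatch with the number of .raw files. The helper countByExtension and the demo file names are hypothetical and not part of topdownr; the check relies only on file extensions and makes no assumption about how ScanHeadsman or Proteome Discoverer name their output.

```r
## Hypothetical helper (not part of topdownr): count the files in `path`
## for each extension. Matching is case sensitive and looks only at the
## end of the file name.
countByExtension <- function(path,
                             extensions = c("raw", "txt", "csv", "mzML")) {
    vapply(extensions, function(ext) {
        length(list.files(path, pattern = paste0("\\.", ext, "$")))
    }, integer(1))
}

## Demo on a temporary directory with made-up file names:
## run2.raw has not been converted yet.
demoDir <- tempfile("topdownr-demo-")
dir.create(demoDir)
invisible(file.create(file.path(
    demoDir,
    c("run1.raw", "run1.txt", "run1.csv", "run1.mzML", "run2.raw")
)))

counts <- countByExtension(demoDir)
counts

## TRUE if any extension has a different number of files than .raw
incomplete <- any(counts != counts[["raw"]])
incomplete
```

For real data, replace demoDir with the directory that holds the .raw files and their converted counterparts.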

7 Session Info

sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ggplot2_2.2.1       ranger_0.8.0        topdownrdata_0.99.3
##  [4] topdownr_1.0.0      Biostrings_2.46.0   XVector_0.18.0     
##  [7] IRanges_2.12.0      S4Vectors_0.16.0    BiocGenerics_0.24.0
## [10] BiocStyle_2.6.0    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.13          highr_0.6             BiocInstaller_1.28.0 
##  [4] compiler_3.4.2        plyr_1.8.4            iterators_1.0.8      
##  [7] ProtGenerics_1.10.0   tools_3.4.2           zlibbioc_1.24.0      
## [10] MALDIquant_1.16.4     digest_0.6.12         preprocessCore_1.40.0
## [13] evaluate_0.10.1       tibble_1.3.4          gtable_0.2.0         
## [16] lattice_0.20-35       rlang_0.1.2           Matrix_1.2-11        
## [19] foreach_1.4.3         yaml_2.1.14           stringr_1.2.0        
## [22] knitr_1.17            rprojroot_1.2         grid_3.4.2           
## [25] Biobase_2.38.0        impute_1.52.0         XML_3.98-1.9         
## [28] BiocParallel_1.12.0   rmarkdown_1.6         bookdown_0.5         
## [31] limma_3.34.0          reshape2_1.4.2        mzR_2.12.0           
## [34] magrittr_1.5          pcaMethods_1.70.0     backports_1.1.1      
## [37] scales_0.5.0          codetools_0.2-15      htmltools_0.3.6      
## [40] mzID_1.16.0           MSnbase_2.4.0         colorspace_1.3-2     
## [43] labeling_0.3          affy_1.56.0           stringi_1.1.5        
## [46] doParallel_1.0.11     lazyeval_0.2.1        munsell_0.4.3        
## [49] vsn_3.46.0            affyio_1.48.0
