This R package implements statistical modelling of affinity purification–mass spectrometry (AP-MS) data, computing confidence scores that identify bona fide protein-protein interactions (PPIs).
The development version can be installed from GitHub:
devtools::install_github(repo="zqzneptune/SMAD")
library(SMAD)
Comparative Proteomic Analysis Software Suite (CompPASS) is based on the spoke model. The algorithm was developed by Dr. Mathew Sowa to define the human deubiquitinating enzyme interaction landscape (Sowa, Mathew E., et al., 2009). This implementation was inspired by Dr. Sowa’s online tutorial. The output includes the Z-score, S-score, D-score and WD-score.
Prepare the input data as a data frame, datInput, in the following format:
idRun | idBait | idPrey | countPrey |
---|---|---|---|
Unique ID of one AP-MS run | Bait ID | Prey ID | Prey peptide count |
Then run:
CompPASS(datInput)
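As a minimal sketch of the expected input, the toy data frame below follows the four-column format described above. All run, bait, and prey identifiers and all counts are invented for illustration, and the one-bait-per-run check is an assumption drawn from the column descriptions rather than a documented requirement of the package.

```r
# Toy AP-MS input in the four-column format described above.
# All identifiers and counts are invented for illustration.
datInput <- data.frame(
  idRun     = c("run1", "run1", "run2", "run2", "run3", "run3"),
  idBait    = c("BaitA", "BaitA", "BaitA", "BaitA", "BaitB", "BaitB"),
  idPrey    = c("Prey1", "Prey2", "Prey1", "Prey3", "Prey1", "Prey4"),
  countPrey = c(12L, 3L, 10L, 1L, 2L, 7L),
  stringsAsFactors = FALSE
)

# Basic sanity checks before scoring.
stopifnot(
  all(c("idRun", "idBait", "idPrey", "countPrey") %in% colnames(datInput)),
  is.numeric(datInput$countPrey),
  !anyNA(datInput)
)

# Assumption of this sketch: each run ID belongs to exactly one bait.
baitsPerRun <- tapply(datInput$idBait, datInput$idRun,
                      function(b) length(unique(b)))
stopifnot(all(baitsPerRun == 1))
```

Here run1 and run2 act as two replicate purifications of BaitA, while run3 is a single purification of BaitB. A data frame of this shape can then be passed to the scoring function.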
Building on CompPASS, CompPASS-plus additionally computes entropy and a normalized WD-score. In its implementation in BioPlex 1.0 (Huttlin, Edward L., et al., 2015) and BioPlex 2.0 (Huttlin, Edward L., et al., 2017), the CompPASS pipeline included a naive Bayes classifier that learns to distinguish true interacting proteins from non-specific background and false-positive identifications. This function was optimized from the original source code.
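To make the entropy term concrete, the sketch below computes a Shannon entropy over one prey's peptide counts across replicate runs of the same bait. The pseudocount and the log base are assumptions of this illustration and may differ from what CompPASSplus does internally; `shannon_entropy` is a hypothetical helper, not part of SMAD.

```r
# Hypothetical helper: Shannon entropy (in bits) of a prey's counts
# across replicate runs. The pseudocount avoids log(0) when a prey is
# missing from a replicate; its value here is an assumption.
shannon_entropy <- function(counts, pseudocount = 1 / length(counts)) {
  x <- counts + pseudocount
  p <- x / sum(x)          # fraction of the signal in each replicate
  -sum(p * log2(p))
}

# Prey detected evenly in two replicates.
even <- shannon_entropy(c(10, 10))

# Prey detected in only one of two replicates.
skewed <- shannon_entropy(c(20, 0))
```

With two replicates, evenly distributed counts give an entropy of 1 bit, while a prey seen in only one replicate gives a value close to 0. A low value of this kind is one way to flag identifications that are not reproducible across replicates.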
Prepare the input data as a data frame, datInput, in the following format:
idRun | idBait | idPrey | countPrey |
---|---|---|---|
Unique ID of one AP-MS run | Bait ID | Prey ID | Prey peptide count |
Then run:
CompPASSplus(datInput)
HGScore is a scoring algorithm based on a hypergeometric distribution error model (Hart et al., 2007) that incorporates the normalized spectral abundance factor, NSAF (Zybailov, Boris, et al., 2006). It was first introduced to predict the protein complex network of Drosophila melanogaster (Guruharsha, K. G., et al., 2011) and is based on the matrix model. Unlike CompPASS, it requires the protein length of each prey in an additional column.
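The idea behind a hypergeometric error model can be illustrated with base R's `phyper()`. The sketch below asks how surprising it is that two proteins are observed together as often as they are, given how often each is observed overall. The counts are invented, and this simplified tail probability only illustrates the principle; it is not the HGScore formula itself.

```r
# Invented counts for one protein pair (illustration only).
n_total <- 1000  # total observations in the data set
n_a     <- 40    # observations involving protein A
n_b     <- 30    # observations involving protein B
n_ab    <- 8     # observations involving both A and B

# P(X >= n_ab) when n_b observations are drawn without replacement from
# a pool containing n_a "successes" and n_total - n_a "failures".
p_value <- phyper(n_ab - 1, n_a, n_total - n_a, n_b, lower.tail = FALSE)

# Negative log of the tail probability: larger values mean the observed
# co-occurrence is less likely to arise by chance.
hg_like_score <- -log(p_value)
```

Under random association, the expected overlap in this example is 40 × 30 / 1000 = 1.2 observations, so an observed overlap of 8 yields a small tail probability and a correspondingly high score.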
Prepare the input data as a data frame, datInput, in the following format:
idRun | idBait | idPrey | countPrey | lenPrey |
---|---|---|---|---|
Unique ID of one AP-MS run | Bait ID | Prey ID | Prey peptide count | Prey protein length |
Then run:
HG(datInput)
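The role of the lenPrey column can be seen in a sketch of the NSAF calculation: each prey's count is divided by its protein length, and the result is normalized so that the values within one run sum to 1. The toy values are invented, and the code below is an illustration of the NSAF definition rather than the code HG runs internally.

```r
# Toy input in the five-column format; all values are invented.
datInput <- data.frame(
  idRun     = c("run1", "run1", "run1", "run2", "run2"),
  idBait    = c("BaitA", "BaitA", "BaitA", "BaitB", "BaitB"),
  idPrey    = c("Prey1", "Prey2", "Prey3", "Prey1", "Prey4"),
  countPrey = c(10, 10, 5, 4, 6),
  lenPrey   = c(500, 250, 250, 500, 300),
  stringsAsFactors = FALSE
)

# Spectral abundance factor: count corrected for protein length.
saf <- datInput$countPrey / datInput$lenPrey

# Normalize within each run so that NSAF values sum to 1 per run.
datInput$nsaf <- saf / ave(saf, datInput$idRun, FUN = sum)
```

In run1, Prey1 and Prey2 have the same count, but Prey2 is half as long, so its NSAF is twice as large. This is the length correction that makes the extra column necessary.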