1 Introduction
2 Preparation
3 Data import
4 Data preprocessing and exploration
5 Illustrative example
6 Extend the analysis to all proteins
- 6.1 Start fitting
- 6.2 Check example
7 Compute test statistics
- 7.1 Why we need to estimate the degrees of freedom
- 7.2 How to estimate the degrees of freedom
8 Detect significantly shifted proteins
9 Session info
Bibliography

1 Introduction

This vignette shows how to reproduce the analysis described by Childs, Bach, Franken et al. (2019): Non-parametric analysis of thermal proteome profiles reveals novel drug-binding proteins using the NPARC package.

2 Preparation

Load necessary packages.

library(dplyr)
library(magrittr)
library(ggplot2)
library(broom)
library(knitr)
library(NPARC)

3 Data import

First, we load data from a staurosporine TPP experiment (Savitski et al. 2014). The necessary ETL (extract, transform, load) steps have already been conducted, including download from the supplements of the respective publication and conversion into tidy format.

data("stauro_TPP_data_tidy")

Before applying any further transformations and filters we create a copy of the imported data.

df <- stauro_TPP_data_tidy

Let’s perform a first check of the imported data:

df %>% 
  mutate(compoundConcentration = factor(compoundConcentration), 
         replicate = factor(replicate), 
         dataset = factor(dataset)) %>% 
  summary()

##           dataset         uniqueID          relAbundance     temperature  
##  Staurosporine:307080   Length:307080      Min.   :  0.00   Min.   :40.0  
##                         Class :character   1st Qu.:  0.17   1st Qu.:46.0  
##                         Mode  :character   Median :  0.57   Median :53.5  
##                                            Mean   :  0.58   Mean   :53.5  
##                                            3rd Qu.:  0.95   3rd Qu.:61.0  
##                                            Max.   :394.98   Max.   :67.0  
##                                            NA's   :70990                  
##  compoundConcentration replicate  uniquePeptideMatches
##  0 :153540             1:153540   Min.   :  0.00      
##  20:153540             2:153540   1st Qu.:  2.00      
##                                   Median :  6.00      
##                                   Mean   : 10.36      
##                                   3rd Qu.: 13.00      
##                                   Max.   :351.00      
##                                   NA's   :63400

The displayed data contains the following columns:

dataset: The dataset containing the measurements of several TMT-10 experiments. In each experiment, cells were treated with a vehicle or with the compound in one or two concentrations, and measured at ten different temperatures.
uniqueID: The unique identifier for each protein. In the given dataset, it contains the gene symbol concatenated by IPI id. Example: 15 KDA PROTEIN._IPI00879051, 16 KDA PROTEIN._IPI00903282, 17 KDA PROTEIN._IPI00878120, 17 KDA PROTEIN._IPI00736161, 176 KDA PROTEIN._IPI00894060, 18 KDA PROTEIN._IPI00644570 .
relAbundance: The relative signal intensity of the protein in each experiment, scaled to the intensity at the lowest temperature.
temperature: The temperatures corresponding to each of the ten measurements in a TMT experiment.
compoundConcentration The concentration of the administered compound in \(\mu M\).
replicate: The replicate number in each experimental group. Each pair of vehicle and treatment experiments was conducted in two replicates.
uniquePeptideMatches: The number of unique peptides with which a protein was identified.

4 Data preprocessing and exploration

The imported data contains 307080 rows with entries for 7677 proteins.

First, we remove all abundances that were not found with at least one unique peptide, or for which a missing value was recorded.

df %<>% filter(uniquePeptideMatches >= 1)
df %<>% filter(!is.na(relAbundance))

Next, we ensure that the dataset only contains proteins reproducibly observed with full melting curves in both replicates and treatment groups per dataset. A full melting curve is defined by the presence of measurements at all 10 temperatures for the given experimental group.

# Count full curves per protein
df %<>%
  group_by(dataset, uniqueID) %>%
  mutate(n = n()) %>%
  group_by(dataset) %>%
  mutate(max_n = max(n)) %>% 
  ungroup

table(distinct(df, uniqueID, n)$n)

## 
##   10   20   30   40 
##  992  809  993 4505

We see that the majority of proteins contain 40 measurements. This corresponds to two full replicate curves per experimental group. We will focus on these in the current analysis.

# Filter for full curves per protein:
df %<>% 
  filter(n == max_n) %>%
  dplyr::select(-n, -max_n)

The final data contains 180200 rows with entries for 4505 proteins. This number coincides with the value reported in Table 1 of the corresponding publication (Childs et al. 2019).

5 Illustrative example

We first illustrate the principles of nonparametric analysis of response curves (NPARC) on an example protein (STK4) from the staurosporine dataset. The same protein is shown in Figures 1 and 2 of the paper.

5.1 Select data

We first select all entries belonging to the desired protein and dataset:

stk4 <- filter(df, uniqueID == "STK4_IPI00011488")

The table stk4 has 40 rows with measurements of four experimental groups. They consist of two treatment groups (vehicle: \(0~\mu M\) staurosporine, treatment: \(20~\mu M\) staurosporine) with two replicates each. Let us look at the treatment group of replicate 1 for an example:

stk4 %>% filter(compoundConcentration == 20, replicate == 1) %>% 
  dplyr::select(-dataset) %>% kable(digits = 2)

uniqueID	relAbundance	temperature	compoundConcentration	replicate	uniquePeptideMatches
STK4_IPI00011488	1.00	40	20	1	8
STK4_IPI00011488	1.03	43	20	1	8
STK4_IPI00011488	1.06	46	20	1	8
STK4_IPI00011488	1.03	49	20	1	8
STK4_IPI00011488	0.92	52	20	1	8
STK4_IPI00011488	0.93	55	20	1	8
STK4_IPI00011488	0.78	58	20	1	8
STK4_IPI00011488	0.44	61	20	1	8
STK4_IPI00011488	0.21	64	20	1	8
STK4_IPI00011488	0.12	67	20	1	8

To obtain a first impression of the measurements in each experimental group, we generate a plot of the measurements:

stk4_plot_orig <- ggplot(stk4, aes(x = temperature, y = relAbundance)) +
  geom_point(aes(shape = factor(replicate), color = factor(compoundConcentration)), size = 2) +
  theme_bw() +
  ggtitle("STK4") +
  scale_color_manual("staurosporine (mu M)", values = c("#808080", "#da7f2d")) +
  scale_shape_manual("replicate", values = c(19, 17))

print(stk4_plot_orig)

We will show how to add the fitted curves to this plot in the following steps.

5.2 Define function for model fitting

To assess whether there is a significant difference between both treatment groups, we will fit a null model and an alternative models to the data. The null model fits a sigmoid melting curve through all data points irrespective of experimental condition. The alternative model fits separate melting curves per experimental group .

5.3 Fit and plot null models

We use the NPARC package function fitSingleSigmoid to fit the null model:

nullFit <- NPARC:::fitSingleSigmoid(x = stk4$temperature, y = stk4$relAbundance)

The function returns an object of class nls:

summary(nullFit)

## 
## Formula: y ~ (1 - Pl)/(1 + exp((b - a/x))) + Pl
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)   
## Pl   0.0000     0.1795   0.000  1.00000   
## a  692.6739   226.9106   3.053  0.00419 **
## b   12.5048     4.4989   2.780  0.00851 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1814 on 37 degrees of freedom
## 
## Algorithm "port", convergence message: relative convergence (4)

The function augment from the broom package provides a convenient way to obtain the predictions and residuals at each temperature in tabular format. By appending the returned predictions and residuals to our measurements, we ensure that relevant data is collected in the same table and can be added to the plot for visualization. The residuals will be needed later for construction of the test statistic:

nullPredictions <- broom::augment(nullFit)

Let us look at the values returned by augment at two consecutive temperatures. Note that, while the predictions will be the same for each experiment at a given temperature, the residuals will differ because they were computed by comparing the predictions to the actual measurements:

nullPredictions %>% filter(x %in% c(46, 49)) %>% kable()

x	y	.fitted	.resid
46	1.0591140	0.9278000	0.1313139
49	1.0333794	0.8363683	0.1970111
46	0.9449568	0.9278000	0.0171568
49	0.9187253	0.8363683	0.0823571
46	0.8661451	0.9278000	-0.0616550
49	0.7139894	0.8363683	-0.1223788
46	0.8717407	0.9278000	-0.0560594
49	0.7068211	0.8363683	-0.1295471

Now we can append these values to our data frame and show the predicted curve in the plot:

stk4$nullPrediction <- nullPredictions$.fitted
stk4$nullResiduals <- nullPredictions$.resid

stk4_plot <- stk4_plot_orig + geom_line(data = stk4, aes(y = nullPrediction))

print(stk4_plot)

5.4 Fit and plot alternative models

Next we fit the alternative model. Again, we compute the predicted values and the corresponding residuals by the broom::augment() function. To take the compound concentration as a factor into account, we iterate over both concentrations and fit separate models to each subset. We implement this by first grouping the data using the function dplyr::group_by(), and starting the model fitting by dplyr::do().

alternativePredictions <- stk4 %>%
# Fit separate curves per treatment group:
  group_by(compoundConcentration) %>%
  do({
    fit = NPARC:::fitSingleSigmoid(x = .$temperature, y = .$relAbundance, start=c(Pl = 0, a = 550, b = 10))
    broom::augment(fit)
  }) %>%
  ungroup %>%
  # Rename columns for merge to data frame:
  dplyr::rename(alternativePrediction = .fitted,
                alternativeResiduals = .resid,
                temperature = x,
                relAbundance = y)

Add the predicted values and corresponding residuals to our data frame:

stk4 <- stk4 %>%
  left_join(alternativePredictions, 
            by = c("relAbundance", "temperature", 
                   "compoundConcentration")) %>%
  distinct()

## Warning in left_join(., alternativePredictions, by = c("relAbundance", "temperature", : Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 1 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.

Add the curves predicted by the alternative model to the plot. Conceptually, it corresponds to the plot shown in Figures 2 (A)/(B) of the paper.

stk4_plot <- stk4_plot +
  geom_line(data = distinct(stk4, temperature, compoundConcentration, alternativePrediction), 
            aes(y = alternativePrediction, color = factor(compoundConcentration)))

print(stk4_plot)

This plot summarizes Figures 2(A) and 2(B) in the corresponding publication (Childs et al. 2019).

5.5 Compute RSS values

In order to quantify the improvement in goodness-of-fit of the alternative model relative to the null model, we compute the sum of squared residuals (RSS):

rssPerModel <- stk4 %>%
  summarise(rssNull = sum(nullResiduals^2),
            rssAlternative = sum(alternativeResiduals^2))

kable(rssPerModel, digits = 4)

rssNull	rssAlternative
1.2181	0.0831

These values will be used to construct the \(F\)-statistic according to

\[\begin{equation} \label{eq:f_stat} {F} = \frac{{d}_{2}}{{d}_{1}} \cdot \frac{{RSS}^{0} - {RSS}^{1}}{{RSS}^{1}}. \end{equation}\]

To compute this statistic and to derive a p-value, we need the degrees of freedom \({d}_{1}\) and \({d}_{2}\). As described in the paper, they cannot be analytically derived due to the correlated nature of the measurements. The paper describes how to estimate these values from the RSS-values of all proteins in the dataset. In the following Section, we illustrate how to repeat the model fitting for all proteins of a dataset and how to perform hypothesis testing on these models.

6 Extend the analysis to all proteins

This section describes the different steps of the NPARC workflow for model fitting and hyothesis testing. Note that the package also provides a function runNPARC() that performs all of the following steps with one single function call.

6.1 Start fitting

In order to analyze all datasets as described in the paper, we fit null and alternative models to all proteins using the package function NPARCfit:

BPPARAM <- BiocParallel::SerialParam(progressbar = FALSE)

fits <- NPARCfit(x = df$temperature, 
                 y = df$relAbundance, 
                 id = df$uniqueID, 
                 groupsNull = NULL, 
                 groupsAlt = df$compoundConcentration, 
                 BPPARAM = BPPARAM,
                 returnModels = FALSE)

## Starting model fitting...

## ... complete

## Elapsed time: 46.19 secs

## Flagging successful model fits...

## ... complete.

## Evaluating model fits...

## Evaluating models ...

## ... complete.

## Computing model predictions and residuals ...

## ... complete.

## Starting model fitting...

## ... complete

## Elapsed time: 1.51 mins

## Flagging successful model fits...

## ... complete.

## Evaluating model fits...

## Evaluating models ...

## ... complete.

## Computing model predictions and residuals ...

## ... complete.

str(fits, 1)

## List of 2
##  $ predictions: tibble [360,095 × 7] (S3: tbl_df/tbl/data.frame)
##  $ metrics    : tibble [13,515 × 15] (S3: tbl_df/tbl/data.frame)

The returned object fits contains two tables. The table metrics contains the fitted parameters and goodness-of-fit measures for the null and alternative models per protein and group. The table predictions contains the corresponding predicted values and residuals per model.

fits$metrics %>% 
  mutate(modelType = factor(modelType), nCoeffs = factor(nCoeffs), nFitted = factor(nFitted), group = factor((group))) %>% 
  summary

##        modelType         id                  tm                 a           
##  alternative:9010   Length:13515       Min.   :      44   Min.   :   88.34  
##  null       :4505   Class :character   1st Qu.:      50   1st Qu.:  837.27  
##                     Mode  :character   Median :      53   Median : 1103.08  
##                                        Mean   :    6970   Mean   : 1435.08  
##                                        3rd Qu.:      56   3rd Qu.: 1489.48  
##                                        Max.   :88020019   Max.   :15000.00  
##                                        NA's   :787        NA's   :15        
##        b                   pl               aumc          resid_sd       
##  Min.   :  0.00001   Min.   :0.00000   Min.   : 4.87   Min.   :0.007522  
##  1st Qu.: 16.12284   1st Qu.:0.05277   1st Qu.:11.56   1st Qu.:0.035041  
##  Median : 21.44998   Median :0.07971   Median :14.46   Median :0.049796  
##  Mean   : 27.06951   Mean   :0.14059   Mean   :15.03   Mean   :0.064135  
##  3rd Qu.: 28.40480   3rd Qu.:0.14423   3rd Qu.:18.06   3rd Qu.:0.074655  
##  Max.   :250.00000   Max.   :1.50000   Max.   :37.42   Max.   :1.896479  
##  NA's   :15          NA's   :15        NA's   :15      NA's   :15        
##       rss               loglik           tm_sd           nCoeffs     
##  Min.   : 0.00113   Min.   :-70.47   Min.   :0.000e+00   3   :13500  
##  1st Qu.: 0.02921   1st Qu.: 27.32   1st Qu.:0.000e+00   NA's:   15  
##  Median : 0.06361   Median : 37.21   Median :0.000e+00               
##  Mean   : 0.19878   Mean   : 40.00   Mean   :8.723e+12               
##  3rd Qu.: 0.15087   3rd Qu.: 49.53   3rd Qu.:1.000e+00               
##  Max.   :79.39804   Max.   :123.31   Max.   :1.027e+17               
##  NA's   :15         NA's   :15       NA's   :789                     
##  nFitted        conv          group     
##  20  :8996   Mode :logical   0   :4505  
##  40  :4504   FALSE:15        20  :4505  
##  NA's:  15   TRUE :13500     NA's:4505  
##                                         
##                                         
##                                         
##

fits$predictions %>% 
  mutate(modelType = factor(modelType), group = factor((group))) %>% 
  summary

##        modelType           id                  x              y         
##  alternative:179934   Length:360095      Min.   :40.0   Min.   :0.0000  
##  null       :180161   Class :character   1st Qu.:46.0   1st Qu.:0.1563  
##                       Mode  :character   Median :53.5   Median :0.5783  
##                                          Mean   :53.5   Mean   :0.5644  
##                                          3rd Qu.:61.0   3rd Qu.:0.9570  
##                                          Max.   :67.0   Max.   :8.2379  
##                                          NA's   :15     NA's   :15      
##     .fitted            .resid           group       
##  Min.   :0.01099   Min.   :-1.156520   0   : 89948  
##  1st Qu.:0.15960   1st Qu.:-0.025273   20  : 89986  
##  Median :0.58445   Median : 0.000311   NA's:180161  
##  Mean   :0.55997   Mean   : 0.004381                
##  3rd Qu.:0.96269   3rd Qu.: 0.028118                
##  Max.   :1.50000   Max.   : 7.240277                
##  NA's   :15        NA's   :15

6.2 Check example

The results of the STK4 example from earlier can be selected from this object as follows.

First, we check the RSS values of the null and alterantive models:

stk4Metrics <- filter(fits$metrics, id == "STK4_IPI00011488")

rssNull <- filter(stk4Metrics, modelType == "null")$rss
rssAlt <- sum(filter(stk4Metrics, modelType == "alternative")$rss) # Summarize over both experimental groups

rssNull

## [1] 1.218132

rssAlt

## [1] 0.08314745

Next, we plot the predicted curves per model and experimental group:

stk4Predictions <- filter(fits$predictions, modelType == "alternative", id == "STK4_IPI00011488")

stk4_plot_orig +
  geom_line(data = filter(stk4Predictions, modelType == "alternative"), 
            aes(x = x, y = .fitted, color = factor(group))) +
    geom_line(data = filter(stk4Predictions, modelType == "null"), 
            aes(x = x, y = .fitted))

7 Compute test statistics

7.1 Why we need to estimate the degrees of freedom

In order to compute \(F\)-statistics per protein and dataset according to Equation (), we need to know the degrees of freedom of the corresponding null distribution. If we could assume independent and identically distributed (iid) residuals, we could compute them from the number of fitted values and model parameters. In the following, we will show why this simple equation is not appropriate for the curve data we are working with.

First, we compute the test statistics and p-values with theoretical degrees of freedom. These would be true for the case of iid residuals:

modelMetrics <- fits$metrics 
fStats <- NPARCtest(modelMetrics, dfType = "theoretical")

Let us take a look at the computed degrees of freedom:

fStats %>% 
  filter(!is.na(pAdj)) %>%
  distinct(nFittedNull, nFittedAlt, nCoeffsNull, nCoeffsAlt, df1, df2) %>%
  kable()

nFittedNull	nFittedAlt	nCoeffsNull	nCoeffsAlt	df1	df2
40	40	3	6	3	34

We plot the \(F\)-statistics against the theoretical \(F\)-distribution to check how well the null distribution is approximated now:

ggplot(filter(fStats, !is.na(pAdj))) +
  geom_density(aes(x = fStat), fill = "steelblue", alpha = 0.5) +
  geom_line(aes(x = fStat, y = df(fStat, df1 = df1, df2 = df2)), color = "darkred", size = 1.5) +
  theme_bw() +
  # Zoom in to small values to increase resolution for the proteins under H0:
  xlim(c(0, 10))

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 179 rows containing non-finite values (`stat_density()`).

## Warning: Removed 179 rows containing missing values (`geom_line()`).

The densities of the theoretical \(F\)-distribution (red) do not fit the observed values (blue) very well. While the theoretical distribution tends to overestimate the number of proteins with test statistics smaller than 2.5, it appears to underestimate the amount of proteins with larger values. This would imply that even for highly specific drugs, we observe many more significant differences than we would expect by chance. This hints at an anti-conservative behaviour of our test with the calculated degree of freedom parameters. This is reflected in the p-value distributions. If the distribution assumptions were met, we would expect the null cases to follow a uniform distribution, with a peak on the left for the non-null cases. Instead, we observe a tendency to obtain fewer values than expected in the middle range (around 0.5), but distinct peaks to the left.

ggplot(filter(fStats, !is.na(pAdj))) +
  geom_histogram(aes(x = pVal, y = ..density..), fill = "steelblue", alpha = 0.5, boundary = 0, bins = 30) +
  geom_line(aes(x = pVal, y = dunif(pVal)), color = "darkred", size = 1.5) +
  theme_bw()

## Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(density)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

7.2 How to estimate the degrees of freedom

In the paper, we describe an alternative way to estimate the degrees of freedom by fitting \(\chi^2\) distributions to the numerator and denominator across all proteins in a dataset. To enable fitting of the distributions, we first need to re-scale the variables by a scaling factor. Because the scaling factors are characteristic for each dataset (it depends on the variances of the residuals in the respective dataset), we estimate them from the data according to:

\[\begin{align} \label{eq:scale-param} \sigma_0^2 &= \frac{1}{2} \frac{V}{M}, \end{align}\]

where \(V\) is the variance of the distribution, and \(M\) is the mean of the distribution.

The following functin estimates \(V\) and \(M\) from the empirical distributions of the RSS differences \(({RSS}^1 - {RSS}^0)\). To increase robustness, it estimate \(M\) and \(V\) by their D-estimates Marazzi (2002) (median and median absolute deviation). It then scales the numerator and denominator of the \(F\)-statistic by these scaling factors and estimate the degree of freedom parameters by fitting unscaled \(\chi^2\) distributions. Finally, it fits the degrees of freedom parameters numerically, computes the test statistics according to Equation () and derives p-values.

modelMetrics <- fits$metrics 
fStats <- NPARCtest(modelMetrics, dfType = "empirical")

We plot the \(F\)-statistics against the theoretical \(F\)-distribution to check how well the null distribution is approximated now:

ggplot(filter(fStats, !is.na(pAdj))) +
  geom_density(aes(x = fStat), fill = "steelblue", alpha = 0.5) +
  geom_line(aes(x = fStat, y = df(fStat, df1 = df1, df2 = df2)), color = "darkred", size = 1.5) +
  theme_bw() +
  # Zoom in to small values to increase resolution for the proteins under H0:
  xlim(c(0, 10))

## Warning: Removed 112 rows containing non-finite values (`stat_density()`).

## Warning: Removed 112 rows containing missing values (`geom_line()`).

Also check the p-value histograms. We expect the null cases to follow a uniform distribution, with a peak on the left for the non-null cases:

ggplot(filter(fStats, !is.na(pAdj))) +
  geom_histogram(aes(x = pVal, y = ..density..), fill = "steelblue", alpha = 0.5, boundary = 0, bins = 30) +
  geom_line(aes(x = pVal, y = dunif(pVal)), color = "darkred", size = 1.5) +
  theme_bw()

The \(F\)-statistics and p-values approximate the expected distributions substantially closer when based on the estimated degrees of freedom than when based on the theoretical degrees of freedom.

8 Detect significantly shifted proteins

Finally, we can select proteins that are significantly shifted by putting a threshold on the Benjamini-Hochberg corrected p-values.

topHits <- fStats %>% 
  filter(pAdj <= 0.01) %>%
  dplyr::select(id, fStat, pVal, pAdj) %>%
  arrange(-fStat)

The table topHits contains 80 proteins with Benjamini-Hochberg corrected p-values \(\leq 0.01\).

Let us look at the targets detected in each dataset. The same proteins are shown in Fig. S3, S4, S6, and S7 of the paper.

knitr::kable(topHits)

id	fStat	pVal	pAdj
CDK5_IPI00023530	369.08369	0.0000000	0.0000000
MAP2K2_IPI00003783	148.23231	0.0000000	0.0000000
CSK_IPI00013212	138.42902	0.0000000	0.0000000
PMPCA_IPI00166749	136.92529	0.0000000	0.0000000
AURKA_IPI00298940	130.93568	0.0000000	0.0000000
FECH_IPI00554589	128.43270	0.0000000	0.0000000
IRAK4_IPI00007641	122.04771	0.0000000	0.0000000
CAMKK2_IPI00290239	116.48860	0.0000000	0.0000000
PAK4_IPI00014068	113.15321	0.0000000	0.0000000
STK4_IPI00011488	102.22111	0.0000000	0.0000000
STK38_IPI00027251	99.52582	0.0000000	0.0000000
PDPK1_IPI00002538	96.75101	0.0000000	0.0000000
GSK3B_IPI00216190	92.87912	0.0000000	0.0000001
ADRBK1_IPI00012497	82.29031	0.0000000	0.0000002
BMP2K_IPI00337426	74.52282	0.0000000	0.0000003
FER_IPI00029263	70.28332	0.0000000	0.0000005
MAP2K1_IPI00219604	62.14727	0.0000000	0.0000012
MAP2K7_IPI00302112	59.66463	0.0000000	0.0000015
MAP4K2_IPI00149094	57.28908	0.0000000	0.0000020
MAPK12_IPI00296283	55.67296	0.0000000	0.0000024
PRKCE_IPI00024539	54.46557	0.0000000	0.0000026
ADK_IPI00290279	54.40451	0.0000000	0.0000026
STK3_IPI00411984	53.21750	0.0000000	0.0000029
MAPK8_IPI00220306	50.04665	0.0000000	0.0000042
MAPKAPK5_IPI00160672	50.02661	0.0000000	0.0000042
PKN1_IPI00412672	49.92801	0.0000000	0.0000042
PTK2_IPI00413961	49.25120	0.0000000	0.0000043
AAK1_IPI00916402	49.24126	0.0000000	0.0000043
CHEK2_IPI00423156	46.45882	0.0000000	0.0000065
CDK2_IPI00031681	45.20472	0.0000001	0.0000077
MAP2K4_IPI00024674	44.20285	0.0000001	0.0000087
CPOX_IPI00093057	44.15898	0.0000001	0.0000087
PHKG2_IPI00012891	42.94143	0.0000001	0.0000104
TNIK_IPI00514275	42.36832	0.0000001	0.0000111
VRK1_IPI00019640	41.29324	0.0000001	0.0000131
MAPKAPK2_IPI00026054	37.93690	0.0000002	0.0000238
MARK2_IPI00555838	37.72279	0.0000002	0.0000242
CAMK2G_IPI00908444	36.11177	0.0000003	0.0000324
GSK3A_IPI00292228	35.97333	0.0000003	0.0000325
PRKAR2B_IPI00554752	35.73921	0.0000003	0.0000332
RIOK2_IPI00306406	35.21429	0.0000003	0.0000361
PRKACA_IPI00396630	33.82740	0.0000004	0.0000464
RPS6KA3_IPI00020898	33.78927	0.0000004	0.0000464
MAPK3_IPI00018195	33.21874	0.0000005	0.0000512
STK24_IPI00872754	32.15069	0.0000006	0.0000633
MARK3_IPI00183118	29.99680	0.0000010	0.0001012
TTK_IPI00151170	29.41243	0.0000012	0.0001137
MKNK1_IPI00304048	25.97641	0.0000028	0.0002628
PDCD10_IPI00298558	25.75835	0.0000030	0.0002727
OSBPL3_IPI00023555	25.48625	0.0000032	0.0002872
HEBP1_IPI00148063	25.07505	0.0000036	0.0003144
RPS6KA1_IPI00477982	24.30014	0.0000044	0.0003811
PIK3CD_IPI00384817	24.02894	0.0000048	0.0004031
MAP3K2_IPI00513803	23.80960	0.0000051	0.0004207
SGK3_IPI00655852	23.29824	0.0000058	0.0004773
PRKCB_IPI00219628	22.96522	0.0000064	0.0005158
IRAK1_IPI00293652	22.88006	0.0000066	0.0005194
CAMK1D_IPI00170508	22.05986	0.0000084	0.0006491
POLR2K_IPI00023975	21.39423	0.0000102	0.0007794
LYN_IPI00298625	20.61441	0.0000130	0.0009746
CDC42BPA_IPI00903296	20.10931	0.0000153	0.0011204
PRKCQ_IPI00029196	20.06833	0.0000155	0.0011204
ALDH6A1_IPI00024990	19.84648	0.0000166	0.0011836
NEDD4L_IPI00304945	19.71039	0.0000173	0.0012173
CCNB2_IPI00028266	19.43361	0.0000190	0.0013111
PRKCD_IPI00329236	17.79046	0.0000330	0.0022380
PRCP_IPI00399307	17.75384	0.0000334	0.0022380
XPOT_IPI00306290	17.43518	0.0000373	0.0024649
MAPK14_IPI00221141	17.28841	0.0000393	0.0025583
CYCS_IPI00465315	16.03390	0.0000620	0.0039760
STK10_IPI00304742	15.90261	0.0000651	0.0041135
FAM96A_IPI00030985	15.86754	0.0000659	0.0041135
DDX54_IPI00152510	14.05105	0.0001340	0.0082433
RPS6KB1_IPI00216132	14.00636	0.0001365	0.0082811
PCTK2_IPI00376955	13.91522	0.0001416	0.0084802
MAP4K5_IPI00294842	13.83201	0.0001465	0.0086587
LLGL1_IPI00105532	13.56799	0.0001634	0.0095307
NQO2_IPI00219129	13.53389	0.0001657	0.0095429
ROCK2_IPI00307155	13.41620	0.0001741	0.0098447
TLK2_IPI00385652	13.39866	0.0001754	0.0098447

9 Session info

devtools::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.3 (2023-03-15)
##  os       Ubuntu 20.04.6 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  C
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2023-03-20
##  pandoc   2.7.3 @ /usr/bin/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version  date (UTC) lib source
##  backports      1.4.1    2021-12-13 [2] CRAN (R 4.2.3)
##  BiocManager    1.30.20  2023-02-24 [2] CRAN (R 4.2.3)
##  BiocParallel   1.32.6   2023-03-20 [2] Bioconductor
##  BiocStyle    * 2.26.0   2023-03-20 [2] Bioconductor
##  bookdown       0.33     2023-03-06 [2] CRAN (R 4.2.3)
##  broom        * 1.0.4    2023-03-11 [2] CRAN (R 4.2.3)
##  bslib          0.4.2    2022-12-16 [2] CRAN (R 4.2.3)
##  cachem         1.0.7    2023-02-24 [2] CRAN (R 4.2.3)
##  callr          3.7.3    2022-11-02 [2] CRAN (R 4.2.3)
##  cli            3.6.0    2023-01-09 [2] CRAN (R 4.2.3)
##  codetools      0.2-19   2023-02-01 [3] CRAN (R 4.2.3)
##  colorspace     2.1-0    2023-01-23 [2] CRAN (R 4.2.3)
##  crayon         1.5.2    2022-09-29 [2] CRAN (R 4.2.3)
##  devtools       2.4.5    2022-10-11 [2] CRAN (R 4.2.3)
##  digest         0.6.31   2022-12-11 [2] CRAN (R 4.2.3)
##  dplyr        * 1.1.0    2023-01-29 [2] CRAN (R 4.2.3)
##  ellipsis       0.3.2    2021-04-29 [2] CRAN (R 4.2.3)
##  evaluate       0.20     2023-01-17 [2] CRAN (R 4.2.3)
##  fansi          1.0.4    2023-01-22 [2] CRAN (R 4.2.3)
##  farver         2.1.1    2022-07-06 [2] CRAN (R 4.2.3)
##  fastmap        1.1.1    2023-02-24 [2] CRAN (R 4.2.3)
##  fs             1.6.1    2023-02-06 [2] CRAN (R 4.2.3)
##  generics       0.1.3    2022-07-05 [2] CRAN (R 4.2.3)
##  ggplot2      * 3.4.1    2023-02-10 [2] CRAN (R 4.2.3)
##  glue           1.6.2    2022-02-24 [2] CRAN (R 4.2.3)
##  gtable         0.3.2    2023-03-17 [2] CRAN (R 4.2.3)
##  highr          0.10     2022-12-22 [2] CRAN (R 4.2.3)
##  htmltools      0.5.4    2022-12-07 [2] CRAN (R 4.2.3)
##  htmlwidgets    1.6.2    2023-03-17 [2] CRAN (R 4.2.3)
##  httpuv         1.6.9    2023-02-14 [2] CRAN (R 4.2.3)
##  jquerylib      0.1.4    2021-04-26 [2] CRAN (R 4.2.3)
##  jsonlite       1.8.4    2022-12-06 [2] CRAN (R 4.2.3)
##  knitr        * 1.42     2023-01-25 [2] CRAN (R 4.2.3)
##  labeling       0.4.2    2020-10-20 [2] CRAN (R 4.2.3)
##  later          1.3.0    2021-08-18 [2] CRAN (R 4.2.3)
##  lifecycle      1.0.3    2022-10-07 [2] CRAN (R 4.2.3)
##  magick         2.7.4    2023-03-09 [2] CRAN (R 4.2.3)
##  magrittr     * 2.0.3    2022-03-30 [2] CRAN (R 4.2.3)
##  MASS           7.3-58.3 2023-03-07 [3] CRAN (R 4.2.3)
##  memoise        2.0.1    2021-11-26 [2] CRAN (R 4.2.3)
##  mime           0.12     2021-09-28 [2] CRAN (R 4.2.3)
##  miniUI         0.1.1.1  2018-05-18 [2] CRAN (R 4.2.3)
##  munsell        0.5.0    2018-06-12 [2] CRAN (R 4.2.3)
##  NPARC        * 1.10.1   2023-03-20 [1] Bioconductor
##  pillar         1.8.1    2022-08-19 [2] CRAN (R 4.2.3)
##  pkgbuild       1.4.0    2022-11-27 [2] CRAN (R 4.2.3)
##  pkgconfig      2.0.3    2019-09-22 [2] CRAN (R 4.2.3)
##  pkgload        1.3.2    2022-11-16 [2] CRAN (R 4.2.3)
##  prettyunits    1.1.1    2020-01-24 [2] CRAN (R 4.2.3)
##  processx       3.8.0    2022-10-26 [2] CRAN (R 4.2.3)
##  profvis        0.3.7    2020-11-02 [2] CRAN (R 4.2.3)
##  promises       1.2.0.1  2021-02-11 [2] CRAN (R 4.2.3)
##  ps             1.7.2    2022-10-26 [2] CRAN (R 4.2.3)
##  purrr          1.0.1    2023-01-10 [2] CRAN (R 4.2.3)
##  R6             2.5.1    2021-08-19 [2] CRAN (R 4.2.3)
##  Rcpp           1.0.10   2023-01-22 [2] CRAN (R 4.2.3)
##  remotes        2.4.2    2021-11-30 [2] CRAN (R 4.2.3)
##  rlang          1.1.0    2023-03-14 [2] CRAN (R 4.2.3)
##  rmarkdown      2.20     2023-01-19 [2] CRAN (R 4.2.3)
##  sass           0.4.5    2023-01-24 [2] CRAN (R 4.2.3)
##  scales         1.2.1    2022-08-20 [2] CRAN (R 4.2.3)
##  sessioninfo    1.2.2    2021-12-06 [2] CRAN (R 4.2.3)
##  shiny          1.7.4    2022-12-15 [2] CRAN (R 4.2.3)
##  stringi        1.7.12   2023-01-11 [2] CRAN (R 4.2.3)
##  stringr        1.5.0    2022-12-02 [2] CRAN (R 4.2.3)
##  tibble         3.2.1    2023-03-20 [2] CRAN (R 4.2.3)
##  tidyr          1.3.0    2023-01-24 [2] CRAN (R 4.2.3)
##  tidyselect     1.2.0    2022-10-10 [2] CRAN (R 4.2.3)
##  urlchecker     1.0.1    2021-11-30 [2] CRAN (R 4.2.3)
##  usethis        2.1.6    2022-05-25 [2] CRAN (R 4.2.3)
##  utf8           1.2.3    2023-01-31 [2] CRAN (R 4.2.3)
##  vctrs          0.6.0    2023-03-16 [2] CRAN (R 4.2.3)
##  withr          2.5.0    2022-03-03 [2] CRAN (R 4.2.3)
##  xfun           0.37     2023-01-31 [2] CRAN (R 4.2.3)
##  xtable         1.8-4    2019-04-21 [2] CRAN (R 4.2.3)
##  yaml           2.3.7    2023-01-23 [2] CRAN (R 4.2.3)
## 
##  [1] /tmp/RtmphIVBJv/Rinst6563a3d5b6788
##  [2] /home/biocbuild/bbs-3.16-bioc/R/site-library
##  [3] /home/biocbuild/bbs-3.16-bioc/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────

Bibliography

Childs, Dorothee, Karsten Bach, Holger Franken, Simon Anders, Nils Kurzawa, Marcus Bantscheff, Mikhail Savitski, and Wolfgang Huber. 2019. “Non-Parametric Analysis of Thermal Proteome Profiles Reveals Novel Drug-Binding Proteins.” Molecular & Cellular Proteomics, October. https://doi.org/10.1074/mcp.TIR119.001481.

Marazzi, A. 2002. “Bootstrap Tests for Robust Means of Asymmetric Distributions with Unequal Shapes.” Computational Statistics & Data Analysis 39 (4): 503–28.

Savitski, Mikhail M, Friedrich B M Reinhard, Holger Franken, Thilo Werner, Maria Fälth Savitski, Dirk Eberhard, Daniel Martinez Molina, et al. 2014. “Tracking Cancer Drugs in Living Cells by Thermal Profiling of the Proteome.” Science 346 (6205): 1255784.

Analysing thermal proteome profiling data with the NPARC package

20 March 2023, 17:40:13

Package

Contents