PRObabilistic Pathway Scores (PROPS)

Lichy Han

2017-10-30

This R package calculates PRObabilistic Pathway Scores (PROPS), which are pathway-based features, from gene-based data. For more information, see:

Lichy Han, Mateusz Maciejewski, Christoph Brockel, William Gordon, Scott B. Snapper, Joshua R. Korzenik, Lovisa Afzelius, Russ B. Altman. A PRObabilistic Pathway Score (PROPS) for Classification with Applications to Inflammatory Bowel Disease.

Example Data

The package includes example healthy data and example patient data. Each data frame is sample x gene: rows are samples, and the column names are gene Entrez IDs.

data(example_healthy)
example_healthy[1:5,1:5]
10 100 1000 10000 10005
HealthySample1 6.993072 7.890769 1.187037 3.186920 2.243749
HealthySample2 7.170742 5.786940 6.681429 4.765805 5.404951
HealthySample3 4.458645 4.302536 3.893047 1.921196 8.437868
HealthySample4 3.215118 5.515139 4.178101 6.063425 6.152115
HealthySample5 4.456360 8.889095 6.342727 8.625195 2.956368
data(example_data)
example_data[1:5,1:5]
10 100 1000 10000 10005
Sample1 3.848834 3.485361 5.650030 7.020021 3.996484
Sample2 4.165301 1.808756 5.927112 10.229086 6.963043
Sample3 4.874789 4.627511 6.410964 6.036542 4.334305
Sample4 1.295941 4.173276 2.408115 4.329023 3.800146
Sample5 8.924654 4.229810 3.939608 7.241261 5.378624
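
For data that does not ship with the package, the expected layout can be sketched in base R. This is a minimal sketch, not part of PROPS: the expression values, the sample names, and the helper name check_props_input are hypothetical, and only the layout (samples in rows, Entrez IDs as column names) comes from the description above.

```r
# Hypothetical expression values: 3 samples (rows) x 4 genes (columns).
set.seed(1)
expr <- matrix(rnorm(12, mean = 5), nrow = 3,
               dimnames = list(paste0("Sample", 1:3),
                               c("10", "100", "1000", "10000")))

# check.names = FALSE keeps the numeric Entrez IDs as column names;
# the default would rewrite them to "X10", "X100", ...
my_data <- data.frame(expr, check.names = FALSE)

# Hypothetical helper (not part of PROPS): verifies the layout described above.
check_props_input <- function(df) {
  stopifnot(is.data.frame(df),
            all(vapply(df, is.numeric, logical(1))),
            all(grepl("^[0-9]+$", colnames(df))))
  invisible(TRUE)
}

check_props_input(my_data)
```

The check.names argument matters here because R's default name repair would otherwise change the Entrez IDs, and the columns would then no longer match gene IDs used elsewhere.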

Calculating PROPS using KEGG

KEGG edges are included in this package and are used by default. To run PROPS, call the props function with the healthy data and the disease data.

props_features <- props(example_healthy, example_data)
props_features[1:5,1:5]
pathway_ID Sample1 Sample2 Sample3 Sample4
00010.xml -156.70402 -157.34471 -146.13367 -146.48038
00020.xml -66.06094 -62.17654 -68.88284 -63.13460
00030.xml -66.19930 -64.09404 -61.87067 -66.46006
00040.xml -75.30225 -58.62923 -64.89698 -60.47877
00051.xml -67.89998 -68.44797 -71.99501 -73.01245
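
The output above has one row per pathway and one column per sample, while many modelling functions in R expect one row per sample. The sketch below shows one way to reshape such a table. It does not call props: props_features_head is a hypothetical stand-in built from the first values printed above, and feature_matrix is an illustrative name.

```r
# Stand-in for the first rows of props_features, copied from the output above.
props_features_head <- data.frame(
  pathway_ID = c("00010.xml", "00020.xml", "00030.xml"),
  Sample1 = c(-156.70402, -66.06094, -66.19930),
  Sample2 = c(-157.34471, -62.17654, -64.09404),
  stringsAsFactors = FALSE
)

# Drop the ID column, transpose, and use the pathway IDs as feature names.
feature_matrix <- t(as.matrix(props_features_head[, -1]))
colnames(feature_matrix) <- props_features_head$pathway_ID

dim(feature_matrix)  # rows are samples, columns are pathways
```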

Optional Batch Correction

This package includes an optional flag to apply ComBat batch correction via the sva package. Suppose there are two batches of data, where the first 25 samples from example_healthy and the first 20 samples from example_data are from batch 1, and the remaining samples are from batch 2. Run the props function with batch_correct set to TRUE, and supply the batch labels for the healthy data and for the disease data.

# One batch label per sample, in the same order as the rows of each data frame
healthy_batches <- c(rep(1, 25), rep(2, 25))
dat_batches <- c(rep(1, 20), rep(2, 30))

props_features_batchcorrected <- props(example_healthy, example_data, batch_correct = TRUE, healthy_batches = healthy_batches, dat_batches = dat_batches)
## Found2batches
## Adjusting for0covariate(s) or covariate level(s)
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
props_features_batchcorrected[1:5,1:5]
pathway_ID Sample1 Sample2 Sample3 Sample4
00010.xml -155.84908 -157.19236 -144.99997 -145.30414
00020.xml -65.95175 -61.87948 -68.17813 -62.65637
00030.xml -65.59704 -63.82354 -62.13070 -65.14733
00040.xml -74.86549 -58.33588 -64.11862 -60.10108
00051.xml -67.49010 -67.54269 -71.10356 -72.61636
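
Because each batch vector carries one label per sample, it can help to check its length before running the correction. The sketch below is a base R illustration only: check_batches is a hypothetical helper, not part of PROPS or sva, and the sample count of 50 is inferred from the batch sizes in the example (25 + 25 and 20 + 30).

```r
# Batch labels as in the example: one label per row (sample) of each data frame.
healthy_batches <- c(rep(1, 25), rep(2, 25))
dat_batches <- c(rep(1, 20), rep(2, 30))

# Hypothetical helper (not part of PROPS): checks a batch vector against a
# sample count and returns the number of samples in each batch.
check_batches <- function(batches, n_samples) {
  stopifnot(length(batches) == n_samples, !anyNA(batches))
  table(batches)
}

# 50 is assumed from the batch sizes above; with real data, pass the number
# of rows of the corresponding data frame instead.
check_batches(healthy_batches, 50)
check_batches(dat_batches, 50)
```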

Calculating PROPS using User Input Pathways

Users can also supply their own pathways, which makes the package compatible with additional pathway databases and with hand-curated pathways from the literature or from data. To do so, format the pathways into three columns: the first column is the source or “from” node of each edge, the second column is the sink or “to” node of the edge, and the third column is the pathway ID (e.g. “glucose_metabolism”).

data(example_edges)
example_edges[1:8, ]
from to pathway_ID
7476 8322 pathway1
3913 3690 pathway1
26060 836 pathway1
163688 5532 pathway1
84812 10423 pathway1
57104 1056 pathway1
9651 8396 pathway1
3976 3563 pathway1
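
A table in this three-column format can be assembled by hand in base R. The sketch below is illustrative: the two edges are taken from the example_edges output above, the pathway name my_pathway is made up, and storing the Entrez IDs as character strings is an assumption rather than a documented requirement.

```r
# Hypothetical hand-curated pathway; the edges are the first two rows of
# the example_edges output above, and the pathway name is made up.
my_edges <- data.frame(
  from = c("7476", "3913"),
  to = c("8322", "3690"),
  pathway_ID = "my_pathway",
  stringsAsFactors = FALSE
)

# Column order follows the description above: from, to, pathway ID.
stopifnot(identical(colnames(my_edges), c("from", "to", "pathway_ID")))

# Number of edges per pathway.
table(my_edges$pathway_ID)
```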

Run props with the user-specified edges as follows:

props_features_userpathways <- props(example_healthy, example_data, pathway_edges = example_edges)
props_features_userpathways[,1:5]
pathway_ID Sample1 Sample2 Sample3 Sample4
pathway1 -370.3345 -372.7717 -397.3746 -370.7386
pathway2 -355.6405 -343.6026 -354.3261 -339.1284
pathway3 -357.4726 -354.8797 -352.8343 -353.7832