xPredictPR R Documentation

Function to assess the prediction performance via Precision-Recall (PR) analysis

Description

xPredictPR assesses prediction performance via Precision-Recall (PR) analysis. It requires two inputs: 1) Gold Standard Positive (GSP) targets; 2) predictions, i.e. predicted targets together with their predictive scores.

Usage

xPredictPR(GSP, prediction, num.threshold = 20, bin = c("quantile",
"uniform"), recall.prediction = FALSE, GSN = NULL, plot = TRUE,
smooth = FALSE, verbose = TRUE)

Arguments

GSP

a vector containing Gold Standard Positive (GSP) targets

prediction

a data frame containing predictions along with predictive scores. It has two columns: 1st column for target, 2nd column for predictive scores (the higher the better)

num.threshold

an integer to specify how many PR points (as a function of the score threshold) will be calculated

bin

how to bin the scores. It can be "uniform" for binning scores into equal-width intervals, or "quantile" for binning scores with equal frequency (i.e. with an equal number of scores per bin)
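
The difference between the two binning modes can be sketched in base R. This is only an illustration under stated assumptions, not the function's internal code: the scores are made up, and the names thr_uniform, thr_quantile, n_uniform and n_quantile are hypothetical.

```r
# Hypothetical scores with one large outlier (illustration only)
scores <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 10)
num.threshold <- 5

# "uniform": thresholds equally spaced across the score range
thr_uniform <- seq(min(scores), max(scores), length.out = num.threshold)

# "quantile": thresholds at equally spaced quantiles, so that consecutive
# thresholds enclose roughly the same number of scores
thr_quantile <- as.vector(
  stats::quantile(scores, probs = seq(0, 1, length.out = num.threshold))
)

# Number of scores falling between consecutive thresholds
n_uniform  <- as.vector(table(cut(scores, thr_uniform,  include.lowest = TRUE)))
n_quantile <- as.vector(table(cut(scores, thr_quantile, include.lowest = TRUE)))
```

With skewed scores such as these, the uniform thresholds leave most bins empty, whereas the quantile thresholds spread the scores more evenly.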

recall.prediction

logical to indicate whether recall is calculated relative to the predictable GSP only. Defaults to FALSE
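
A minimal sketch of how the choice of denominator changes recall, assuming that "predictable GSP" means the GSP targets that also appear in prediction. That reading is an assumption, and the data and variable names below are hypothetical.

```r
# Hypothetical gold standard positives and predictions (illustration only)
GSP <- c("A", "B", "C", "D")
prediction <- data.frame(
  target = c("A", "B", "E", "F"),
  score  = c(0.9, 0.7, 0.6, 0.2),
  stringsAsFactors = FALSE
)

# Targets called positive at a given score threshold
threshold <- 0.5
called <- prediction$target[prediction$score >= threshold]
tp <- sum(called %in% GSP)  # true positives among the called targets

# Recall relative to all GSP targets
recall_all <- tp / length(GSP)

# Recall relative to GSP targets present in 'prediction' only
# (assumed meaning of "predictable GSP")
recall_predictable <- tp / sum(GSP %in% prediction$target)
```

Here two of the four GSP targets are never scored, so recall relative to all GSP cannot exceed 0.5, while recall relative to the predictable GSP can reach 1.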

GSN

a vector containing Gold Standard Negative (GSN). It is optional. By default (NULL), GSN is not provided

plot

logical to indicate whether to return an object of class "ggplot" for plotting the PR curve. The Usage signature above sets the default to TRUE. If TRUE, it returns a ggplot object with 'PR' and 'Fmax' appended

smooth

logical to indicate whether to smooth the curve by forcing precision into non-increasing order. Defaults to FALSE
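
One common way to enforce non-increasing precision is to replace each value with the maximum precision at that point or any later one. The sketch below assumes the points are ordered by increasing recall. It shows the general technique and is not necessarily how the function implements smoothing; the precision values are hypothetical.

```r
# Hypothetical precision values, ordered by increasing recall
precision <- c(1.0, 0.5, 0.67, 0.6, 0.4, 0.5)

# Running maximum taken from the right-hand end, so that each point gets
# the highest precision achieved at that recall level or beyond
smoothed <- rev(cummax(rev(precision)))

print(smoothed)
```

The smoothed values never fall below the original ones, and they never increase as recall grows.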

verbose

logical to indicate whether messages will be displayed on the screen. Defaults to TRUE (messages are displayed)

Value

If plot is FALSE, a data frame containing three columns: 1st column 'Threshold' for the score threshold, 2nd column 'Precision' for precision, 3rd column 'Recall' for recall. If plot is TRUE (the default in the Usage signature), a ggplot object with two components appended: 'PR' (the same three-column data frame) and 'Fmax' (the maximum F-measure).

Note

Fmax: the maximum F-measure along the PR curve, where the F-measure is the harmonic mean of precision and recall, i.e. 2 * Precision * Recall / (Precision + Recall)
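
As a worked illustration, Fmax can be computed from a data frame with the three columns described under Value. The numbers below are hypothetical and the code is a sketch of the definition, not the function's internal code.

```r
# Hypothetical PR table with the columns described under Value
PR <- data.frame(
  Threshold = c(0.9, 0.7, 0.5, 0.3),
  Precision = c(1.0, 0.8, 0.6, 0.5),
  Recall    = c(0.2, 0.4, 0.6, 0.8)
)

# F-measure at each threshold: harmonic mean of precision and recall
Fmeasure <- 2 * PR$Precision * PR$Recall / (PR$Precision + PR$Recall)

# Fmax is the largest F-measure along the curve
Fmax <- max(Fmeasure, na.rm = TRUE)
best_threshold <- PR$Threshold[which.max(Fmeasure)]
```

The na.rm = TRUE guard covers points where precision and recall are both zero, for which the harmonic mean is undefined.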

Examples

## Not run: 
PR <- xPredictPR(GSP, prediction)

## End(Not run)