evaluatePrediction {kebabs} | R Documentation |
Evaluate the performance of a prediction on a test set, based on given labels, for binary classification
evaluatePrediction(prediction, label, allLabels = NULL, decValues = NULL,
                   print = TRUE, confmatrix = TRUE, numPrecision = 3,
                   numPosNegTrainSamples = numeric(0))
prediction
prediction results as returned by predict.
label
label vector of the same length as the parameter 'prediction'.
allLabels
vector containing each occurring label once. This parameter is required only if the label vector is numeric. Default=NULL
decValues
numeric vector containing decision values for the predictions, as returned by predict with predictionType="decision". Default=NULL
print
this parameter indicates whether performance values should be printed or returned as a data frame without printing (for details see below). Default=TRUE
confmatrix
when set to TRUE a confusion matrix is printed; its rows correspond to predictions, its columns to the true labels. Default=TRUE
numPrecision
minimum number of digits to the right of the decimal point. Values between 0 and 20 are allowed. Default=3
numPosNegTrainSamples
optional integer vector with two values giving the number of positive and negative training samples. When this parameter is set, the balancedness of the training set is reported. Default=numeric(0)
For binary classification this function computes the performance measures
accuracy, balanced accuracy, sensitivity, specificity, precision, and the
Matthews Correlation Coefficient (MCC). If decision values are passed in the
parameter decValues, the function additionally determines the AUC.
When the number of positive and negative training samples is passed to
the function it also shows the balancedness of the training set. The
performance results are either printed by the routine directly or returned
in a data frame. The columns of the data frame are:
column name | performance measure |
-------------------- | -------------- |
TP | true positive |
FP | false positive |
FN | false negative |
TN | true negative |
ACC | accuracy |
BAL_ACC | balanced accuracy |
SENS | sensitivity |
SPEC | specificity |
PREC | precision |
MAT_CC | Matthews correlation coefficient |
AUC | area under ROC curve |
PBAL | prediction balancedness (fraction of positive samples) |
TBAL | training balancedness (fraction of positive samples) |
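As a hedged illustration (not the kebabs source code), the measures listed above can be reproduced from the four confusion-matrix counts and from decision values with base R alone; the helper names perfFromCounts and aucFromDecValues and the example counts are invented for this sketch:

```r
## Sketch only: derive the reported measures from confusion-matrix counts.
## perfFromCounts is a hypothetical helper, not part of kebabs.
perfFromCounts <- function(TP, FP, FN, TN) {
  sens   <- TP / (TP + FN)                   # sensitivity
  spec   <- TN / (TN + FP)                   # specificity
  prec   <- TP / (TP + FP)                   # precision
  acc    <- (TP + TN) / (TP + FP + FN + TN)  # accuracy
  balAcc <- (sens + spec) / 2                # balanced accuracy
  mcc    <- (TP * TN - FP * FN) /            # Matthews correlation coefficient
    sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  data.frame(TP=TP, FP=FP, FN=FN, TN=TN, ACC=acc, BAL_ACC=balAcc,
             SENS=sens, SPEC=spec, PREC=prec, MAT_CC=mcc)
}

## AUC from decision values via the rank-based (Mann-Whitney) formula,
## assuming labels coded as 1 (positive) and -1 (negative)
aucFromDecValues <- function(decValues, label) {
  r    <- rank(decValues)
  nPos <- sum(label == 1)
  nNeg <- sum(label == -1)
  (sum(r[label == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

perfFromCounts(TP=40, FP=10, FN=5, TN=45)
aucFromDecValues(c(0.9, 0.8, 0.3, 0.1), c(1, 1, -1, -1))
```

The rank formula avoids constructing the full ROC curve; it equals the fraction of positive/negative pairs that the decision values order correctly.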
When the parameter 'print' is set to FALSE the function returns a data frame containing the prediction performance values (for details see above).
Johannes Palme <kebabs@bioinf.jku.at>
http://www.bioinf.jku.at/software/kebabs
J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package
for kernel-based analysis of biological sequences.
Bioinformatics, 31(15):2574-2576, 2015.
DOI: 10.1093/bioinformatics/btv176.
## set seed for random generator, included here only to make results
## reproducible for this example
set.seed(456)

## load transcription factor binding site data
data(TFBS)
enhancerFB

## select 70% of the samples for training and the rest for test
train <- sample(1:length(enhancerFB), length(enhancerFB) * 0.7)
test <- c(1:length(enhancerFB))[-train]

## create the kernel object for gappy pair kernel with normalization
gappy <- gappyPairKernel(k=1, m=3)

## show details of kernel object
gappy

## run training with explicit representation
model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappy,
               pkg="LiblineaR", svm="C-svc", cost=80, explicit="yes",
               featureWeights="no")

## predict the test sequences
pred <- predict(model, enhancerFB[test])

## print prediction performance
evaluatePrediction(pred, yFB[test], allLabels=unique(yFB))

## Not run:
## print prediction performance including AUC
## additionally determine decision values
preddec <- predict(model, enhancerFB[test], predictionType="decision")
evaluatePrediction(pred, yFB[test], allLabels=unique(yFB),
                   decValues=preddec)

## print prediction performance including training set balance
trainPosNeg <- c(length(which(yFB[train] == 1)),
                 length(which(yFB[train] == -1)))
evaluatePrediction(pred, yFB[test], allLabels=unique(yFB),
                   numPosNegTrainSamples=trainPosNeg)

## or get prediction performance as data frame
perf <- evaluatePrediction(pred, yFB[test], allLabels=unique(yFB),
                           print=FALSE)

## show performance values in data frame
perf

## End(Not run)