xMLrandomforest    R Documentation

Function to integrate a predictor matrix in a supervised manner via the random forest machine learning algorithm.

Description

xMLrandomforest integrates a predictor matrix in a supervised manner via the random forest machine learning algorithm. It requires three inputs: 1) Gold Standard Positive (GSP) targets; 2) Gold Standard Negative (GSN) targets; 3) a predictor matrix containing genes in rows and predictors in columns, with their predictive scores inside it. It returns an object of class 'pTarget'.

Usage

xMLrandomforest(df_predictor, GSP, GSN, nfold = 3, mtry = NULL,
ntree = 2000, fold.aggregateBy = c("Ztransform", "logistic", "fishers",
"orderStatistic"), verbose = TRUE, ...)

Arguments

df_predictor

a data frame containing genes (in rows) and predictors (in columns), with their predictive scores inside it. This data frame must have gene symbols as row names

GSP

a vector containing Gold Standard Positive (GSP) targets

GSN

a vector containing Gold Standard Negative (GSN) targets

nfold

an integer specifying the number of folds for cross validation

mtry

an integer specifying the number of predictors randomly sampled as candidates at each split. If NULL, it will be tuned by 'randomForest::tuneRF', with a starting value of sqrt(p), where p is the number of predictors. The minimum value is 3

ntree

an integer specifying the number of trees to grow. By default, it is set to 2000

fold.aggregateBy

the method used to aggregate results from k-fold cross validation. It can be "orderStatistic" for the method based on the order statistics of p-values, "fishers" for Fisher's method, "Ztransform" for the Z-transform method, or "logistic" for the logistic method. Generally speaking, the Z-transform method does well in problems where evidence against the combined null is spread widely (on an equal footing) across the individual tests or when the total evidence is weak; Fisher's method does best in problems where the evidence is concentrated in a relatively small fraction of the individual tests or when the evidence is at least moderately strong; the logistic method provides a compromise between these two. Notably, the aggregate methods 'Ztransform' and 'logistic' are preferred here

verbose

a logical indicating whether messages will be displayed on the screen. By default, it is set to TRUE (messages are displayed)

...

additional parameters. Please refer to 'randomForest::randomForest' for the complete list.
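
To make the trade-offs described under fold.aggregateBy more concrete, the following is a minimal sketch of three of the named combination rules (Fisher's, Z-transform and logistic) in their textbook forms, using base R only. It is not taken from the xMLrandomforest source and may differ from how the function actually aggregates fold results; the helper names (combine_fisher, combine_ztransform, combine_logistic) and the p-values in p_folds are hypothetical. The order-statistic method is omitted.

```r
# Textbook p-value combination rules in base R.
# Illustrative only: helper names and inputs are hypothetical and are not
# part of the Pi package.

# Fisher's method: -2 * sum(log(p)) is chi-squared with 2k degrees of
# freedom under the combined null hypothesis.
combine_fisher <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Z-transform (Stouffer) method: convert each p-value to a standard normal
# deviate, sum the deviates, and rescale by sqrt(k).
combine_ztransform <- function(p) {
  z <- sum(qnorm(p, lower.tail = FALSE)) / sqrt(length(p))
  pnorm(z, lower.tail = FALSE)
}

# Logistic (Mudholkar-George) method: the scaled sum of logits is
# approximately t-distributed with 5k + 4 degrees of freedom.
combine_logistic <- function(p) {
  k <- length(p)
  mult <- sqrt(3 * (5 * k + 4) / (k * pi^2 * (5 * k + 2)))
  stat <- -sum(log(p / (1 - p))) * mult
  pt(stat, df = 5 * k + 4, lower.tail = FALSE)
}

# Hypothetical p-values for one gene across nfold = 3 folds
p_folds <- c(0.01, 0.04, 0.20)

combined <- c(
  fishers    = combine_fisher(p_folds),
  Ztransform = combine_ztransform(p_folds),
  logistic   = combine_logistic(p_folds)
)
print(combined)
```

Running the three helpers on the same input makes it easy to see how differently each rule weighs one strong p-value against several moderate ones.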

Value

an object of class "pTarget", a list with the following components:

Note

none

Examples

## Not run: 
# Load the library
library(Pi)

## End(Not run)
RData.location <- "http://galahad.well.ox.ac.uk/bigdata_dev"
## Not run: 
pTarget <- xMLrandomforest(df_predictor, GSP, GSN)

## End(Not run)
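
The example above uses inputs that are not defined on this page. As a rough sketch of the shapes described under Arguments, the following builds a toy predictor data frame with gene symbols as row names, plus GSP and GSN vectors. The gene symbols, predictor names and scores are all made up for illustration, and the sketch assumes that GSP and GSN entries are matched against the row names of the predictor data frame. A real analysis would need far more genes than this.

```r
# Toy inputs with the documented shapes (all values are hypothetical).
set.seed(1)

# Made-up gene symbols; real input would use actual gene symbols
genes <- paste0("GENE", 1:20)

# Genes in rows, predictors in columns, gene symbols as row names
df_predictor <- data.frame(
  predictor_A = runif(20),
  predictor_B = runif(20),
  predictor_C = runif(20),
  row.names = genes
)

# Gold Standard Positive and Gold Standard Negative targets
GSP <- genes[1:5]
GSN <- genes[11:15]

# Sanity checks (assumption: gold standards are looked up by row name)
stopifnot(
  all(GSP %in% rownames(df_predictor)),
  all(GSN %in% rownames(df_predictor)),
  length(intersect(GSP, GSN)) == 0
)

# The call itself needs the Pi package, so it is left commented out here:
# pTarget <- xMLrandomforest(df_predictor, GSP, GSN, nfold = 3,
#                            ntree = 2000, fold.aggregateBy = "Ztransform")
```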