Sören Künzel, Theo Saarinen, Simon Walter, Edward Liu, Allen Tang, Jasjeet Sekhon
Rforestry is a fast implementation of Random Forests, Gradient Boosting, and Linear Random Forests, with an emphasis on inference and interpretability.
## Installation

First install devtools:

```r
install.packages("devtools")
```

You will need a development environment to compile the package; run

```r
devtools::has_devel()
```

to check whether you have one. If no development environment exists, Windows users should download and install Rtools, and macOS users should download and install Xcode. Then install the package from GitHub:

```r
devtools::install_github("forestry-labs/Rforestry")
```

Windows users will need to skip 64-bit compilation, due to an outstanding gcc issue:

```r
devtools::install_github("forestry-labs/Rforestry", INSTALL_opts = c('--no-multiarch'))
```

Example:

```r
set.seed(292315)
library(Rforestry)
test_idx <- sample(nrow(iris), 3)
x_train <- iris[-test_idx, -1]
y_train <- iris[-test_idx, 1]
x_test <- iris[test_idx, -1]

rf <- forestry(x = x_train, y = y_train)

# The weight matrix gives, for each test point, the weights placed on the training observations
weights <- predict(rf, x_test, aggregation = "weightMatrix")$weightMatrix

# The weighted sum of the training outcomes matches the forest predictions
weights %*% y_train
predict(rf, x_test)
```
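Continuing the example above, one can check numerically that the weight-matrix aggregation reproduces the standard predictions (this check is a sketch we add here, not part of the original example):

```r
# The two prediction paths should agree up to floating-point error
all.equal(as.numeric(weights %*% y_train),
          as.numeric(predict(rf, x_test)))
```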
## Ridge Random Forest

A fast implementation of random forests using ridge penalized splitting and ridge regression for predictions.

Example:

```r
set.seed(49)
library(Rforestry)
n <- 100
a <- rnorm(n)
b <- rnorm(n)
c <- rnorm(n)
y <- 4*a + 5.5*b - .78*c
x <- data.frame(a, b, c)

forest <- forestry(x, y, ridgeRF = TRUE)
predict(forest, x)
```
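Because the response here is exactly linear in the features, the ridge forest should fit it very closely. As a rough comparison, one could also fit a plain forest on the same data (a sketch using default forestry settings; this comparison is our addition, not part of the original example):

```r
# Compare in-sample fit of the ridge forest against a plain forest
plain_forest <- forestry(x, y)
mean((predict(forest, x) - y)^2)        # ridge forest in-sample MSE
mean((predict(plain_forest, x) - y)^2)  # plain forest in-sample MSE
```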
## Monotonic Constraints

The `monotonicConstraints` parameter controls monotonic constraints for features in forestry. Each entry gives the constraint for the corresponding feature; for example, `c(-1, -1)` below constrains the predictions to be monotone decreasing in both features.

Example:

```r
library(Rforestry)
library(dplyr)  # for select() and the %>% pipe

x <- rnorm(150) + 5
y <- .15*x + .5*sin(3*x)
data_train <- data.frame(x1 = x, x2 = rnorm(150) + 5, y = y + rnorm(150, sd = .4))

monotone_rf <- forestry(x = data_train %>% select(-y),
                        y = data_train$y,
                        monotonicConstraints = c(-1, -1),
                        nodesizeStrictSpl = 5,
                        nthread = 1,
                        ntree = 25)
predict(monotone_rf, feature.new = data_train %>% select(-y))
```
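Under the `-1` constraints, predictions should be non-increasing in each feature. A minimal sketch of checking this for `x1`, with `x2` held fixed (the evaluation grid is our construction, not part of the original example):

```r
# Evaluate predictions along an increasing grid in x1, holding x2 fixed
grid <- data.frame(x1 = seq(min(data_train$x1), max(data_train$x1), length.out = 50),
                   x2 = mean(data_train$x2))
grid_preds <- predict(monotone_rf, feature.new = grid)
all(diff(grid_preds) <= 0)  # should be TRUE under the -1 constraint on x1
```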
## OOB Predictions

We can return the predictions for the training dataset using only the trees in which each observation was out of bag. Note that when there are few trees, or a high proportion of the observations is sampled, some observations may not be out of bag for any tree; their predictions are returned as NaN.

Example:

```r
library(Rforestry)
# Train a forest
rf <- forestry(x = iris[,-1],
               y = iris[,1],
               ntree = 500)

# Get the OOB predictions for the training set
oob_preds <- getOOBpreds(rf)
# This should be equal to the OOB error
sum((oob_preds - iris[,1])^2)
getOOB(rf)
```
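When a forest has only a handful of trees, some training observations may never be out of bag and their OOB predictions come back as NaN, as noted above. A minimal sketch of dropping those before computing the error (the 5-tree forest is our construction for illustration):

```r
# With very few trees, some OOB predictions may be NaN
small_rf <- forestry(x = iris[,-1], y = iris[,1], ntree = 5)
oob_small <- getOOBpreds(small_rf)
valid <- !is.nan(oob_small)
sum((oob_small[valid] - iris[valid, 1])^2)
```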