R XGBoost
XGBoost is a gradient boosting library. This article is specifically about the official R bindings.
Installation
install.packages('xgboost')  # note: the package name is lower-case
install.packages('caret')    # used below for createDataPartition()
Usage
Some data preparation is required, e.g. partitioning the data into training and testing sets. The createDataPartition() function used below comes from the caret package.
library(caret)

data(mtcars)

# Identify the independent variables
predictors <- c("disp", "wt", "cyl", "gear", "carb")

# Partition the data set (list = FALSE returns row indices rather than a list)
parts <- createDataPartition(mtcars$mpg, p = .8, list = FALSE)
train <- mtcars[parts, ]
test  <- mtcars[-parts, ]

# Create analytic matrices
train.x <- data.matrix(train[, predictors])
test.x  <- data.matrix(test[, predictors])
Try:
library(xgboost)

# Note: data is the analytic matrix, label is the outcome
xgb.train <- xgb.DMatrix(data = train.x, label = train$mpg)
xgb.test  <- xgb.DMatrix(data = test.x,  label = test$mpg)

# Estimate the model on the training data set
my.model <- xgboost(data = xgb.train, max_depth = 3, nrounds = 70)
[snip]

# Predict the outcome using the trained model and the test data set
pred <- predict(my.model, xgb.test)
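To check how well the boosted model generalises, the test-set predictions can be compared with the observed mpg values, for example via root-mean-square error. A quick sketch using base R only (rmse is an illustrative variable name, not part of xgboost):

```r
# Root-mean-square error of the test-set predictions:
# lower values mean predictions sit closer to the observed mpg
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```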
Note that the xgboost function is a wrapper around xgb.train.
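The same model can therefore be fitted by calling xgb.train() directly, with the tree parameters collected in a params list. A minimal sketch, reusing the xgb.train DMatrix created above (reg:squarederror is the default objective for regression):

```r
# Equivalent call via xgb.train(); tuning parameters go in the params list
my.model <- xgb.train(
  params  = list(max_depth = 3, objective = "reg:squarederror"),
  data    = xgb.train,   # the xgb.DMatrix object created above
  nrounds = 70
)
```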
There is also an xgb.cv function that automatically partitions the data set into random, equal-sized folds. Each fold is held out in turn for testing while the model is trained on the remaining folds (k-fold cross-validation), which is useful for evaluating tuning adjustments.
my.data <- xgb.DMatrix(data = data.matrix(mtcars[, predictors]), label = mtcars$mpg)
cv.results <- xgb.cv(data = my.data, nfold = 5, nrounds = 3, max_depth = 3)
To get the importance matrix (the importance metrics as a data.table object), pass a trained booster such as the my.model fitted with xgboost() above; xgb.cv() returns cross-validation results, not a single fitted model:
my.importance <- xgb.importance(model = my.model)
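The importance matrix can also be inspected or plotted directly; xgb.plot.importance() draws a bar chart of the importance measure (Gain by default for tree models). A sketch assuming the my.importance object from above:

```r
# Print the top features, then draw a bar chart of their importance
head(my.importance)
xgb.plot.importance(importance_matrix = my.importance)
```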