R XGBoost

XGBoost is a gradient boosting library. This article is specifically about the official R bindings.


Installation

install.packages('xgboost')


Usage

Some data preparation is required, e.g. partitioning the data into training and testing sets.

library(caret)

data(mtcars)

# Identify the independent variables
predictors <- c("disp","wt","cyl","gear","carb")

# Partition data set (createDataPartition is from the caret package)
parts = createDataPartition(mtcars$mpg, p = .8, list = FALSE)
train = mtcars[parts, ]
test = mtcars[-parts, ]

# Create analytic matrices
train.x <- data.matrix(train[,predictors])
test.x <- data.matrix(test[,predictors])

Try:

library(xgboost)

# Note: data is the analytic matrix, label is the outcome
dtrain = xgb.DMatrix(data = train.x, label = train$mpg)
dtest = xgb.DMatrix(data = test.x, label = test$mpg)

# Estimate model on training data set
my.model = xgboost(data = dtrain, max_depth = 3, nrounds = 70)
[snip]

# Predict outcome using trained model and test data set
pred <- predict(my.model, dtest)
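
One quick check of the fit is the root mean squared error (RMSE) of the predictions on the test set, computed here with base R:

# Root mean squared error on the test set
sqrt(mean((test$mpg - pred)^2))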

Note that the xgboost function is a wrapper around xgb.train.
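
As a sketch of that equivalence, the same model could be fit through xgb.train directly; the chief difference is that tuning parameters are passed as a named params list rather than as individual arguments.

# Roughly equivalent fit via xgb.train; tuning parameters go in a list
my.params <- list(max_depth = 3)
my.model <- xgb.train(params = my.params, data = dtrain, nrounds = 70)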

There is also an xgb.cv function that automatically partitions the data set into random, equal-sized folds. In each round, one fold is held out for testing while the model is trained on the remaining folds, rotating so that every fold serves once as the test set; this makes it useful for evaluating tuning adjustments through cross-validation.

my.data <- xgb.DMatrix(data = data.matrix(mtcars[,predictors]), label = mtcars$mpg)
my.cv <- xgb.cv(data = my.data, nfold = 5, nrounds = 3, max_depth = 3)
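
The returned object records per-round evaluation metrics; assuming the default regression objective used here, these are the means and standard deviations of train and test RMSE across the folds.

# Per-round cross-validated metrics (a data.table)
print(my.cv$evaluation_log)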

To get the importance matrix (the importance metrics as a data.table object) from a trained model, such as the one fit by xgboost above, try:

my.importance <- xgb.importance(model = my.model)
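
The companion xgb.plot.importance function renders this table as a bar chart:

# Bar chart of feature importance
xgb.plot.importance(importance_matrix = my.importance)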


CategoryRicottone
