R XGBoost
XGBoost is a software implementation of gradient boosting for estimating decision trees. This article is specifically about the official R bindings.
Installation
install.packages('xgboost')
Usage
Some data preparation is required, e.g. partitioning the data into training and testing sets.

data(mtcars)

# Identify the independent variables
predictors <- c("disp", "wt", "cyl", "gear", "carb")

# Partition the data set (createDataPartition comes from the caret package;
# list = FALSE returns a vector of row indices rather than a list)
library(caret)
set.seed(42)  # for reproducibility
parts <- createDataPartition(mtcars$mpg, p = .8, list = FALSE)
train <- mtcars[parts, ]
test  <- mtcars[-parts, ]

# Create analytic matrices (xgboost requires numeric matrices, not data frames)
train.x <- data.matrix(train[, predictors])
test.x  <- data.matrix(test[, predictors])
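The data.matrix step matters because xgboost operates on numeric matrices rather than data frames. A minimal base-R illustration of the conversion (toy data, no xgboost required):

```r
# data.matrix coerces every column of a data frame to numeric and
# returns a matrix -- the form the DMatrix constructor expects.
df <- data.frame(disp = c(160, 108), wt = c(2.62, 2.32), cyl = c(6, 4))
m <- data.matrix(df)
stopifnot(is.matrix(m), is.numeric(m))
print(dim(m))  # 2 rows, 3 columns
```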
Try:
library(xgboost)

# Note: data is the analytic matrix, label is the outcome
xgb.train <- xgb.DMatrix(data = train.x, label = train$mpg)
xgb.test  <- xgb.DMatrix(data = test.x, label = test$mpg)

# Estimate model on the training data set
my.model <- xgboost(data = xgb.train, max.depth = 3, nrounds = 70)
[snip]

# Predict the outcome using the trained model and the test data set
pred <- predict(my.model, xgb.test)
Note that the xgboost function is a wrapper around xgb.train.
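Because of this, the fit above can also be written against the lower-level interface directly. A sketch, assuming the xgb.train DMatrix from the previous block; note that the tuning parameters move into a params list (the objective name follows recent xgboost releases):

```r
library(xgboost)

# Lower-level equivalent of xgboost(data = xgb.train, max.depth = 3, nrounds = 70):
# booster parameters are collected in a list and passed as params.
params <- list(max_depth = 3, objective = "reg:squarederror")
my.model <- xgb.train(params = params, data = xgb.train, nrounds = 70)
```

Despite the clash between the variable name xgb.train and the function of the same name, R looks up the call position as a function, so this runs as written.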
There is also an xgb.cv function that automatically partitions the data set into random, equal-sized folds. Each fold is held out in turn for testing, while the remaining folds are used for training, so every observation contributes to both training and validation across the cross-validation run.
# xgb.DMatrix expects a numeric matrix, hence data.matrix
my.data <- xgb.DMatrix(data = data.matrix(mtcars[, predictors]), label = mtcars$mpg)
my.cv   <- xgb.cv(data = my.data, nfold = 5, nrounds = 3, max_depth = 3)
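The list returned by xgb.cv carries the per-round cross-validation metrics in its evaluation_log element (a data.table; column names such as test_rmse_mean follow the xgboost documentation). A sketch, rerunning the call above into its own variable so the result is unambiguous, assuming my.data from the previous block:

```r
library(xgboost)

# Rerun the cross-validation and inspect the per-round test error
cv.res <- xgb.cv(data = my.data, nfold = 5, nrounds = 3, max_depth = 3)
print(cv.res$evaluation_log)
```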
To get the importance matrix (the importance metrics as a data.table object), pass a trained booster such as the my.model returned by xgboost earlier; the object returned by xgb.cv holds cross-validation summaries, not a single trained model, and cannot be used here. Try:

my.importance <- xgb.importance(model = my.model)
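The importance table can also be visualized; xgb.plot.importance (part of the xgboost package) takes the data.table produced by xgb.importance:

```r
library(xgboost)

# Bar chart of feature importance (the Gain measure, per the xgboost docs)
xgb.plot.importance(importance_matrix = my.importance)
```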