= R XGBoost =

[[XGBoost]] is a gradient boosting library. This article is specifically about the official R bindings.

----

== Installation ==

The CRAN package is named in lowercase:

{{{
install.packages('xgboost')
}}}

----

== Usage ==

Some data preparation is required, e.g. [[R/Caret|partitioning the data into training and testing sets]].

{{{
library(caret)

data(mtcars)

# Identify the independent variables
predictors <- c("disp", "wt", "cyl", "gear", "carb")

# Partition the data set (80% training, 20% testing)
parts <- createDataPartition(mtcars$mpg, p = 0.8, list = FALSE)
train <- mtcars[parts, ]
test  <- mtcars[-parts, ]

# Create analytic matrices
train.x <- data.matrix(train[, predictors])
test.x  <- data.matrix(test[, predictors])
}}}

Try:

{{{
library(xgboost)

# Note: data is the analytic matrix, label is the outcome
dtrain <- xgb.DMatrix(data = train.x, label = train$mpg)
dtest  <- xgb.DMatrix(data = test.x,  label = test$mpg)

# Estimate model on the training data set
my.model <- xgboost(data = dtrain, max_depth = 3, nrounds = 70)

# [snip]

# Predict outcomes using the trained model and the test data set
pred <- predict(my.model, dtest)
}}}

Note that the `xgboost` function is a convenience wrapper around '''`xgb.train`'''; a sketch of the equivalent direct call appears at the end of this page.

There is also an '''`xgb.cv`''' function that automatically partitions the data set into random, equal-sized folds. Each fold is held out in turn for evaluation while the model is trained on the remaining folds, which makes it useful for tuning parameters through cross-validation.

{{{
my.data <- xgb.DMatrix(data = data.matrix(mtcars[, predictors]), label = mtcars$mpg)

my.cv <- xgb.cv(data = my.data, nfold = 5, nrounds = 3, max_depth = 3)
}}}

To get the '''importance matrix''' (the importance metrics as a data.table object) from a fitted booster such as `my.model` above (note that the object returned by `xgb.cv` cannot be passed here), try:

{{{
my.importance <- xgb.importance(model = my.model)
}}}
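As noted above, `xgboost` is a thin wrapper around '''`xgb.train`'''. Below is a minimal sketch of the equivalent direct call, assuming the `dtrain` and `dtest` objects created in the Usage section; it follows the classic `xgb.train` interface, and argument names may differ across package versions.

{{{
library(xgboost)

# Booster parameters are passed to xgb.train as a named list
params <- list(
  objective = "reg:squarederror",  # default regression objective
  max_depth = 3,
  eta       = 0.3                  # learning rate; 0.3 is the package default
)

# The watchlist names DMatrix objects whose evaluation metric is printed each round
my.model2 <- xgb.train(
  params    = params,
  data      = dtrain,
  nrounds   = 70,
  watchlist = list(train = dtrain, test = dtest)
)

# Prediction works the same way as with a model fit by xgboost()
pred2 <- predict(my.model2, dtest)
}}}

----

CategoryRicottone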