Stata Chaid
-chaid- is a user-written program for decision tree modeling.
Installation
Install from SSC:
ssc install chaid
Description
CHAID (Chi-squared Automatic Interaction Detection) is an algorithm for fitting a decision tree. At each node, all possible partitions are tried and evaluated using a chi-squared test, and the procedure is then repeated recursively within the resulting nodes.
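As a rough illustration of the kind of test involved (not -chaid-'s internal code), a single candidate partition can be checked by hand with -tabulate-. This sketch assumes Stata's bundled nlsw88 dataset; the variable race_split is hypothetical and exists only for this example.

```stata
sysuse nlsw88, clear
* Candidate partition of race: level 1 versus all other levels
generate byte race_split = (race == 1) if !missing(race)
* Pearson chi-squared test of the candidate partition against the outcome
tabulate race_split union, chi2
```

-chaid- automates this kind of comparison across candidate partitions and repeats it within each node it creates.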
-chaid- requires that all variables be categorical. Unordered categorical predictors should be passed in the unordered() option, and ordered categorical predictors in the ordered() option. Ordered variables are only partitioned into contiguous groups of levels. Note that categorical variables with more than 21 levels are not supported in the current version.
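A minimal sketch of a call mixing both predictor types, assuming Stata's bundled nlsw88 dataset (the dataset and variable choices are illustrative, not part of -chaid-):

```stata
sysuse nlsw88, clear
* race and industry have no natural ordering; grade (years of schooling) does
chaid married, unordered(race industry) ordered(grade)
```

Because grade is passed in ordered(), only splits that keep adjacent grade levels together are considered.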
If the dependent variable is ordered, pass the dvordered option.
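For instance, an ordered outcome could be set up as follows. This is a sketch assuming the nlsw88 dataset; wage3 is a hypothetical variable created only for illustration.

```stata
sysuse nlsw88, clear
* Build an ordered three-level outcome from hourly wage (illustrative only)
xtile wage3 = wage, nquantiles(3)
* dvordered tells -chaid- that the levels of wage3 are ordered
chaid wage3, dvordered unordered(race married) ordered(grade)
```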
To have -chaid- compute the test under a survey design declared with -svyset-, pass the svy option.
For example:
chaid depvar, unordered(varlist) svy
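A fuller sketch, assuming the nhanes2 example dataset from Stata's documentation (the choice of outcome and predictors is illustrative):

```stata
webuse nhanes2, clear
* Declare the survey design: PSUs, sampling weights, and strata
svyset psu [pweight = finalwgt], strata(strata)
* svy asks -chaid- to use the declared design when computing the test
chaid diabetes, unordered(race region) ordered(agegrp) svy
```

The design must be declared with -svyset- before -chaid- is called with svy.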
-chaid- creates (or overwrites) a variable named _CHAID that stores the terminal node assigned to each case. Note that the node numbers stored in _CHAID may not match the numbering shown in the command's output exactly.
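Once -chaid- has run, the stored nodes can be inspected with ordinary Stata commands. A short sketch, where depvar is a placeholder for whichever dependent variable was modeled:

```stata
* Number of cases in each terminal node
tabulate _CHAID
* Outcome distribution within each terminal node (row percentages);
* depvar is a placeholder for the dependent variable passed to -chaid-
tabulate _CHAID depvar, row
```

Since _CHAID is overwritten on every run, copy it to another variable first if an earlier tree needs to be kept.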
Tuning
-chaid- defaults to a minimum split size of 200, meaning that any node with fewer than 200 cases will be skipped over when considering possible partitions.
It also defaults to a minimum node size of 100. If a candidate partition creates a node with fewer than 100 cases, it is passed over.
These minimums are configured with the minsplit() and minnode() options respectively. Generally the minimum node size is kept at half of the minimum split size.
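For a smaller dataset, both thresholds might be lowered together. A sketch assuming the nlsw88 dataset, with values chosen only to illustrate the half-size convention:

```stata
sysuse nlsw88, clear
* Halve both defaults: only nodes with at least 100 cases are considered
* for splitting, and partitions that would create a node with fewer than
* 50 cases are passed over
chaid married, unordered(race industry) ordered(grade) minsplit(100) minnode(50)
```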
Binning
-chaid- supports automatic binning of continuous independent variables through the xtile() option. For example:
chaid depvar, xtile(varlist, n(5))
If the n() suboption is not specified, the default is 2.
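A concrete sketch, assuming the nlsw88 dataset. The second half shows a hand-built alternative; it assumes the xtile() option bins by quantiles in the same way as Stata's official -xtile- command, which this page does not confirm, and wage5 is a hypothetical variable name.

```stata
sysuse nlsw88, clear
* Let -chaid- bin the continuous predictors wage and hours into 5 groups each
chaid married, unordered(race) xtile(wage hours, n(5))

* Hand-built alternative (assumption: the option bins by quantiles):
* create the bins with -xtile-, then pass them as an ordered predictor
xtile wage5 = wage, nquantiles(5)
chaid married, unordered(race) ordered(wage5)
```

Building the bins by hand is mainly useful when they need to be inspected or reused outside -chaid-.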
