Revision 1 as of 2026-04-12 16:04:22 (size: 612; comment: Initial commit)
Revision 2 as of 2026-04-12 18:28:39 (size: 2191; comment: Notes)
Stata Chaid
-chaid- is a user-written program for decision tree modeling.
Installation
Try using SSC, like:
ssc install chaid
Description
CHAID (chi-squared automatic interaction detection) is an algorithm for fitting a decision tree. All possible partitions are tried recursively, each evaluated with a chi-squared test.
-chaid- requires that all variables be categorical. Unordered categorical predictors should be passed in the unordered() option, and ordered categorical predictors in the ordered() option. Ordered variables are only partitioned into contiguous groups of levels. Note that categorical variables with more than 21 levels are not supported in the current version.
If the dependent variable is ordered, pass the dvordered option.
To have -chaid- calculate the test under a -svyset- survey design, pass the svy option.
Try:
chaid depvar, unordered(varlist) svy
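A more concrete version of that call might look like the sketch below. It assumes -chaid- is already installed and uses Stata's nhanes2 example dataset (fetched with webuse, which needs internet access). The dataset, outcome, and predictors are illustrative choices, not part of the original notes.

```stata
* Sketch only: assumes -chaid- is installed (ssc install chaid).
* The dataset and variables are illustrative, not from the original notes.
webuse nhanes2, clear

* Declare the survey design so that the svy option has something to use
svyset psuid [pweight = finalwgt], strata(stratid)

* Keep cases with an observed outcome
drop if missing(diabetes)

* Unordered predictors go in unordered(); svy requests design-based tests
chaid diabetes, unordered(race sex region) svy
```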
-chaid- creates (or overwrites) a variable named _CHAID, which stores the terminal node for each case. Note that the terminal node enumeration may not match the output exactly.
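One way to inspect the stored terminal nodes is to cross-tabulate _CHAID against the dependent variable. The sketch below assumes -chaid- is installed and uses the auto example dataset. The minsplit() and minnode() values are arbitrary small numbers, chosen only because the default minimums are larger than this 69-case sample.

```stata
* Sketch only: assumes -chaid- is installed (ssc install chaid).
sysuse auto, clear

* Keep complete cases for the predictor (rep78 has some missing values)
drop if missing(rep78)

* Small, arbitrary minimums so that splits are possible in a tiny sample
chaid foreign, unordered(rep78) minsplit(10) minnode(4)

* _CHAID now holds the terminal node assigned to each case
tabulate _CHAID foreign, row
```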
Tuning
-chaid- defaults to a minimum split size of 200, meaning that any node with fewer than 200 cases will be skipped over when considering possible partitions.
It also defaults to a minimum node size of 100. If a candidate partition creates a node with fewer than 100 cases, it is passed over.
These minimums are set with the minsplit() and minnode() options, respectively. As a rule of thumb, the minimum node size is kept at half the minimum split size.
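As a rough illustration of keeping the two minimums in the half ratio, the sketch below derives the node minimum from the split minimum with a local macro. It assumes -chaid- is installed; the nhanes2 dataset, the variables, and the value 400 are illustrative assumptions.

```stata
* Sketch only: assumes -chaid- is installed (ssc install chaid).
webuse nhanes2, clear
drop if missing(diabetes)

* Choose a split minimum, then set the node minimum to half of it
local split 400
local node = `split' / 2

chaid diabetes, unordered(race sex region) minsplit(`split') minnode(`node')
```

Run this from a do-file (or as a single selection), since local macros do not persist between separately executed lines.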
Binning
-chaid- supports automatic binning for continuous independent variables. Try:
chaid depvar, xtile(varlist, n(5))
If the n() suboption is not specified, it defaults to 2.
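For comparison, the sketch below sets the xtile() option next to a manual version built with Stata's own -xtile- command. It assumes -chaid- is installed, and it further assumes that the option bins by quantiles and treats the binned variables as ordered. The notes above do not confirm that, so the two calls may not give identical trees. The small minsplit() and minnode() values are arbitrary and only suit the small auto dataset.

```stata
* Sketch only: assumes -chaid- is installed (ssc install chaid).
sysuse auto, clear

* Option form from the text: let -chaid- bin mpg and weight into 5 groups each
chaid foreign, xtile(mpg weight, n(5)) minsplit(10) minnode(4)

* Manual counterpart (assumed to be similar, not confirmed): quintile bins
xtile mpg5 = mpg, nquantiles(5)
xtile weight5 = weight, nquantiles(5)
chaid foreign, ordered(mpg5 weight5) minsplit(10) minnode(4)
```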
