Stata Stepwise
The stepwise command runs a model iteratively with stepwise removal (or addition) of terms.
Usage
use http://stata-press.com/data/r14/auto, clear
generate weight2 = weight * weight
stepwise, pr(.2): regress mpg weight weight2 (displ gear) turn headroom foreign price
This is equivalent to interactively running:
// Full model (regress itself does not take parenthesized terms, so displ and gear are listed plainly)
regress mpg weight weight2 displ gear turn headroom foreign price
// Observe that the `headroom` parameter has the greatest non-significant (>=0.2) p-value, so remove it
regress mpg weight weight2 displ gear turn foreign price
// ... remove the `(displ gear)` term, which is judged by a joint test of both coefficients
regress mpg weight weight2 turn foreign price
// ... remove the `price` parameter
regress mpg weight weight2 turn foreign
// Observe that all remaining parameters are significant
Implicitly this method uses Wald tests.
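As a rough sketch of what those Wald tests look like when run by hand, the p-value for a candidate term can be inspected with the test command after fitting the model. The choice of terms to test here is only illustrative, and the full variable names displacement and gear_ratio are written out in place of the abbreviations used above.

```stata
use http://stata-press.com/data/r14/auto, clear
generate weight2 = weight * weight

// Fit the full model once
regress mpg weight weight2 displacement gear_ratio turn headroom foreign price

// Wald test for a single-variable term
test headroom

// Joint Wald test for a two-variable term, as for (displ gear)
test displacement gear_ratio
```

A term whose test p-value is at or above the pr() level is a candidate for removal.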
Syntax
After running stepwise, calling the command again (or the underlying modeling command) without any arguments redisplays the stepwise estimation results.
The stepwise command is aliased to sw.
Terms
Terms on a modeling command are commonly variable names. They can also be factor variables.
Parentheses group several variables into a single term that is added or removed as a unit. For example, in this model:
stepwise, pr(.2): regress y x1 x2 x3 x4 i.a
...each indicator variable for a level of a is considered separately. Alternatively, in this model:
stepwise, pr(.2): regress y x1 x2 x3 x4 (i.a)
...the indicator variables for a are considered together, as a single term.
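The same contrast can be sketched on the auto data with indicator variables built by hand. This is an illustration under assumptions: the indicators come from tabulate with the generate() option rather than from i. notation, and the names rep1 through rep5 are simply what generate(rep) produces for the five levels of rep78.

```stata
use http://stata-press.com/data/r14/auto, clear

// Create one indicator per level of rep78: rep1 ... rep5
tabulate rep78, generate(rep)

// Each indicator is its own term and can be removed individually
stepwise, pr(.2): regress mpg weight rep2 rep3 rep4 rep5

// The indicators form one term and are kept or removed as a group
stepwise, pr(.2): regress mpg weight (rep2 rep3 rep4 rep5)
```

Here rep1 is left out as the reference level.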
Options
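The pr() option specifies the significance level for removal: starting from the full model, terms whose p-value is at or above this level are removed one at a time. This mode is called backward selection.
Compare to the pe() option, which specifies the significance level for entry: starting from the empty model, terms whose p-value is below this level are added one at a time. This mode is called forward selection.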
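For comparison with the backward example under Usage, a forward selection run on the same data might look like the following sketch. The 0.2 threshold is an arbitrary choice for illustration.

```stata
use http://stata-press.com/data/r14/auto, clear
generate weight2 = weight * weight

// Start from the empty model; repeatedly add the most significant
// excluded term while its p-value is below 0.2
stepwise, pe(.2): regress mpg weight weight2 (displ gear) turn headroom foreign price
```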
pr() and pe() can be used simultaneously. The model is first fit with backward selection. Excluded terms are then re-examined for re-addition, and included terms are re-examined for removal. This repeats until every included term is significant and every excluded term is non-significant. A term is removed when its p-value is greater than or equal to pr(), but added only when its p-value is strictly less than pe(). Applying effectively the same threshold in both directions can therefore require setting pr() slightly above pe(), like:
stepwise, pr(0.050001) pe(0.05): regress mpg weight weight2 (displ gear) turn headroom foreign price
The forward option is only effective when the pr() and pe() options are used simultaneously. With it, the model is first fit with forward selection. Included terms are then re-examined for removal, and excluded terms are re-examined for re-addition.
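A minimal sketch of this mode on the auto data follows. The thresholds 0.2 and 0.1 are arbitrary illustration values, chosen only so that pr() is larger than pe().

```stata
use http://stata-press.com/data/r14/auto, clear
generate weight2 = weight * weight

// Begin from the empty model and add terms with p < 0.1, then
// alternate with removal of terms with p >= 0.2
stepwise, pr(.2) pe(.1) forward: regress mpg weight weight2 (displ gear) turn headroom foreign price
```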
The hierarchical option directs stepwise to consider terms in the order they are listed. Given a model fit on x1, x2, and x3 in backward selection mode, x3 is the first term considered for removal, regardless of how its p-value compares to those of the other terms. (This can be called backward hierarchical selection.) In forward selection mode, x1 is instead the first term considered for addition. (This can be called forward hierarchical selection.)
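Both hierarchical modes can be sketched on the auto data as follows. The ordering of terms in the model statement is what matters here, and the ordering shown is just the one used elsewhere on this page, not a recommendation.

```stata
use http://stata-press.com/data/r14/auto, clear
generate weight2 = weight * weight

// Backward hierarchical: price (listed last) is tested for removal first
stepwise, pr(.2) hierarchical: regress mpg weight weight2 (displ gear) turn headroom foreign price

// Forward hierarchical: weight (listed first) is tested for addition first
stepwise, pe(.2) hierarchical: regress mpg weight weight2 (displ gear) turn headroom foreign price
```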
The lockterm1 option locks the first term into the model, so it is never considered for removal. For example, to lock the parameter for x1, try:
stepwise, pr(0.2) lockterm1: logistic y x1 x2 x3
This option respects parentheses. To lock the parameters for x1 and x2, try:
stepwise, pr(0.2) lockterm1: logistic y (x1 x2) x3
Note that some modeling commands do not take a dependent variable, and some take more than one, so it is misleading to think of lockterm1 as locking the second variable listed. It locks the first term, wherever that falls in the command.
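On the auto data, a sketch of locking a two-variable term could look like this. Grouping weight with weight2 is an illustrative choice, made so that the linear and squared terms stay in the model together.

```stata
use http://stata-press.com/data/r14/auto, clear
generate weight2 = weight * weight

// (weight weight2) is the first term, so lockterm1 keeps both variables
// in the model; only the remaining terms are candidates for removal
stepwise, pr(.2) lockterm1: regress mpg (weight weight2) displ gear turn headroom foreign price
```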
The lr option specifies the likelihood ratio test instead of the Wald test.
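A minimal sketch of the lr option with a maximum likelihood model follows. The outcome and predictors are chosen only for illustration from the auto data, and no particular selection result is implied.

```stata
use http://stata-press.com/data/r14/auto, clear

// Backward selection where each term is judged by a likelihood ratio
// test instead of the default Wald test
stepwise, pr(.2) lr: logit foreign mpg weight price
```

Likelihood ratio tests require refitting the model for each candidate term, so this mode can be slower than the default.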