|
Size: 992
Comment: Killing Econometrics page
|
← Revision 4 as of 2025-11-10 15:01:14 ⇥
Size: 1301
Comment: Rewrite
|
| Deletions are marked like this. | Additions are marked like this. |
| Line 1: | Line 1: |
| ## page was renamed from Econometrics/Binning | |
| Line 4: | Line 3: |
| '''Binning''' is the process of transforming a continuous variable into either a categorical variable or a set of discrete indicator variables. | '''Binning''' is a pre-processing technique. |
| Line 7: | Line 6: |
---- == Description == Continuous variables can be inconvenient as predictors when the expected relationship is non-linear. Consider age and [[UnitedStates/WelfarePolicy/HealthInsurancePlans|health insurance consumption in the U.S.]]; there are discontinuities at 26 and around retirement age. One solution is to treat age as a discrete variable (e.g., a dummy variable for each year, or some other small interval which captures all variation), but this may be an under-powered analysis. Designing discrete bins that capture more observations, while still capturing all variation, is the next step. |
|
| Line 16: | Line 25: |
| === In Regression Models === | === Modeling === |
| Line 18: | Line 27: |
| There is significant discussion surrounding the use of binning in regression models. | Binning is a common strategy but not always the correct one. |
| Line 20: | Line 29: |
| If the intention of binning is to capture non-linearity, then a non-linear model is generally preferable. Another common solution to that problem is to regress on a variable and the square of that variable. | If the intention is to capture non-linearity, then a non-linear model (e.g. a log-linear model) is generally preferable to binning. |
| Line 22: | Line 31: |
| Gelman discusses the approach partly in [[RegressionAndOtherStories|Regression and Other Stories]] and in [[https://statmodeling.stat.columbia.edu/2024/06/19/what/|blog posts]]: "One thing that people don't always realize is that you can do binning and linear together, for example in [[R]], `y ~ z + age + age.65.74 + age75.84 + age.85.up`" [link and styling mine]. | Also, per [[RegressionAndOtherStories|Regression and Other Stories]] and related [[https://statmodeling.stat.columbia.edu/2024/06/19/what/|blog posts]]: "One thing that people don't always realize is that you can do binning and linear together, for example in [[R]], `y ~ z + age + age.65.74 + age75.84 + age.85.up`". |
Binning
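Binning is a pre-processing technique that transforms a continuous variable into either a categorical variable or a set of discrete indicator variables.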
Description
Continuous variables can be inconvenient as predictors when the expected relationship is non-linear. Consider age and health insurance consumption in the U.S.; there are discontinuities at 26 and around retirement age.
One solution is to treat age as a discrete variable (e.g., a dummy variable for each year, or some other small interval which captures all variation), but with few observations in each level the analysis may be under-powered. The next step is to design wider bins that each hold more observations while still capturing that variation.
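As a rough illustration of the bin design described above, the sketch below assigns an age to one of three bins and builds the matching indicator variables. The cut point at 26 comes from the text; the cut point at 65 is only an assumed stand-in for "around retirement age", and the names `CUT_POINTS`, `LABELS`, `bin_age`, and `indicators` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut points: 26 is taken from the text; 65 is an
# assumption standing in for "around retirement age".
CUT_POINTS = [26, 65]
LABELS = ["under_26", "26_to_64", "65_and_up"]


def bin_age(age):
    """Return the label of the bin that contains `age`."""
    # bisect_right counts the cut points that are <= age, so an age
    # exactly equal to a cut point lands in the upper bin.
    return LABELS[bisect_right(CUT_POINTS, age)]


def indicators(age):
    """Return one 0/1 indicator per bin (a one-hot encoding)."""
    label = bin_age(age)
    return {name: int(name == label) for name in LABELS}
```

For example, `bin_age(26)` falls in `"26_to_64"` because each bin is closed on the left. In a regression, one of the indicators would normally be dropped to serve as the reference category.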
Usage
Modeling
Binning is a common strategy but not always the correct one.
If the intention is to capture non-linearity, then a non-linear model (e.g. a log-linear model) is generally preferable to binning.
Binning can also be combined with a linear term. Per Gelman, in Regression and Other Stories and related blog posts: "One thing that people don't always realize is that you can do binning and linear together, for example in R, y ~ z + age + age.65.74 + age75.84 + age.85.up".
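To make the quoted formula concrete, the following sketch builds the predictors for a single observation: the linear age term is kept, and three indicators are added for the older age bands. The bin boundaries are inferred from the variable names in the quote and are an assumption, as are the function name `design_row` and the underscore-style column names (chosen here in place of the dotted R names).

```python
def design_row(z, age):
    """Build one row of predictors: linear age plus age-band indicators.

    Ages under 65 get zeros for all three indicators and so act as the
    reference group; boundaries are assumed from the quoted names.
    """
    return {
        "z": z,
        "age": age,                         # linear term stays in the model
        "age_65_74": int(65 <= age < 75),   # assumed band: 65 up to 75
        "age_75_84": int(75 <= age < 85),   # assumed band: 75 up to 85
        "age_85_up": int(age >= 85),        # assumed band: 85 and older
    }
```

With these columns, the linear term captures the overall trend in age while each indicator allows a level shift within its band.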
