Differences between revisions 2 and 4 (spanning 2 versions)
Revision 2 as of 2025-01-10 14:19:09
Size: 992
Comment: Killing Econometrics page
Revision 4 as of 2025-11-10 15:01:14
Size: 1301
Comment: Rewrite

Binning

Binning is a pre-processing technique.


Description

Continuous variables can be inconvenient as predictors when the expected relationship is non-linear. Consider age and health insurance consumption in the U.S.: there are discontinuities at age 26 and around retirement age.

One solution is to treat age as a discrete variable (e.g., a dummy variable for each year, or some other small interval that captures all variation), but this may yield an underpowered analysis because each category holds few observations. The next step is to design coarser bins that each contain more observations while still capturing the variation.
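
As a minimal sketch of the idea above, an age can be mapped to a bin label and then expanded into indicator (dummy) variables. The cut points (26 and 65), the labels, and the helper names `bin_label` and `indicators` are assumptions chosen for illustration; this page does not prescribe them.

```python
from bisect import bisect_right

# Hypothetical cut points: the lower edges of the second and third bins.
CUTS = [26, 65]
LABELS = ["under_26", "26_to_64", "65_up"]


def bin_label(age):
    """Return the label of the bin that contains `age`."""
    # bisect_right counts how many cut points are <= age,
    # so an age exactly at a cut point falls into the upper bin.
    return LABELS[bisect_right(CUTS, age)]


def indicators(age):
    """Return one 0/1 indicator per bin."""
    label = bin_label(age)
    return {name: int(name == label) for name in LABELS}


ages = [19, 26, 40, 65, 88]
print([bin_label(a) for a in ages])
print(indicators(40))
```

When these indicators enter a regression that also has an intercept, one of them is usually dropped to serve as the reference category.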


Usage

Modeling

Binning is a common strategy but not always the correct one.

If the intention is to capture non-linearity, then a non-linear model (e.g., a log-linear model) is generally preferable to binning.
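
For comparison, a log-linear model can be sketched as ordinary least squares on the logarithm of the outcome. The function name `fit_log_linear` and the data are made up for illustration, and the single-predictor form is an assumption rather than a model taken from this page.

```python
import math


def fit_log_linear(xs, ys):
    """Fit log(y) = a + b * x by ordinary least squares; return (a, b)."""
    log_ys = [math.log(y) for y in ys]  # requires every y > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_ly = sum(log_ys) / n
    # Slope is the covariance of x and log(y) divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
    b = sxy / sxx
    a = mean_ly - b * mean_x
    return a, b


# Made-up, noise-free data generated from y = exp(0.5 + 0.1 * x).
xs = [20, 30, 40, 50, 60]
ys = [math.exp(0.5 + 0.1 * x) for x in xs]
a, b = fit_log_linear(xs, ys)
print(round(a, 6), round(b, 6))
```

Here the slope `b` is the change in log(y) per unit of x, so no bin boundaries have to be chosen.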

Binning and a linear term can also be combined in one model. Per Regression and Other Stories and related blog posts: "One thing that people don't always realize is that you can do binning and linear together, for example in R, y ~ z + age + age.65.74 + age75.84 + age.85.up".
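
The predictors in the quoted formula could be built roughly as follows. This is a sketch under stated assumptions: the column names adapt the R variable names to Python identifiers, the bin boundaries (65 to 74, 75 to 84, 85 and up) are inferred from those names, and `design_row` is a hypothetical helper.

```python
def design_row(z, age):
    """Build one row of predictors: z, linear age, and three age indicators."""
    return {
        "z": z,
        "age": age,  # linear term, kept alongside the indicators
        "age_65_74": int(65 <= age < 75),  # assumed boundaries
        "age_75_84": int(75 <= age < 85),
        "age_85_up": int(age >= 85),
    }


rows = [design_row(z=1.0, age=a) for a in (40, 70, 80, 90)]
for row in rows:
    print(row)
```

Ages under 65 form the reference group, with all three indicators equal to zero. The linear `age` term carries the overall trend, and each indicator shifts the level within its bin.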


CategoryRicottone

Statistics/Binning (last edited 2025-11-10 15:01:14 by DominicRicottone)