Binning

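Binning is a pre-processing technique that converts a continuous variable into a discrete one by grouping its values into intervals (bins).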


Description

Continuous variables can be inconvenient as predictors when the expected relationship is non-linear. Consider age and health insurance consumption in the U.S.: there are discontinuities at 26 (when dependents age out of a parent's plan) and around retirement age (Medicare eligibility begins at 65).

One solution is to treat age as a discrete variable (e.g., a dummy variable for each year, or some other interval small enough to capture all variation), but with few observations in each category this may be an under-powered analysis. The next step is to design wider bins that each hold more observations while still capturing the relevant variation.
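
As a rough illustration of designing such bins, the sketch below assigns ages to three intervals. The cut points (26 and 65), the labels, and the sample ages are assumptions made up for this example, chosen only to mirror the discontinuities mentioned above; they are not a recommended specification.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges and labels, chosen only to mirror the age example.
EDGES = [26, 65]
LABELS = ["under_26", "26_to_64", "65_and_up"]


def age_bin(age):
    """Return the label of the bin containing age.

    Bins are closed on the left: an age equal to an edge falls in the
    bin that starts at that edge.
    """
    return LABELS[bisect_right(EDGES, age)]


# Made-up sample of ages, for illustration only.
ages = [19, 25, 26, 40, 64, 65, 80]
binned = [age_bin(a) for a in ages]

# Counting observations per bin helps check that no bin is too sparse.
counts = Counter(binned)
```

Checking the count in each bin is a quick way to see whether a candidate set of edges leaves any bin with too few observations to estimate anything.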


Usage

Modeling

Binning is a common strategy but not always the correct one.

If the intention is to capture non-linearity, then a non-linear model (e.g. a log-linear model) is generally preferable to binning.

Also, per Regression and Other Stories and related blog posts: "One thing that people don't always realize is that you can do binning and linear together, for example in R, y ~ z + age + age.65.74 + age75.84 + age.85.up".
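
To make the quoted formula concrete, the following sketch builds the predictor columns it implies: an intercept, z, a linear age term, and one indicator per age band. The function name design_row is hypothetical, and the band boundaries (65 to 74, 75 to 84, 85 and up, with whole-year ages) are an assumption read off the variable names in the quote. The sketch only constructs the predictors; fitting is left to whatever least-squares routine is at hand.

```python
def design_row(z, age):
    """Build one row of predictors: intercept, z, linear age, band indicators.

    The band boundaries are assumed from the variable names in the
    quoted formula; ages below 65 form the reference group.
    """
    return [
        1.0,                             # intercept
        float(z),                        # z
        float(age),                      # linear age term
        1.0 if 65 <= age < 75 else 0.0,  # indicator: age 65 to 74
        1.0 if 75 <= age < 85 else 0.0,  # indicator: age 75 to 84
        1.0 if age >= 85 else 0.0,       # indicator: age 85 and up
    ]


# Made-up (z, age) pairs, for illustration only.
rows = [design_row(z, age) for z, age in [(0, 40), (1, 70), (0, 80), (1, 90)]]
```

Because the linear age term stays in the model, each indicator estimates a shift for its band relative to the linear trend rather than replacing the trend.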



Statistics/Binning (last edited 2025-11-10 15:01:14 by DominicRicottone)