= Binning =

'''Binning''' is a pre-processing technique that groups the values of a continuous variable into discrete intervals.

----

== Description ==

Continuous variables can be inconvenient as predictors when the expected relationship is non-linear. Consider age and [[UnitedStates/WelfarePolicy/HealthInsurancePlans|health insurance consumption in the U.S.]]: there are discontinuities at age 26 and around retirement age.

One solution is to treat age as a discrete variable (e.g., a dummy variable for each year, or for some other small interval that captures all of the variation), but this spends a parameter on every interval and may leave the analysis under-powered. The next step is to design wider bins that each hold more observations while still capturing all of the variation.

----

== Usage ==

=== Modeling ===

Binning is a common strategy but not always the correct one. If the intention is to capture non-linearity, then a non-linear model (e.g., a log-linear model) is generally preferable to binning.

Also, per [[RegressionAndOtherStories|Regression and Other Stories]] and related [[https://statmodeling.stat.columbia.edu/2024/06/19/what/|blog posts]]: "One thing that people don't always realize is that you can do binning and linear together, for example in [[R]], `y ~ z + age + age.65.74 + age75.84 + age.85.up`".

----
CategoryRicottone
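
As a rough illustration of the "binning and linear together" idea quoted above, the following Python sketch builds the predictors that such a formula would consume: the continuous age term is kept, and one 0/1 indicator is added per upper-age bin. The bin edges (65, 75, 85) are read off the variable names in the quote, and `BIN_EDGES`, `BIN_NAMES`, and `age_features` are hypothetical names chosen for this example. No model is fit here, and ages under 65 are assumed to be the reference group.

```python
from bisect import bisect_right

# Assumed lower edges of the bins, mirroring age.65.74 / 75.84 / 85.up in the quote.
BIN_EDGES = [65, 75, 85]
BIN_NAMES = ["age_65_74", "age_75_84", "age_85_up"]


def age_features(age):
    """Return the linear age term plus one 0/1 indicator per upper-age bin."""
    # bisect_right counts how many edges are <= age.
    # 0 means the age is below 65, which is treated as the reference group.
    k = bisect_right(BIN_EDGES, age)
    features = {"age": age}
    for i, name in enumerate(BIN_NAMES, start=1):
        features[name] = 1 if k == i else 0
    return features


# Build one feature row per observation.
rows = [age_features(a) for a in (30, 65, 74, 80, 85, 92)]
```

Because the linear `age` term stays in each row, a model fit on these features can estimate an overall slope in age while the indicators absorb level shifts within each bin.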