Differences between revisions 2 and 4 (spanning 2 versions)

Binning

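Binning is a pre-processing technique that transforms a continuous variable into either a categorical variable or a set of discrete indicator variables.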

Contents

Binning
1. Description
2. Usage
  1. Modeling

Description

Continuous variables can be inconvenient as predictors when the expected relationship is non-linear. Consider age and health insurance consumption in the U.S.; there are discontinuities at 26 and around retirement age.

One solution is to treat age as a discrete variable (e.g., a dummy variable for each year, or some other small interval which captures all variation), but this may be an under-powered analysis. Designing discrete bins that capture more observations, while still capturing all variation, is the next step.
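As a rough illustration of the step above, the following sketch assigns ages to a handful of wider bins. The cut points, labels, and sample ages are hypothetical: 26 and 65 echo the discontinuities mentioned earlier, and the remaining edges are arbitrary placeholders rather than recommended values.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points, chosen only for illustration.
# Each bin is a half-open interval [lower, upper).
EDGES = [18, 26, 45, 65]
LABELS = ["<18", "18-25", "26-44", "45-64", "65+"]


def age_bin(age):
    """Return the label of the bin that contains `age`."""
    # bisect_right counts how many edges are <= age, which is the bin index.
    return LABELS[bisect_right(EDGES, age)]


ages = [17, 25, 26, 40, 64, 65, 80]  # made-up sample
bins = [age_bin(a) for a in ages]

# Counting observations per bin helps judge whether a bin is too sparse.
counts = Counter(bins)
print(bins)
print(dict(counts))
```

Placing an edge exactly at 26 keeps ages 25 and 26 in different bins, so the discontinuity is not averaged away inside a single bin.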

Usage

Modeling

Binning is a common strategy but not always the correct one.

If the intention is to capture non-linearity, then a non-linear model (e.g. a log-linear model) is generally preferable to binning.

Also, per Regression and Other Stories and related blog posts: "One thing that people don't always realize is that you can do binning and linear together, for example in R, y ~ z + age + age.65.74 + age75.84 + age.85.up".
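To make the quoted formula concrete, here is a minimal sketch of the predictor columns it implies: a linear age term kept alongside indicator variables for the older age bands. The function name `design_row` and the Python column names are hypothetical renamings of the terms in the quote, and the half-open band boundaries are an assumption about how the indicators would be coded.

```python
def design_row(z, age):
    """Build one row of predictors: z, linear age, and three age-band indicators."""
    return {
        "z": z,
        "age": age,                        # linear term, kept alongside the bins
        "age_65_74": int(65 <= age < 75),  # assumed coding of age.65.74
        "age_75_84": int(75 <= age < 85),  # assumed coding of age75.84
        "age_85_up": int(age >= 85),       # assumed coding of age.85.up
    }


# Made-up observations: one below every band, then one inside each band.
rows = [design_row(z=1.0, age=a) for a in (40, 70, 80, 90)]
for row in rows:
    print(row)
```

Ages below 65 have all three indicators set to zero, so for them the model reduces to the linear age term; the indicators only shift the prediction within the older bands.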

CategoryRicottone

- Revision 2 as of 2025-01-10 14:19:09
  Size: 992
  Editor: DominicRicottone
  Comment: Killing Econometrics page
+ Revision 4 as of 2025-11-10 15:01:14
  Size: 1301
  Editor: DominicRicottone
  Comment: Rewrite
 Line 1:
-## page was renamed from Econometrics/Binning
-Line 4:
+Line 3:
-'''Binning''' is the process of transforming a continuous variable into either a categorical variable or a set of discrete indicator variables.
+'''Binning''' is a pre-processing technique.
-Line 7:
+Line 6:
+----



== Description ==

Continuous variables can be inconvenient as predictors when the expected relationship is non-linear. Consider age and [[UnitedStates/WelfarePolicy/HealthInsurancePlans|health insurance consumption in the U.S.]]; there are discontinuities at 26 and around retirement age.

One solution is to treat age as a discrete variable (e.g., a dummy variable for each year, or some other small interval which captures all variation), but this may be an under-powered analysis. Designing discrete bins that capture more observations, while still capturing all variation, is the next step.
-Line 16:
+Line 25:
-=== In Regression Models ===
+=== Modeling ===
-Line 18:
+Line 27:
-There is significant discussion surrounding the use of binning in regression models.
+Binning is a common strategy but not always the correct one.
-Line 20:
+Line 29:
-If the intention of binning is to capture non-linearity, then a non-linear model is generally preferable. Another common solution to that problem is to regress on a variable and the square of that variable.
+If the intention is to capture non-linearity, then a non-linear model (e.g. a log-linear model) is generally preferable to binning.
-Line 22:
+Line 31:
-Gelman discusses the approach partly in [[RegressionAndOtherStories|Regression and Other Stories]] and in [[https://statmodeling.stat.columbia.edu/2024/06/19/what/|blog posts]]: "One thing that people don't always realize is that you can do binning and linear together, for example in [[R]], `y ~ z + age + age.65.74 + age75.84 + age.85.up`" [link and styling mine].
+Also, per [[RegressionAndOtherStories|Regression and Other Stories]] and related [[https://statmodeling.stat.columbia.edu/2024/06/19/what/|blog posts]]: "One thing that people don't always realize is that you can do binning and linear together, for example in [[R]], `y ~ z + age + age.65.74 + age75.84 + age.85.up`".

Diff for "Statistics/Binning"
