Differences between revisions 2 and 4 (spanning 2 versions)

Censored and Truncated Regression Models

A censored regression model is appropriate when the value of the dependent variable is unavailable whenever it is above or below some threshold.

A truncated regression model is appropriate when cases are systematically uncollected or unreported whenever the dependent variable is above or below some threshold.

The Tobit model, named for Tobin (1958), is a special case of a censored regression model.

Contents

Censored and Truncated Regression Models
1. Description
  1. Univariate
  2. Bivariate

Description

This is a modification of the OLS model, where the dependent variable Y is related to the independent variable(s) X as Y_i = bX_i + U_i.

Univariate

Suppose that the variable of interest is unobserved if it is less than zero. The expected value among the observed cases is then expressed as E[Y_i|X_i,Y_i≥0]. Substituting the model equation for Y_i yields E[bX_i + U_i|X_i,bX_i + U_i≥0], and because the expectation is conditioned on a given X_i this simplifies to bX_i + E[U_i|X_i,bX_i + U_i≥0]. Algebraically this is rewritten as:

E[Y_i|X_i,Y_i≥0] = bX_i + σE[U_i/σ|X_i,U_i/σ≥-bX_i/σ]

where σ is the standard deviation of the error term U_i. Dividing by σ standardizes the error term, so that, with U_i assumed to be normally distributed, the conditional expectation decomposes into the p.d.f. (φ) and c.d.f. (Φ) of the standard normal distribution. Altogether, the expected value is:

E[Y_i|X_i,Y_i≥0] = bX_i + σφ(bX_i/σ)/Φ(bX_i/σ) = bX_i + σλ(bX_i/σ)

The hazard rate or inverse Mills' ratio (IMR), the ratio of the standard normal p.d.f. to its c.d.f., is notated as λ here. Sometimes λ evaluated for a given bX_i/σ is notated as λ_i.
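As a numerical sanity check on the role of λ, the following sketch compares σλ(bX_i/σ), with λ taken as the standard normal p.d.f. divided by its c.d.f., against a simulated average of U_i over the draws where bX_i + U_i ≥ 0. The function names and the values chosen for bX_i and σ are hypothetical, and normally distributed errors are assumed.

```python
import math
import random


def norm_pdf(z):
    """Standard normal p.d.f."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal c.d.f., written with erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def imr(z):
    """Inverse Mills' ratio: lambda(z) = pdf(z) / cdf(z)."""
    return norm_pdf(z) / norm_cdf(z)


def simulated_mean_error(bx, sigma, n=200_000, seed=0):
    """Average of U ~ N(0, sigma^2) over the draws that satisfy bx + U >= 0."""
    rng = random.Random(seed)
    kept = [u for u in (rng.gauss(0.0, sigma) for _ in range(n)) if bx + u >= 0.0]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    bx, sigma = 1.0, 2.0  # hypothetical values of bX_i and sigma
    print("sigma * lambda(bX/sigma):", sigma * imr(bx / sigma))
    print("simulated E[U | bX + U >= 0]:", simulated_mean_error(bx, sigma))
```

The two printed numbers should agree up to simulation noise, which shrinks as n grows.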

Provided that the sample is censored (i.e., not truncated), X_i is still recorded for the unobserved cases, so it should be possible to estimate λ_i using a probit model of whether each case is observed. This reveals that the selection bias seen in the initial model can be treated as omitted variable bias, and can be corrected by using the model Y_i = bX_i + σλ_i + V_i.
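The correction can be sketched as a two-step simulation: a probit step that estimates b/σ from which cases are observed, then a regression of Y_i on X_i and λ_i among the observed cases. This is a minimal illustration under stated assumptions, not a reference implementation. The data are simulated, there is a single predictor and no intercept (as in the model equation), the probit is fitted by a simple bisection on its score, and all function names are hypothetical.

```python
import math
import random


def imr(z):
    """Inverse Mills' ratio: standard normal p.d.f. over c.d.f. at z."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))  # erfc stays accurate in the tail
    return pdf / cdf


def probit_slope(xs, ds, lo=0.0, hi=5.0, iters=40):
    """Fit P(d = 1 | x) = Phi(a * x) by maximum likelihood.

    The probit log-likelihood is concave in a, so its derivative (the score)
    is decreasing and bisection finds its root. Assumes the root is in [lo, hi].
    """
    def score(a):
        return sum(x * (imr(a * x) if d else -imr(-a * x)) for x, d in zip(xs, ds))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def two_step(n=10_000, b=1.0, sigma=1.0, seed=0):
    """Simulate censoring at zero; return (naive b, corrected b, estimated sigma)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
    ys = [b * x + rng.gauss(0.0, sigma) for x in xs]
    ds = [y >= 0.0 for y in ys]  # True where Y_i is observed

    # Step 1: the probit of the observation indicator on X_i estimates b / sigma.
    a_hat = probit_slope(xs, ds)

    # Step 2: among observed cases, regress Y_i on X_i and lambda_i (no intercept)
    # by solving the 2x2 normal equations directly.
    sel = [(x, y, imr(a_hat * x)) for x, y, d in zip(xs, ys, ds) if d]
    sxx = sum(x * x for x, _, _ in sel)
    sxl = sum(x * lam for x, _, lam in sel)
    sll = sum(lam * lam for _, _, lam in sel)
    sxy = sum(x * y for x, y, _ in sel)
    sly = sum(lam * y for _, y, lam in sel)
    det = sxx * sll - sxl * sxl

    b_naive = sxy / sxx  # regression that omits lambda_i
    b_hat = (sxy * sll - sxl * sly) / det
    sigma_hat = (sxx * sly - sxl * sxy) / det  # coefficient on lambda_i
    return b_naive, b_hat, sigma_hat


if __name__ == "__main__":
    print(two_step())
```

In this setup the coefficient on λ_i is itself an estimate of σ, which is why the function returns it alongside the slope.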

Bivariate

Suppose the variable of interest is unobserved if a second variable is less than zero, and suppose that these are drawn from a joint normal distribution. In other words, the model is specified as:

Y_1i = bX_i + U_1i
Y_2i = γZ_i + U_2i
- X_i and Z_i can be the same, but in practice the model is often only well identified when Z_i includes at least one predictor that is excluded from X_i.

Following the same procedure as above, it can be demonstrated that:

E[Y_1i|X_i,Y_2i≥0] = bX_i + (σ_1,2/√σ_2,2)λ_i
E[Y_2i|Z_i,Y_2i≥0] = γZ_i + (σ_2,2/√σ_2,2)λ_i

where λ_i is specifically shorthand for λ evaluated for a given γZ_i/√σ_2,2.

Adding these omitted variables leads to a model specified as:

Y_1i = bX_i + (σ_1,2/√σ_2,2)λ_i + V_1i
Y_2i = γZ_i + (σ_2,2/√σ_2,2)λ_i + V_2i
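E[V₂²] = σ_2,2(1 + φ_iλ_i - λ_i²)
- as φ_i goes to infinity (i.e., the chance of selection approaches 0%), this term approaches 0.
E[V₁V₂] = σ_1,2(1 + φ_iλ_i - λ_i²)
- as φ_i goes to infinity, this term approaches 0.
E[V₁²] = σ_1,1[(1 - ρ²) + ρ²(1 + φ_iλ_i - λ_i²)]
- as φ_i goes to infinity, this term approaches σ_1,1(1 - ρ²).

where φ_i is shorthand for -γZ_i/√σ_2,2, the negative of the point at which λ_i is evaluated (it is not the standard normal p.d.f., which is bounded and cannot go to infinity); and ρ = σ_1,2/√(σ_1,1σ_2,2). In the opposite limit, as φ_i goes to negative infinity (i.e., the chance of selection approaches 100%), λ_i approaches 0 and the common factor (1 + φ_iλ_i - λ_i²) approaches 1.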
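The bivariate correction can be sketched in the same two-step fashion: a probit of the selection indicator on Z_i estimates γ/√σ_2,2, and a regression of Y_1i on X_i and λ_i among the selected cases then recovers b, with the coefficient on λ_i estimating σ_1,2/√σ_2,2. This is a minimal sketch under stated assumptions. The data are simulated, each equation has a single predictor and no intercept, Z_i is built as X_i plus independent noise so that it carries variation not in X_i, and all function names and parameter values are hypothetical.

```python
import math
import random


def imr(z):
    """Inverse Mills' ratio: standard normal p.d.f. over c.d.f. at z."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))  # erfc stays accurate in the tail
    return pdf / cdf


def probit_slope(zs, ds, lo=0.0, hi=5.0, iters=40):
    """Fit P(d = 1 | z) = Phi(g * z) by bisection on the (decreasing) score.

    Assumes the maximizer lies in [lo, hi].
    """
    def score(g):
        return sum(z * (imr(g * z) if d else -imr(-g * z)) for z, d in zip(zs, ds))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def selection_two_step(n=10_000, b=1.0, gamma=1.0,
                       s11=1.0, s22=1.0, s12=0.8, seed=0):
    """Return (naive b, corrected b, coefficient on lambda_i)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0)
        z = x + rng.uniform(-1.0, 1.0)  # Z_i has variation that X_i lacks
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Errors with Var(U1) = s11, Var(U2) = s22, Cov(U1, U2) = s12.
        u2 = math.sqrt(s22) * e2
        u1 = (s12 / math.sqrt(s22)) * e2 + math.sqrt(s11 - s12 * s12 / s22) * e1
        rows.append((x, z, b * x + u1, gamma * z + u2 >= 0.0))

    # Step 1: probit of the selection indicator on Z_i.
    g_hat = probit_slope([r[1] for r in rows], [r[3] for r in rows])

    # Step 2: among selected cases, regress Y_1i on X_i and lambda_i.
    sel = [(x, y1, imr(g_hat * z)) for x, z, y1, d in rows if d]
    sxx = sum(x * x for x, _, _ in sel)
    sxl = sum(x * lam for x, _, lam in sel)
    sll = sum(lam * lam for _, _, lam in sel)
    sxy = sum(x * y for x, y, _ in sel)
    sly = sum(lam * y for _, y, lam in sel)
    det = sxx * sll - sxl * sxl

    b_naive = sxy / sxx  # regression that omits lambda_i
    b_hat = (sxy * sll - sxl * sly) / det
    c_hat = (sxx * sly - sxl * sxy) / det  # estimates s12 / sqrt(s22)
    return b_naive, b_hat, c_hat


if __name__ == "__main__":
    print(selection_two_step())
```

Because the second-step errors are heteroskedastic, as the moments listed above show, ordinary least-squares standard errors from this second step would not be valid without adjustment; the sketch reports point estimates only.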

CategoryRicottone

Statistics/CensoredAndTruncatedRegressionModels (last edited 2026-02-17 15:27:04 by DominicRicottone)

-  ⇤ ← Revision 2 as of 2025-08-06 00:29:37 → 
  Size: 971
  Editor: DominicRicottone
  Comment: Some content
+   ← Revision 4 as of 2025-08-06 18:09:17 → ⇥
  Size: 3673
  Editor: DominicRicottone
  Comment: Notes
-Deletions are marked like this.
+Additions are marked like this.
 Line 19:
-Suppose that data is unobserved if the dependent variable is less than zero. The expected value is then expressed as ''E[Y,,i,,|X,,i,,,Y,,i,,≥0] = bX,,i,, + E[U,,i,,|Y,,i,,≥0]''.
+=== Univariate ===

Suppose that the variable of interest is unobserved if it is less than zero. The expected value is then expressed as ''E[Y,,i,,|X,,i,,,Y,,i,,≥0]''. Substituting ''Y,,i,,'' with the model equation yields ''E[bX,,i,, + U,,i,,|X,,i,,,bX,,i,, + U,,i,,≥0]'', and because the expectation is conditioned on a given ''X,,i,,'' this simplifies to ''bX,,i,, + E[U,,i,,|X,,i,,,bX,,i,, + U,,i,,≥0]''. Algebraically this is rewritten as:

{{attachment:expectation1.svg}}

where ''σ'' is the standard deviation of the error term ''U,,i,,''. The insertion of that standard deviation term transforms this into a formula that is easily decomposed into terms of the [[Statistics/NormalDistribution|p.d.f. and c.d.f. of the standard normal distribution]]. Altogether, the expected value is:

{{attachment:expectation2.svg}}

The '''hazard rate''' or '''inverse [[Statistics/MillsRatio|Mills' ratio]]''' ('''IMR'''), the ratio of the standard normal p.d.f. to its c.d.f., is notated as ''λ'' here. Sometimes ''λ'' evaluated for a given ''bX,,i,,/σ'' is notated as ''λ,,i,,''.

Provided that the sample is censored (i.e., not truncated), ''X,,i,,'' is still recorded for the unobserved cases, so it should be possible to estimate ''λ,,i,,'' using a [[Statistics/ProbitModel|probit model]] of whether each case is observed. This reveals that the selection bias seen in the initial model can be treated as omitted variable bias, and can be corrected by using the model ''Y,,i,, = bX,,i,, + σλ,,i,, + V,,i,,''.



=== Bivariate ===

Suppose the variable of interest is unobserved if a second variable is less than zero, and suppose that these are drawn from a joint normal distribution. In other words, the model is specified as:
 * ''Y,,1i,, = bX,,i,, + U,,1i,,''
 * ''Y,,2i,, = γZ,,i,, + U,,2i,,''
   * ''X,,i,,'' and ''Z,,i,,'' can be the same, but in practice the model is often only well identified when ''Z,,i,,'' includes at least one predictor that is excluded from ''X,,i,,''.

Following the same procedures above, it can be demonstrated that:

{{attachment:expectation3.svg}}

{{attachment:expectation4.svg}}

where ''λ,,i,,'' is specifically shorthand for ''λ'' evaluated for a given ''γZ,,i,,/√σ,,2,2,,''.

Adding these omitted variables leads to a model specified as:
 * ''Y,,1i,, = bX,,i,, + (σ,,1,2,,/√σ,,2,2,,)λ,,i,, + V,,1i,,''
 * ''Y,,2i,, = γZ,,i,, + (σ,,2,2,,/√σ,,2,2,,)λ,,i,, + V,,2i,,''
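 * ''E[V,,2,,^2^] = σ,,2,2,,(1 + φ,,i,,λ,,i,, - λ,,i,,^2^)''
   * as ''φ,,i,,'' goes to infinity (i.e., the chance of selection approaches 0%), this term approaches 0.
 * ''E[V,,1,,V,,2,,] = σ,,1,2,,(1 + φ,,i,,λ,,i,, - λ,,i,,^2^)''
   * as ''φ,,i,,'' goes to infinity, this term approaches 0.
 * ''E[V,,1,,^2^] = σ,,1,1,,[(1 - ρ^2^) + ρ^2^(1 + φ,,i,,λ,,i,, - λ,,i,,^2^)]''
   * as ''φ,,i,,'' goes to infinity, this term approaches ''σ,,1,1,,(1 - ρ^2^)''.

where ''φ,,i,,'' is shorthand for ''-γZ,,i,,/√σ,,2,2,,'', the negative of the point at which ''λ,,i,,'' is evaluated (it is not the standard normal p.d.f., which is bounded and cannot go to infinity); and ''ρ = σ,,1,2,,/√(σ,,1,1,,σ,,2,2,,)''. In the opposite limit, as ''φ,,i,,'' goes to negative infinity (i.e., the chance of selection approaches 100%), ''λ,,i,,'' approaches 0 and the common factor ''(1 + φ,,i,,λ,,i,, - λ,,i,,^2^)'' approaches 1.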

Diff for "Statistics/CensoredAndTruncatedRegressionModels"
