= Censored and Truncated Regression Models =

A '''censored regression model''' is appropriate when the value of the dependent variable is unobserved whenever it is above or below some threshold. A '''truncated regression model''' is appropriate when cases are systematically not collected or not reported when the dependent variable is above or below some threshold.

The '''Tobit model''', named for [[EstimationOfRelationshipsForLimitedDependentVariables|Tobin (1958)]], is a special case of a censored regression model.

----

== Description ==

This is a modification of the [[Statistics/OrdinaryLeastSquares|OLS model]], where the dependent variable ''Y'' is related to the independent variable(s) ''X'' as ''Y,,i,, = bX,,i,, + U,,i,,''.

=== Univariate ===

Suppose that the variable of interest is unobserved if it is less than zero. The expected value of the observed outcomes is then expressed as ''E[Y,,i,,|X,,i,,,Y,,i,,≥0]''. Substituting ''Y,,i,,'' with the model equation yields ''E[bX,,i,, + U,,i,,|X,,i,,,bX,,i,, + U,,i,,≥0]'', and because the expectation is conditioned on a given ''X,,i,,'', this simplifies to ''bX,,i,, + E[U,,i,,|X,,i,,,bX,,i,, + U,,i,,≥0]''. Algebraically this is rewritten as:

{{attachment:expectation1.svg}}

where ''σ'' is the standard deviation of the error term ''U,,i,,''. Introducing that standard deviation standardizes the error term, so that the expression is easily decomposed into terms of the [[Statistics/NormalDistribution|p.d.f. and c.d.f. of the standard normal distribution]]. Altogether, the expected value is:

{{attachment:expectation2.svg}}

The '''hazard ratio''' or '''inverse [[Statistics/MillsRatio|Mills' ratio]]''' ('''IMR''') is notated as ''λ'' here. Sometimes ''λ'' evaluated for a given ''bX,,i,,/σ'' is notated as ''λ,,i,,''.

Provided that the sample is censored (i.e., not truncated), the unobserved cases remain in the sample, so it should be possible to estimate ''λ,,i,,'' using a [[Statistics/ProbitModel|probit model]].
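The expected value above can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes normally distributed errors and takes the inverse Mills' ratio to be the standard normal p.d.f. divided by the c.d.f. The function names and the example values of ''bX,,i,,'' and ''σ'' are illustrative only.

```python
import math
import random


def normal_pdf(z: float) -> float:
    """Standard normal p.d.f."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal c.d.f., computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(z: float) -> float:
    """Inverse Mills' ratio, assumed here to be pdf(z) / cdf(z)."""
    return normal_pdf(z) / normal_cdf(z)


def expected_y_given_observed(bx: float, sigma: float) -> float:
    """Closed form: E[Y | X, Y >= 0] = bX + sigma * lambda(bX / sigma)."""
    return bx + sigma * inverse_mills(bx / sigma)


def simulated_mean_given_observed(bx: float, sigma: float,
                                  draws: int = 200_000, seed: int = 0) -> float:
    """Draw Y = bX + U with U ~ N(0, sigma^2) and average only the Y >= 0."""
    rng = random.Random(seed)
    kept = [y for y in (bx + rng.gauss(0.0, sigma) for _ in range(draws))
            if y >= 0.0]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    bx, sigma = 0.5, 2.0  # illustrative values, not taken from any data set
    print("closed form:", expected_y_given_observed(bx, sigma))
    print("simulation: ", simulated_mean_given_observed(bx, sigma))
```

The two printed values should agree up to simulation noise. Both exceed ''bX,,i,,'', which is the upward bias that arises when only the observed cases are used.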
The expected value above reveals that the selection bias in the initial model can be treated as omitted variable bias, and can be corrected by using the model ''Y,,i,, = bX,,i,, + σλ,,i,, + V,,i,,''.

=== Bivariate ===

Suppose the variable of interest is unobserved if a second variable is less than zero, and suppose that the two error terms are drawn from a joint normal distribution. In other words, the model is specified as:

 * ''Y,,1i,, = bX,,i,, + U,,1i,,''
 * ''Y,,2i,, = γZ,,i,, + U,,2i,,''
 * ''X,,i,,'' and ''Z,,i,,'' can be the same, but in practice the system is often only identifiable when ''Z,,i,,'' includes predictors that are not in ''X,,i,,''.

Following the same procedure as above, it can be demonstrated that:

{{attachment:expectation3.svg}}

{{attachment:expectation4.svg}}

where ''λ,,i,,'' is specifically shorthand for ''λ'' evaluated for a given ''γZ,,i,,/√σ,,2,2,,''. The first equation is also sometimes rewritten in terms of the error correlation ''ρ = σ,,1,2,,/√(σ,,1,1,,σ,,2,2,,)'': ''bX,,i,, + ρ(√σ,,1,1,,)λ,,i,,''.

Adding these omitted variables leads to a model specified as:

 * ''Y,,1i,, = bX,,i,, + (σ,,1,2,,/√σ,,2,2,,)λ,,i,, + V,,1i,,''
 * ''Y,,2i,, = γZ,,i,, + (σ,,2,2,,/√σ,,2,2,,)λ,,i,, + V,,2i,,''
 * ''E[V,,2,,^2^] = σ,,2,2,,(1 + φ,,i,,λ,,i,, - λ,,i,,^2^)''
   * where ''φ,,i,,'' is shorthand for the standardized selection index ''-γZ,,i,,/√σ,,2,2,,'' (it is not the normal p.d.f. here).
   * as ''φ,,i,,'' goes to infinity (i.e., the chance of selection approaches 0%), this term approaches 0.
 * ''E[V,,1,,V,,2,,] = σ,,1,2,,(1 + φ,,i,,λ,,i,, - λ,,i,,^2^)''
   * as ''φ,,i,,'' goes to infinity, this term approaches 0.
 * ''E[V,,1,,^2^] = σ,,1,1,,[(1 - ρ^2^) + ρ^2^(1 + φ,,i,,λ,,i,, - λ,,i,,^2^)]''
   * as ''φ,,i,,'' goes to infinity, this term approaches ''σ,,1,1,,(1 - ρ^2^)''.

----
CategoryRicottone