
Standard Errors

Standard errors are the standard deviations of estimated coefficients.


Description

In the classical OLS model, estimated coefficients are:

  • univariate case: \hat{\beta} = \frac{\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^n(X_i-\bar{X})^2}

  • multivariate case: \mathbf{b} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}

Standard errors are the standard deviations of these coefficients.
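
These formulas can be sanity-checked numerically. Below is a minimal sketch in Python with numpy; the simulated data, seed, and variable names are illustrative assumptions, not part of this page:

  import numpy as np

  rng = np.random.default_rng(0)
  n = 500
  x = rng.normal(size=n)
  y = 2.0 + 3.0 * x + rng.normal(size=n)  # true intercept 2, true slope 3

  # univariate case: slope from centered sums
  beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

  # multivariate case: b = (X^T X)^{-1} X^T Y, with an explicit intercept column
  X = np.column_stack([np.ones(n), x])
  b = np.linalg.solve(X.T @ X, X.T @ y)

  print(beta_hat, b[1])  # the two slope estimates agree

With one regressor plus an intercept, the matrix formula reproduces the univariate slope exactly.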


Classical

Univariate

In the univariate case, standard errors are classically specified as:

Var(\hat{\beta}|X_i) = \frac{\sum_{i=1}^n Var\bigl((X_i-\bar{X})\hat{\epsilon}_i\bigr)}{\bigl(\sum_{i=1}^n(X_i-\bar{X})^2\bigr)^2}

Supposing the errors are homoskedastic with known population variance Var(ε), this can be simplified.

Var(\hat{\beta}|X_i) = \frac{Var(\epsilon)\sum_{i=1}^n(X_i-\bar{X})^2}{\bigl(\sum_{i=1}^n(X_i-\bar{X})^2\bigr)^2} = \frac{Var(\epsilon)}{\sum_{i=1}^n(X_i-\bar{X})^2}

Lastly, rewrite the denominator in terms of Var(X).

Var(\hat{\beta}|X_i) = \frac{Var(\epsilon)}{n\bigl(\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2\bigr)} = \frac{Var(\epsilon)}{n Var(X)}

Var(ε) is unknown, so it is estimated from the residuals, with a degrees-of-freedom correction explained below:

\hat{\epsilon}_i = Y_i - \hat{Y}_i

Var(\hat{\epsilon}) = \frac{1}{n-2}\sum_{i=1}^n\hat{\epsilon}_i^2

1 degree of freedom is lost in assuming the errors have zero mean, which constrains the residuals to sum to zero, i.e.:

\sum_{i=1}^n\hat{\epsilon}_i = 0

k degrees of freedom are lost in assuming the errors are independent of the k independent variables, which constrains the residuals to be orthogonal to each of them; k is necessarily 1 in the univariate case, i.e.:

\sum_{i=1}^nX_i\hat{\epsilon}_i = 0
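
Both constraints hold mechanically for any fitted OLS model and can be verified numerically; a quick sketch (again with illustrative simulated data):

  import numpy as np

  rng = np.random.default_rng(1)
  n = 200
  x = rng.normal(size=n)
  y = 1.0 + 2.0 * x + rng.normal(size=n)

  X = np.column_stack([np.ones(n), x])
  b = np.linalg.solve(X.T @ X, X.T @ y)
  resid = y - X @ b

  print(np.sum(resid))      # ~0: residuals sum to zero
  print(np.sum(x * resid))  # ~0: residuals are orthogonal to X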

Combining these corrections arrives at the estimate:

\hat{Var}(\hat{\beta}|X_i) = \frac{Var(\hat{\epsilon})}{n Var(X)} = \frac{\frac{1}{n-2}\sum_{i=1}^n\hat{\epsilon}_i^2}{n Var(X)}
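
As a sketch under the same illustrative setup, this can be computed directly; note that n Var(X) below is just the centered sum of squares:

  import numpy as np

  rng = np.random.default_rng(2)
  n = 1000
  x = rng.normal(size=n)
  y = 0.5 + 1.5 * x + rng.normal(size=n)

  beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
  alpha_hat = y.mean() - beta_hat * x.mean()
  resid = y - (alpha_hat + beta_hat * x)

  # Var-hat(beta-hat | X) = (1/(n-2)) sum(resid^2) / sum((X - Xbar)^2)
  var_beta = np.sum(resid ** 2) / (n - 2) / np.sum((x - x.mean()) ** 2)
  print(np.sqrt(var_beta))  # the classical standard error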

Multivariate

The classical multivariate specification is expressed in terms of (b-β), as:

Var(\mathbf{b} | \mathbf{X}) = E\Bigl[(\mathbf{b}-\mathbf{\beta})(\mathbf{b}-\mathbf{\beta})^T \Big| \mathbf{X}\Bigr]

That term, b − β, is rewritten as (X^T X)^{-1} X^T u.

Var(\mathbf{b} | \mathbf{X}) = E\Bigl[\bigl((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{u}\bigr)\bigl((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{u}\bigr)^T \Big| \mathbf{X}\Bigr] = E\Bigl[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{u}\mathbf{u}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \Big| \mathbf{X}\Bigr]

Var(\mathbf{b} | \mathbf{X}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T E\bigl[\mathbf{u}\mathbf{u}^T\big|\mathbf{X}\bigr]\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}

E[uu^T|X] is not a practical matrix to work with. But if homoskedasticity and independence are assumed, i.e.:

E\bigl[\mathbf{u}\mathbf{u}^T\big|\mathbf{X}\bigr] = Var(\epsilon)\mathbf{I}_n

...then this simplifies to:

Var(\mathbf{b} | \mathbf{X}) = Var(\epsilon)(\mathbf{X}^T\mathbf{X})^{-1}

Var(ε) is unknown, so the estimate is:

\hat{Var}(\mathbf{b} | \mathbf{X}) = \frac{1}{n-k}\hat{\mathbf{u}}^T\hat{\mathbf{u}}\,(\mathbf{X}^T\mathbf{X})^{-1}
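
In matrix form, a minimal sketch (here k counts every column of X, including the intercept; the data are illustrative):

  import numpy as np

  rng = np.random.default_rng(3)
  n = 500
  X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 regressors
  y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

  b = np.linalg.solve(X.T @ X, X.T @ y)
  resid = y - X @ b
  n, k = X.shape

  # Var-hat(b | X) = (u-hat^T u-hat / (n - k)) (X^T X)^{-1}
  cov_b = (resid @ resid) / (n - k) * np.linalg.inv(X.T @ X)
  print(np.sqrt(np.diag(cov_b)))  # classical standard errors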


Robust

In the presence of heteroskedastic errors, the above simplifications do not apply. In the univariate case, fall back to the original estimator.

This is mostly interesting in the multivariate case, where E[uu^T|X] is still not a practical matrix to work with. When incorrect, the assumptions of homoskedasticity and independence lead to...

  • OLS estimators are not BLUE
    • they are unbiased, but no longer most efficient in terms of MSE
  • nonlinear GLMs, such as logit, can be biased
  • even if the model's estimates are unbiased, statistics derived from those estimates (e.g., conditional probability distributions) can be biased

Eicker-Huber-White heteroskedasticity-consistent errors (HCE) assume that errors are still independent but allowed to vary, i.e. Σ = diag(ε_1^2, ..., ε_n^2), estimated in practice from the squared residuals. Importantly, this is not a function of X, so the standard errors can be estimated as:

\hat{Var}(\mathbf{b} | \mathbf{X}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{\Sigma}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}
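
A sketch of this sandwich with Σ built from the squared residuals (the HC0 variant); the heteroskedastic data-generating process here is an illustrative assumption:

  import numpy as np

  rng = np.random.default_rng(4)
  n = 500
  x = rng.normal(size=n)
  X = np.column_stack([np.ones(n), x])
  y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.5 + np.abs(x))  # error variance grows with |x|

  b = np.linalg.solve(X.T @ X, X.T @ y)
  resid = y - X @ b

  # (X^T X)^{-1} X^T Sigma X (X^T X)^{-1}, with Sigma = diag(resid_i^2)
  bread = np.linalg.inv(X.T @ X)
  meat = X.T @ (X * (resid ** 2)[:, None])
  cov_hc0 = bread @ meat @ bread
  print(np.sqrt(np.diag(cov_hc0)))  # robust (HC0) standard errors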

Note however that heteroskedasticity-consistent errors are not always appropriate. To reiterate: for OLS, classical estimators are not biased even given heteroskedasticity, so if the model changes with the introduction of robust standard errors, there must be a specification error. Furthermore, heteroskedasticity-consistent errors are only asymptotically unbiased; they can be biased for small n.
