Standard Errors

Standard errors are the standard deviations of estimated coefficients.


Description

In the classical OLS model, estimated coefficients are:

  • univariate case: \hat{\beta} = \frac{\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^n(X_i-\bar{X})^2}

  • multivariate case: \mathbf{b} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}

Standard errors are the standard deviations of these coefficients.
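
Both forms can be checked numerically. Below is a minimal numpy sketch (the simulated data, seed, and variable names are illustrative assumptions, not part of this page):

import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)

# Univariate form: sum((X_i - Xbar)(Y_i - Ybar)) / sum((X_i - Xbar)^2)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Multivariate form: b = (X^T X)^{-1} X^T Y, with an explicit intercept column
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat, b[1])  # the slope estimates agree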


Classical

Univariate

In the univariate case, standard errors are classically specified as:

Var(\hat{\beta}|X_i) = \frac{\sum_{i=1}^n Var((X_i-\bar{X})\hat{\epsilon}_i)}{(\sum_{i=1}^n(X_i-\bar{X})^2)^2}

Supposing the population Var(ε) is known and errors are homoskedastic, i.e. they are constant across all cases, this can be simplified.

Var(\hat{\beta}|X_i) = \frac{Var(\epsilon)\sum_{i=1}^n(X_i-\bar{X})^2}{(\sum_{i=1}^n(X_i-\bar{X})^2)^2} = \frac{Var(\epsilon)}{\sum_{i=1}^n(X_i-\bar{X})^2}

Lastly, rewrite the denominator in terms of Var(X).

Var(\hat{\beta}|X_i) = \frac{Var(\epsilon)}{n(\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2)} = \frac{Var(\epsilon)}{n\,Var(X)}

Var(ε) is unknown, so this term is estimated as:

\hat{\epsilon}_i = Y_i - \hat{Y}_i, \quad \hat{Var}(\epsilon) = \frac{1}{n-2}\sum_{i=1}^n\hat{\epsilon}_i^2

1 degree of freedom is lost in assuming homoskedasticity of errors, i.e. Var(ε_i) = Var(ε) for all i; and k degrees of freedom are lost in assuming independence of the errors and the k independent variables, which is necessarily 1 in the univariate case, i.e. \sum_{i=1}^n X_i\hat{\epsilon}_i = 0.

This arrives at the estimator:

\hat{Var}(\hat{\beta}|X_i) = \frac{\hat{Var}(\epsilon)}{n\,Var(X)} = \frac{\frac{1}{n-2}\sum_{i=1}^n\hat{\epsilon}_i^2}{n\,Var(X)}
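
A minimal numpy sketch of this estimator (simulated data; the names and true coefficients are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)

# Fit the univariate model
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - (alpha_hat + beta_hat * x)

# Var(epsilon) estimated with n - 2 degrees of freedom, per the 1 + k accounting above
var_eps = np.sum(resid ** 2) / (n - 2)

# Var-hat(beta-hat | X) = Var-hat(epsilon) / sum((X_i - Xbar)^2)
se_beta = np.sqrt(var_eps / np.sum((x - x.mean()) ** 2))
print(se_beta)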

Multivariate

The classical multivariate specification is expressed in terms of (b-β), as:

Var(\mathbf{b}|\mathbf{X}) = E\Bigl[(\mathbf{b}-\mathbf{\beta})(\mathbf{b}-\mathbf{\beta})^T \Big| \mathbf{X}\Bigr]

That term, b - β, is rewritten as (X^T X)^-1 X^T ε.

Var(\mathbf{b}|\mathbf{X}) = E\Bigl[\bigl((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{\epsilon}\bigr)\bigl((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{\epsilon}\bigr)^T \Big| \mathbf{X}\Bigr] = E\Bigl[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{\epsilon}\mathbf{\epsilon}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \Big| \mathbf{X}\Bigr]

Var(\mathbf{b}|\mathbf{X}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T E\bigl[\mathbf{\epsilon}\mathbf{\epsilon}^T\big|\mathbf{X}\bigr]\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}

E[εε^T|X] is not a practical matrix to work with, even if known. But if homoskedasticity and independence are assumed, i.e. E[εε^T|X] = s^2 I_n, then this simplifies to:

Var(\mathbf{b}|\mathbf{X}) = s^2(\mathbf{X}^T\mathbf{X})^{-1}

s^2 is unknown, so this term is estimated as:

\hat{s}^2 = \frac{1}{n-k}\hat{\mathbf{\epsilon}}^T\hat{\mathbf{\epsilon}}

This arrives at the estimator:

\hat{Var}(\mathbf{b}|\mathbf{X}) = \hat{s}^2(\mathbf{X}^T\mathbf{X})^{-1} = \frac{1}{n-k}\hat{\mathbf{\epsilon}}^T\hat{\mathbf{\epsilon}}\,(\mathbf{X}^T\mathbf{X})^{-1}
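
A minimal numpy sketch of the classical multivariate estimator (simulated data; here k counts the columns of X, including the intercept; all names are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b

# s^2 estimated as resid^T resid / (n - k)
s2 = resid @ resid / (n - k)

# Classical covariance matrix: s^2 (X^T X)^{-1}
cov_b = s2 * np.linalg.inv(X.T @ X)
print(np.sqrt(np.diag(cov_b)))  # classical standard errors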


Robust

In the presence of heteroskedasticity of errors, the above simplifications do not hold. In the univariate case, use the original estimator.

This is mostly interesting in the multivariate case, where E[εε^T|X] is still not practical. The assumptions made, when incorrect, lead to...

  • OLS estimators are not BLUE
    • they are unbiased, but no longer most efficient in terms of MSE
  • nonlinear GLMs, such as logit, can be biased
  • even if the model's estimates are unbiased, statistics derived from those estimates (e.g., conditioned probability distributions) can be biased

Eicker-Huber-White heteroskedasticity consistent errors (HCE) assume that errors are still independent but allowed to vary, i.e. Σ = diag(ε_1^2, ..., ε_n^2). Importantly, this is not a function of X, so the standard errors can be estimated as:

\hat{Var}(\mathbf{b}|\mathbf{X}) = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\hat{\mathbf{\Sigma}}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}, \quad \hat{\mathbf{\Sigma}} = \mathrm{diag}(\hat{\epsilon}_1^2, \dots, \hat{\epsilon}_n^2)
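
A minimal numpy sketch of this sandwich estimator, in its simplest (HC0) form (simulated heteroskedastic data; all names are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Heteroskedastic errors: the error variance grows with |x|
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))

b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b

XtX_inv = np.linalg.inv(X.T @ X)
# The "meat" X^T Sigma-hat X, with Sigma-hat = diag(resid_i^2)
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc0 = XtX_inv @ meat @ XtX_inv  # the sandwich
print(np.sqrt(np.diag(cov_hc0)))  # robust standard errors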

Robust errors are only appropriate with large sample sizes.

When fitting a model using data with survey weights, if those weights are a function of predictors including the dependent variable, then heteroskedasticity-consistent errors should be used.

If a model significantly diverges after introducing robust errors, there is likely a specification error.


Clustered

Liang-Zeger clustered robust standard errors assume that errors covary within clusters.

E\bigl[\mathbf{\epsilon}_g\mathbf{\epsilon}_g^T\big|\mathbf{x}_g\bigr] = \mathbf{\Sigma}_g

where x_g is an n_g by k matrix constructed by stacking x_i for all i belonging to cluster g; and ε_g is an n_g long vector holding the errors for each cluster g.

The estimator becomes:

\hat{Var}(\mathbf{b}|\mathbf{X}) = (\mathbf{X}^T\mathbf{X})^{-1}\Bigl(\sum_{g=1}^G \mathbf{x}_g^T\hat{\mathbf{\epsilon}}_g\hat{\mathbf{\epsilon}}_g^T\mathbf{x}_g\Bigr)(\mathbf{X}^T\mathbf{X})^{-1}
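
A minimal numpy sketch of the clustered estimator (simulated data with a shared within-cluster error component, so errors covary within clusters; all names are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
G, n_g = 40, 10  # 40 clusters of 10 cases each
n = G * n_g
cluster = np.repeat(np.arange(G), n_g)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=G)[cluster] + rng.normal(size=n)  # cluster-level + individual error
y = X @ np.array([1.0, 0.5]) + u

b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b

XtX_inv = np.linalg.inv(X.T @ X)
# The "meat": sum over clusters of x_g^T e_g e_g^T x_g
meat = np.zeros((X.shape[1], X.shape[1]))
for g in range(G):
    idx = cluster == g
    score = X[idx].T @ resid[idx]  # x_g^T e_g
    meat += np.outer(score, score)
cov_cr0 = XtX_inv @ meat @ XtX_inv
print(np.sqrt(np.diag(cov_cr0)))  # clustered standard errors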

Clustered standard errors should only be used if the sample design or experimental design calls for it.

  • A complex survey sample design leads to differential sampling errors across strata.
  • A two-stage sample design leads to differential sampling errors for the secondary sampling units (SSUs) within each primary sampling unit (PSU).
  • Assignment of an experimental treatment at a grouped level often leads to differential errors across those groups.
  • For time series evaluation of an experimental treatment that is assigned at the individual level, it is generally recommended to cluster at the individual level.

There are parallels between fixed effects and clusters, but use of one does not mandate nor conflict with the other.


CategoryRicottone
