= Standard Errors =

'''Standard errors''' are the standard deviations of estimated coefficients.

----

== Description ==

The standard error of an estimate is the square root of the variance of its sampling distribution. For a sample mean, this is the standard deviation divided by the square root of the sample size.

One common use of standard errors is to estimate [[Statistics/MarginsOfError|margins of error]]. For a [[Statistics/BernoulliDistribution|Bernoulli-distributed]] variable, the variance is ''p(1-p)'' and is maximized at ''p=0.5''. Therefore a conservative standard error, taking ''p=0.5'', is a function of only the sample size.

Standard errors are also used in interpreting the estimated coefficients of a regression model. As a reminder, by classical [[Statistics/OrdinaryLeastSquares|OLS]], estimated coefficients are:

 * univariate case: {{attachment:coef1.svg}}
 * multivariate case: {{attachment:coef2.svg}}

But specific regression methods require assumptions about variance. Standard errors in this context are much more complicated.

----

== Classical ==

=== Univariate ===

In the univariate case, standard errors are classically specified as:

{{attachment:unispec1.svg}}

Supposing the population ''Var(ε)'' is known and errors are homoskedastic, i.e. they are constant across all cases, this can be simplified.

{{attachment:unispec2.svg}}

Lastly, rewrite the denominator in terms of ''Var(X)''.

{{attachment:unispec3.svg}}

''Var(ε)'' is unknown in practice, so this term is estimated as:

{{attachment:uniest1.svg}}, {{attachment:uniest2.svg}}

One degree of freedom is lost in assuming homoskedasticity of errors, i.e. {{attachment:homosked.svg}}; and ''k'' degrees of freedom are lost in assuming independence of errors and the ''k'' independent variables, which is necessarily 1 in the univariate case, i.e.:

{{attachment:ind.svg}}

This arrives at the estimator:

{{attachment:uniest3.svg}}

=== Multivariate ===

The classical multivariate specification is expressed in terms of ''('''b'''-β)'', as:

{{attachment:multspec1.svg}}

That term is rewritten as ''('''X'''^T^'''X''')^-1^'''X'''^T^'''ε'''''.

{{attachment:multspec2.svg}}

{{attachment:multspec3.svg}}

''E['''εε'''^T^|'''X''']'' is not a practical matrix to work with, even if known. But if homoskedasticity and independence are assumed, i.e.:

{{attachment:homosked_ind.svg}}

then this simplifies to:

{{attachment:multspec4.svg}}

''s^2^'' is unknown, so this term is estimated as:

{{attachment:multspec5.svg}}

This arrives at the estimator:

{{attachment:multspec6.svg}}

----

== Robust ==

In the presence of heteroskedasticity of errors, the above simplifications cannot be applied. In the univariate case, use the original estimator. This is mostly interesting in the multivariate case, where ''E['''εε'''^T^|'''X''']'' is still not practical.

When the homoskedasticity and independence assumptions are incorrect:

 * OLS estimators are not BLUE
 * they remain unbiased, but are no longer the most efficient in terms of MSE
 * nonlinear GLMs, such as logit, can be biased
 * even if the model's estimates are unbiased, statistics derived from those estimates (e.g., conditional probability distributions) can be biased

'''Eicker-Huber-White heteroskedasticity consistent errors''' ('''HCE''') assume that errors are still independent but allowed to vary, i.e. '''''Σ''' = diag(ε,,1,,^2^, ..., ε,,n,,^2^)''. Importantly, this is not a function of '''''X''''', so the standard errors can be estimated as:

{{attachment:robust.svg}}

Robust errors are only appropriate with large sample sizes.
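As a minimal sketch (not part of the original page), the following Python/NumPy snippet computes the classical and HC0-style robust standard errors directly from the formulas above. The simulated data, sample sizes, and variable names are illustrative assumptions only.

{{{#!python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with an intercept and two predictors (assumption for demonstration)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
# Heteroskedastic errors: the error variance grows with the first predictor
y = X @ beta + rng.normal(scale=1 + np.abs(X[:, 1]), size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y              # OLS coefficients
e = y - X @ b                      # residuals

# Classical: s^2 (X'X)^-1, with n - k degrees of freedom
s2 = e @ e / (n - k)
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# Eicker-Huber-White (HC0): (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat = X.T @ (X * e[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("classical:", se_classical)
print("robust:   ", se_robust)
}}}

With heteroskedastic errors such as these, the robust standard errors on the affected coefficient will generally differ noticeably from the classical ones, while the coefficient estimates themselves are unchanged.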
[[SamplingWeightsAndRegressionAnalysis|When fitting a model using data with survey weights, if those weights are a function of predictors including the dependent variable, then heteroskedasticity consistent errors should be used.]]

[[HowRobustStandardErrorsExposeMethodologicalProblemsTheyDoNotFix|If a model significantly diverges after introducing robust errors, there is likely a specification error.]]

----

== Clustered ==

'''Liang-Zeger clustered robust standard errors''' assume that errors covary within clusters.

{{attachment:cluster1.svg}}

where '''''x''',,g,,'' is an ''n,,g,,'' by ''k'' matrix constructed by stacking '''''x''',,i,,'' for all ''i'' belonging to cluster ''g''; and '''''ε''',,g,,'' is an ''n,,g,,''-long vector holding the errors for cluster ''g''. The estimator becomes:

{{attachment:cluster2.svg}}

Clustered standard errors should only be used if the sample design or experimental design calls for it.

 * A complex survey sample design leads to differential sampling errors across strata.
 * A two-stage sample design leads to differential sampling errors for the SSUs within each PSU.
 * Assignment of an experimental treatment at a grouped level often leads to differential errors across those groups.
 * For time series evaluation of an experimental treatment that is assigned at the individual level, it is generally recommended to cluster at the individual level.

There are parallels between [[Statistics/FixedEffectsModel|fixed effects]] and clusters, but use of one neither mandates nor conflicts with the other.

----

== Finite Population Correction ==

Most formulations of standard errors assume the population is unknown and/or infinite. If the population is finite and the sampling rate is high (above 5%), the standard error is too conservative. The '''finite population correction''' ('''FPC''') is an adjustment to correct this:

{{attachment:fpc.svg}}

Intuitively, the FPC is 0 when ''n = N'' because there is no sampling error in a census. The FPC approaches 1 as ''n'' approaches 0, demonstrating that the correction is negligible at low sampling rates.

----

CategoryRicottone