Moments
Moments are quantitative measures of the shape of a probability distribution.
Description
The first raw moment is the mean: μ = E[X]. For discrete variables, this is calculated as Σ x P(X = x); for continuous variables, as ∫ x f(x) dx.
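As a concrete sketch, the discrete formula can be evaluated directly for a fair six-sided die (the die pmf is an assumed example, not from the text):

```python
# First raw moment of a fair six-sided die: mu = Σ x P(X = x).
# The fair-die pmf is a hypothetical example for illustration.
pmf = {x: 1 / 6 for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())
print(mu)  # 3.5
```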
The second central moment is the variance: σ² = E[(X − E[X])²] = E[(X − μ)²] = E[X²] − (E[X])²
The derivation of this for discrete variables is:
Σ (x − μ)² P(X = x)
= Σ (x² − 2μx + μ²) P(X = x)
= Σ x² P(X = x) − 2μ Σ x P(X = x) + μ² Σ P(X = x)
= E[X²] − 2μ·μ + μ²·1
= E[X²] − 2μ² + μ²
= E[X²] − μ²
= E[X²] − (E[X])²
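The shortcut formula at the end of the derivation can be checked numerically against the definition for any discrete pmf; the three-point distribution below is an arbitrary example:

```python
# Check Var(X) = E[X^2] - (E[X])^2 for a discrete pmf.
# The three-point distribution is chosen arbitrarily for illustration.
pmf = {0: 0.2, 1: 0.5, 4: 0.3}

mu = sum(x * p for x, p in pmf.items())            # E[X]
ex2 = sum(x**2 * p for x, p in pmf.items())        # E[X^2]

var_definition = sum((x - mu)**2 * p for x, p in pmf.items())
var_shortcut = ex2 - mu**2

print(var_definition, var_shortcut)  # both 2.41
```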
The derivation of this for continuous variables is:
∫ (x − μ)² f(x) dx
= ∫ (x² − 2μx + μ²) f(x) dx
= ∫ x² f(x) dx − 2μ ∫ x f(x) dx + μ² ∫ f(x) dx
= E[X²] − 2μ·μ + μ²·1
= E[X²] − 2μ² + μ²
= E[X²] − μ²
= E[X²] − (E[X])²
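The continuous version can be sanity-checked with a simple midpoint Riemann sum; X ~ Uniform(0, 1), whose exact variance is 1/12, is an assumed example:

```python
# Numerically verify Var(X) = E[X^2] - (E[X])^2 for X ~ Uniform(0, 1),
# where f(x) = 1 on [0, 1], using a midpoint Riemann sum.
n = 100_000
dx = 1.0 / n
xs = [(i + 0.5) * dx for i in range(n)]

mu = sum(x * dx for x in xs)       # ∫ x f(x) dx
ex2 = sum(x**2 * dx for x in xs)   # ∫ x^2 f(x) dx

print(ex2 - mu**2)  # ≈ 1/12 ≈ 0.08333
```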
From these derivations it follows that (1) adding a constant to a variable does not affect its variance, and (2) multiplying a variable by a constant scales its variance by the square of that constant. This is succinctly summarized as Var(aX + b) = a² Var(X)
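The identity Var(aX + b) = a² Var(X) can also be observed empirically; the values a = 3 and b = 7 below are arbitrary:

```python
import random

random.seed(0)
# Empirical check of Var(aX + b) = a^2 Var(X); a = 3 and b = 7 are arbitrary.
xs = [random.gauss(0, 1) for _ in range(100_000)]

def var(data):
    m = sum(data) / len(data)
    return sum((x - m)**2 for x in data) / len(data)

a, b = 3.0, 7.0
ys = [a * x + b for x in xs]
print(var(ys), a**2 * var(xs))  # equal up to floating-point error
```

Note the identity holds exactly for sample variance too, not just in expectation, since the shift b cancels out of every (x − mean) term.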
The third standardized moment (the third central moment divided by σ³), skewness, measures the lopsidedness of a distribution.
The fourth standardized moment (the fourth central moment divided by σ⁴), kurtosis, measures the heaviness of the tails of a distribution.
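Both standardized moments can be estimated from a sample. As a sketch under the assumption of Gaussian data (for which skewness is 0 and kurtosis is 3):

```python
import random

random.seed(1)
# Sample skewness and kurtosis as standardized third and fourth moments.
# For Gaussian data, skewness ≈ 0 and kurtosis ≈ 3.
xs = [random.gauss(0, 1) for _ in range(200_000)]

n = len(xs)
mu = sum(xs) / n
sigma = (sum((x - mu)**2 for x in xs) / n) ** 0.5

skew = sum(((x - mu) / sigma)**3 for x in xs) / n
kurt = sum(((x - mu) / sigma)**4 for x in xs) / n
print(skew, kurt)  # roughly 0 and roughly 3
```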
Errors
Models generally assume that individual errors average to zero, i.e. the first moment of the errors is zero: E[Ŷ − Y] = 0. Nonetheless, higher-order moments are important.
The mean squared error (MSE) is the second moment of the error: MSE(θ̂) = E[(θ̂ − θ)²]. MSE can be decomposed into the variance of the estimator plus the squared bias: MSE(θ̂) = Var(θ̂) + Bias(θ̂, θ)² = Var(θ̂) + (E[θ̂] − θ)².
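The decomposition can be demonstrated with a deliberately biased estimator; the true θ = 0, the +0.5 bias, and the sample sizes below are all hypothetical choices for illustration:

```python
import random

random.seed(2)
# Empirical MSE decomposition for a deliberately biased estimator:
# theta_hat = mean(sample) + 0.5, estimating theta = 0.
# The +0.5 bias and all parameters are hypothetical, for illustration.
theta = 0.0
estimates = []
for _ in range(20_000):
    sample = [random.gauss(theta, 1) for _ in range(10)]
    estimates.append(sum(sample) / len(sample) + 0.5)

m = len(estimates)
mean_est = sum(estimates) / m
mse = sum((t - theta)**2 for t in estimates) / m
var = sum((t - mean_est)**2 for t in estimates) / m
bias = mean_est - theta

print(mse, var + bias**2)  # equal up to floating-point error
```

As with the variance shortcut, MSE = Var + Bias² holds exactly for the empirical quantities, since it is the same algebraic decomposition applied to the sample of estimates.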
Two important notes:
Bias, i.e. E[θ̂] − θ, is not the same as the first moment of errors.
If there is no bias, then MSE is the variance of the estimator: MSE(θ̂) = Var(θ̂).