Differences between revisions 1 and 18 (spanning 17 versions)

Ordinary Least Squares

Ordinary Least Squares (OLS) is a linear regression method. It chooses the coefficients that minimize the sum of squared residuals (equivalently, the root mean square error).

Univariate

Given one independent variable and one dependent (outcome) variable, the OLS model is specified as:
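In one common notation (the symbols are an assumption here, since the page's own equation is an attached image), with outcome y_i, independent variable x_i, intercept \alpha, slope \beta, and error term \varepsilon_i for observation i:

```latex
y_i = \alpha + \beta x_i + \varepsilon_i
```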

It is estimated as:
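A sketch of the usual fitted form, assuming hats denote estimates and bars denote sample means (notation not taken from the page's attached image):

```latex
\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i, \qquad
\hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
```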

The estimated model describes (1) the mean observation, which the fitted line always passes through, and (2) the marginal change in the outcome per unit change in the independent variable.

The proof can be seen at Econometrics/OrdinaryLeastSquares/UnivariateProof.
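The two properties described above can be checked numerically. The following is a minimal sketch using the textbook closed-form estimates; the function name `fit_univariate_ols` and the sample data are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_univariate_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: chosen so the fitted line passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical sample data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

intercept, slope = fit_univariate_ols(x, y)
# Evaluating the fitted line at the mean of x recovers the mean of y.
fitted_at_mean = intercept + slope * x.mean()
```

The slope is the estimated change in y for a one-unit change in x, and `fitted_at_mean` equals `y.mean()` up to floating-point error.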

Multivariate

Linear Model

The linear model can be expressed as:
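A standard way to write this, assuming k independent variables x_1, ..., x_k, an intercept \beta_0, and an error term \varepsilon_i (the symbols are an assumption, since the page's own equation is an attached image):

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i
\qquad\text{or, in matrix form,}\qquad
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
```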

If these assumptions can be made:

Linearity
Exogeneity
Random sampling
No perfect multicollinearity
Homoskedasticity

Then OLS is the best linear unbiased estimator (BLUE) for these coefficients.
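Of the assumptions listed, the absence of perfect multicollinearity is the one that can be checked mechanically: it fails exactly when the design matrix lacks full column rank. A minimal sketch, with a hypothetical helper name and toy data:

```python
import numpy as np

def has_perfect_multicollinearity(design):
    """True if some column is an exact linear combination of the others."""
    design = np.asarray(design, dtype=float)
    return np.linalg.matrix_rank(design) < design.shape[1]

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0])

# Intercept column plus two regressors that are not linearly dependent.
ok = np.column_stack([np.ones(4), x1, x2])
# Third column is exactly twice the second, so the rank drops.
bad = np.column_stack([np.ones(4), x1, 2.0 * x1])
```

Note that `matrix_rank` uses a numerical tolerance, so this detects exact (or numerically exact) dependence only; strong but imperfect correlation between regressors does not violate the assumption.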

Using the computation above, the coefficients are estimated to produce:
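As an illustration of estimating the coefficients in practice, the sketch below solves the least-squares problem with `numpy.linalg.lstsq` on simulated data. The data-generating values (`true_beta`, the noise scale, the seed) are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two simulated regressors and an intercept column.
X = rng.normal(size=(n, 2))
design = np.column_stack([np.ones(n), X])

# Hypothetical true coefficients: intercept, beta_1, beta_2.
true_beta = np.array([1.0, 2.0, -0.5])
y = design @ true_beta + rng.normal(scale=0.1, size=n)

# Least-squares solution: minimizes the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ beta_hat
```

At the solution the residuals are orthogonal to every column of the design matrix, which is the first-order condition of the minimization.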

The variances for each coefficient are:

Note that the variance of the error term is an unknown population parameter, so it is estimated as:
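A sketch of the classical (homoskedastic) variance computation, assuming the usual textbook formulas: the error variance is estimated as the sum of squared residuals over n - k - 1 degrees of freedom, and the coefficient variances are the diagonal of that estimate times the inverse of X'X. Whether the page's attached equations use exactly this form is not known; the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 2

X = rng.normal(size=(n, k))
design = np.column_stack([np.ones(n), X])
y = design @ np.array([0.5, 1.5, -2.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ beta_hat

# Error variance estimate: SSR divided by residual degrees of freedom.
sigma2_hat = residuals @ residuals / (n - k - 1)

# Classical covariance matrix; its diagonal holds the coefficient variances.
cov_beta = sigma2_hat * np.linalg.inv(design.T @ design)
std_errors = np.sqrt(np.diag(cov_beta))

# Equivalent scalar form for beta_1: sigma^2 / (SST_1 * (1 - R_1^2)),
# where R_1^2 comes from regressing x_1 on the other regressors.
x1 = X[:, 0]
others = np.column_stack([np.ones(n), X[:, 1]])
gamma, *_ = np.linalg.lstsq(others, x1, rcond=None)
r1 = x1 - others @ gamma
sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1.0 - (r1 @ r1) / sst1
var_beta1_scalar = sigma2_hat / (sst1 * (1.0 - r2_1))
```

The matrix and scalar forms agree for each coefficient; the scalar form makes it visible that a regressor highly correlated with the others (R_1^2 near 1) has an inflated variance.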

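If the homoskedasticity assumption does not hold, the coefficient estimates themselves are unchanged, but the variance formula above is no longer valid. To derive a robust variance, the estimator for each coefficient is first written in partialled-out form: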

Wherein, for example, r_1j is the residual from regressing x₁ onto x₂, ... x_k.
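The residual-based form described here can be verified numerically. The sketch below assumes the standard partialling-out result, in which the coefficient on x₁ equals the sum of r₁ times y divided by the sum of squared r₁; the simulated data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)  # x1 is correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full regression of y on an intercept, x1, and x2.
design = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

# Residual from regressing x1 on the remaining regressors (intercept and x2).
others = np.column_stack([np.ones(n), x2])
gamma, *_ = np.linalg.lstsq(others, x1, rcond=None)
r1 = x1 - others @ gamma

# Partialled-out estimate of the coefficient on x1.
beta1_partial = (r1 @ y) / (r1 @ r1)
```

`beta1_partial` matches `beta_hat[1]` from the full regression up to floating-point error.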

The variances for each coefficient can be estimated with the Eicker-White formula:
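A minimal sketch of a heteroskedasticity-robust variance, assuming the basic Eicker-White form without a degrees-of-freedom correction (often labelled HC0); the page's attached equation may use a different variant. The simulated data, in which the error spread depends on x₁, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400

x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
# Error spread grows with |x1|, so homoskedasticity fails by construction.
u = rng.normal(size=n) * (0.5 + np.abs(x1))
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u

design = np.column_stack([np.ones(n), x1, x2])
xtx_inv = np.linalg.inv(design.T @ design)
beta_hat = xtx_inv @ design.T @ y
u_hat = y - design @ beta_hat

# Sandwich form: (X'X)^-1 X' diag(u_hat^2) X (X'X)^-1
meat = design.T @ (design * u_hat[:, None] ** 2)
robust_cov = xtx_inv @ meat @ xtx_inv
robust_se = np.sqrt(np.diag(robust_cov))

# Scalar form for the coefficient on x1, using the residual r1 from
# regressing x1 on the other regressors:
#   sum(r1^2 * u_hat^2) / (sum(r1^2))^2
others = np.column_stack([np.ones(n), x2])
gamma, *_ = np.linalg.lstsq(others, x1, rcond=None)
r1 = x1 - others @ gamma
var_beta1_scalar = np.sum(r1 ** 2 * u_hat ** 2) / np.sum(r1 ** 2) ** 2
```

The scalar form and the corresponding diagonal entry of the sandwich matrix are the same quantity, so either can be used to obtain the robust standard error of a single coefficient.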

See Nicolai Kuminoff's video lectures (https://www.youtube.com/@kuminoff) for the derivation of the robust estimators.

Statistics/OrdinaryLeastSquares (last edited 2025-09-03 02:08:40 by DominicRicottone)

-  Revision 1 as of 2023-10-28 05:18:15
  Size: 1390
  Editor: DominicRicottone
  Comment:
+  Revision 18 as of 2024-06-05 21:29:24
  Size: 1866
  Editor: DominicRicottone
  Comment: Rewrite
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-= Linear Regression =
+= Ordinary Least Squares =
 Line 3:
-A linear regression expresses the linear relation of a treatment variable to an outcome variable.
+'''Ordinary Least Squares''' ('''OLS''') is a linear regression method. It minimizes root mean square errors.
 Line 11:
-== Regression Line ==
+== Univariate ==
 Line 13:
-A regression line can be especially useful on a scatter plot.
+Given one independent variable and one dependent (outcome) variable, the OLS model is specified as:
 Line 15:
-The regression line passes through two points:
+{{attachment:model.svg}}
 Line 17:
-{{attachment:regression1.svg}}
+It is estimated as:
 Line 19:
-and
+{{attachment:estimate.svg}}
 Line 21:
-{{attachment:regression2.svg}}
+This model describes (1) the mean observation and (2) the marginal changes to the outcome per unit changes in the independent variable. 

The proof can be seen [[Econometrics/OrdinaryLeastSquares/UnivariateProof|here]].
-Line 27:
+Line 29:
-== Regression Computation ==
+== Multivariate ==
-Line 29:
+Line 31:
-Take the generic equation form of a line:
+----
-Line 31:
+Line 33:
-{{attachment:b01.svg}}
-Line 33:
+Line 34:
-Insert the first point into this form.
 Line 35:
-{{attachment:b02.svg}}
+== Linear Model ==
 Line 37:
-This can be trivially rewritten to solve for ''a'' in terms of ''b'':
+The linear model can be expressed as:
 Line 39:
-{{attachment:b03.svg}}
+{{attachment:model1.svg}}
 Line 41:
-Insert the second point into the original form.
+If these assumptions can be made:
 Line 43:
-{{attachment:b04.svg}}
+ 1. Linearity
 2. [[Econometrics/Exogeneity|Exogeneity]]
 3. Random sampling
 4. No perfect multicolinearity
 5. [[Econometrics/Homoskedasticity|Homoskedasticity]]
-Line 45:
+Line 49:
-Now additionally insert the solution for ''a'' in terms of ''b''.
+Then OLS is the best linear unbiased estimator ('''BLUE''') for these coefficients.
-Line 47:
+Line 51:
-{{attachment:b05.svg}}
+Using the computation above, the coefficients are estimated to produce:
-Line 49:
+Line 53:
-Expand all terms to produce:
+{{attachment:model2.svg}}
-Line 51:
+Line 55:
-{{attachment:b06.svg}}
+The variances for each coefficient are:
-Line 53:
+Line 57:
-This can now be eliminated into:
+{{attachment:homo1.svg}}
-Line 55:
+Line 59:
-{{attachment:b07.svg}}
+Note that the standard deviation of the population's parameter is unknown, so it's estimated like:
-Line 57:
+Line 61:
-Giving a solution for ''b'':
+{{attachment:homo2.svg}}
-Line 59:
+Line 63:
-{{attachment:b08.svg}}
+If the homoskedasticity assumption does not hold, then the estimators for each coefficient are actually:
-Line 61:
+Line 65:
-This solution is trivially rewritten as:
+{{attachment:hetero1.svg}}
-Line 63:
+Line 67:
-{{attachment:b09.svg}}
+Wherein, for example, ''r,,1j,,'' is the residual from regressing ''x,,1,,'' onto ''x,,2,,'', ... ''x,,k,,''.
-Line 65:
+Line 69:
-Expand the formula for correlation as:
+The variances for each coefficient can be estimated with the Eicker-White formula:
-Line 67:
+Line 71:
-{{attachment:b10.svg}}
+{{attachment:hetero2.svg}}
-Line 69:
+Line 73:
-This can now be eliminated into:

{{attachment:b11.svg}}

Finally, ''b'' can be eloquently written as:

{{attachment:b12.svg}}

Giving a generic formula for the regression line:

{{attachment:b13.svg}}
+See [[https://www.youtube.com/@kuminoff|Nicolai Kuminoff's]] video lectures for the derivation of the robust estimators.
