Differences between revisions 1 and 27 (spanning 26 versions)

Ordinary Least Squares

Ordinary Least Squares (OLS) is a linear regression method, and is effectively synonymous with the linear regression model.

Description

A linear model is expressed as either y_i = α + βx_i + ε_i (univariate) or y_i = β_0 + β_1 x_1i + ... + β_k x_ki + ε_i (multivariate with k terms). Either way, a crucial assumption is that the expected value of the error term is 0, such that the first moment is E[y_i|x_i] = α + βx_i.
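
The step from the zero-mean assumption to the conditional mean can be written out as follows. This is a sketch that assumes the univariate model y_i = α + βx_i + ε_i and that the error has mean zero conditional on x_i, which is a slightly stronger reading than the unconditional statement above.

```latex
\begin{aligned}
E[y_i \mid x_i] &= E[\alpha + \beta x_i + \varepsilon_i \mid x_i] \\
                &= \alpha + \beta x_i + E[\varepsilon_i \mid x_i] \\
                &= \alpha + \beta x_i
                \qquad \text{since } E[\varepsilon_i \mid x_i] = 0 .
\end{aligned}
```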

Univariate

In the univariate case, the OLS regression is:

ŷ_i = α̂ + β̂x_i

This formulation leaves the components explicit: the y-intercept term is the expected outcome at x=0, and the slope term is the marginal change in the outcome per unit change in x.
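
As an illustration of the intercept and slope terms, here is a minimal sketch of a univariate fit using the standard closed-form estimates (slope = sum of cross-deviations over sum of squared deviations of x, intercept = mean of y minus slope times mean of x). The function name `fit_univariate_ols` and the data are hypothetical, chosen only for this example.

```python
from statistics import fmean


def fit_univariate_ols(x, y):
    """Return (alpha_hat, beta_hat) for the model y = alpha + beta * x + error."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must vary for the slope to be identified")
    beta_hat = s_xy / s_xx                # slope
    alpha_hat = y_bar - beta_hat * x_bar  # intercept
    return alpha_hat, beta_hat


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
alpha_hat, beta_hat = fit_univariate_ols(x, y)
print(alpha_hat, beta_hat)
```

Because the example data sit exactly on a line, the fitted intercept is the outcome at x=0 and the fitted slope is the change in y per unit of x.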

The derivation can be seen here.

Multivariate

In the multivariate case, the regression is fit like:

ŷ_i = β̂_0 + β̂_1 x_1i + ... + β̂_k x_ki

But conventionally, multivariate OLS is solved using linear algebra as:

b = (XᵀX)⁻¹Xᵀy

Note that using a b here is intentional.
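
The linear algebra solution can be sketched in plain Python. This example solves the normal equations (X'X)b = X'y by Gaussian elimination, which is algebraically equivalent to b = (X'X)^(-1)X'y but avoids forming the inverse explicitly. The helper names `solve_linear_system` and `fit_ols`, the prepended intercept column, and the data are assumptions made for illustration.

```python
def solve_linear_system(a, rhs):
    """Solve a @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Build an augmented matrix so row operations apply to both sides.
    m = [list(map(float, a[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def fit_ols(rows, y):
    """Return b solving (X'X) b = X'y, with an intercept column prepended."""
    X = [[1.0] + list(map(float, row)) for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly by y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
b = fit_ols(rows, y)
print([round(v, 6) for v in b])
```

The singularity check is where perfect multicollinearity would surface: if one predictor is an exact linear combination of the others, X'X has no inverse and the solve fails.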

The derivation can be seen here.

Estimated Coefficients

The Gauss-Markov theorem shows that, under certain assumptions, the OLS estimators are the best linear unbiased estimators (BLUE) of the regression coefficients. The assumptions are:

1. Linearity
2. Exogeneity, i.e. the predictors are uncorrelated with the error term
3. Random sampling
4. No perfect multicollinearity
5. Homoskedasticity, i.e. the variance of the error term is constant across observations

Assumption 5 mainly affects the estimation of standard errors: the coefficient estimates remain unbiased without it, and there are alternative standard error estimators that are robust to heteroskedasticity.
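
To make the role of homoskedasticity concrete, the sketch below computes two standard errors for a univariate slope: the classical one, which assumes a single common error variance, and the HC0 (White) heteroskedasticity-robust one, named here as one example of a robust alternative. The function name `slope_standard_errors` and the data, whose noise grows with x, are hypothetical.

```python
from math import sqrt
from statistics import fmean


def slope_standard_errors(x, y):
    """Return (beta_hat, classical_se, hc0_se) for a univariate OLS fit."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    dx = [xi - x_bar for xi in x]
    s_xx = sum(d * d for d in dx)
    beta_hat = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    resid = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
    # Classical: one common error variance, estimated with n - 2 degrees of freedom.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    classical_se = sqrt(sigma2 / s_xx)
    # HC0 (White): each observation contributes its own squared residual.
    hc0_se = sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / s_xx
    return beta_hat, classical_se, hc0_se


# Hypothetical data: the noise magnitude grows with x (heteroskedastic).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.3 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [2.0 * xi + e for xi, e in zip(x, noise)]
print(slope_standard_errors(x, y))
```

Both standard errors are built from the same slope estimate. Only the variance formula changes, which is why the assumption matters for inference rather than for the coefficients themselves.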

CategoryRicottone

Statistics/OrdinaryLeastSquares (last edited 2025-09-03 02:08:40 by DominicRicottone)

- Revision 1 as of 2023-10-28 05:18:15
  Size: 1390
  Editor: DominicRicottone
  Comment:
+ Revision 27 as of 2025-08-06 00:56:18
  Size: 2037
  Editor: DominicRicottone
  Comment: Simplifications
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-= Linear Regression =
+= Ordinary Least Squares =
 Line 3:
-A linear regression expresses the linear relation of a treatment variable to an outcome variable.
+'''Ordinary Least Squares''' ('''OLS''') is a linear regression method, and is effectively synonymous with the '''linear regression model'''.
 Line 11:
-== Regression Line ==
+== Description ==
 Line 13:
-A regression line can be especially useful on a scatter plot.
+A linear model is expressed as either {{attachment:model.svg}} (univariate) or {{attachment:mmodel.svg}} (multivariate with ''k'' terms). Either way, a crucial assumption is that the expected value of the error term is 0, such that the [[Statistics/Moments|first moment]] is ''E[y,,i,,|x,,i,,] = α + βx,,i,,''.
 Line 15:
-The regression line passes through two points:
-Line 17:
+Line 16:
-{{attachment:regression1.svg}}
-Line 19:
+Line 17:
-and
+=== Univariate ===
-Line 21:
+Line 19:
-{{attachment:regression2.svg}}
+In the univariate case, the OLS regression is:

{{attachment:estimate.svg}}

This formulation leaves the components explicit: the y-intercept term is the expected outcome at ''x=0'', and the slope term is the marginal change in the outcome per unit change in ''x''.

The derivation can be seen [[Statistics/OrdinaryLeastSquares/Univariate|here]].




== Multivariate ==

In the multivariate case, the regression is fit like:

{{attachment:mestimate.svg}}

But conventionally, multivariate OLS is solved using [[LinearAlgebra|linear algebra]] as:

{{attachment:matrix.svg}}

Note that using a ''b'' here is [[Statistics/EconometricsNotation#Models|intentional]].

The derivation can be seen [[Statistics/OrdinaryLeastSquares/Multivariate|here]].
-Line 27:
+Line 48:
-== Regression Computation ==
+== Estimated Coefficients ==
-Line 29:
+Line 50:
-Take the generic equation form of a line:
+The '''Gauss-Markov theorem''' shows that, under certain assumptions, the OLS estimators are the '''best linear unbiased estimators''' ('''BLUE''') of the regression coefficients. The assumptions are:
-Line 31:
+Line 52:
-{{attachment:b01.svg}}
+ 1. Linearity
 2. Exogeneity, i.e. the predictors are uncorrelated with the error term
 3. Random sampling
 4. No perfect [[LinearAlgebra/Basis|multicollinearity]]
 5. Homoskedasticity, i.e. the variance of the error term is constant across observations
-Line 33:
+Line 58:
-Insert the first point into this form.

{{attachment:b02.svg}}

This can be trivially rewritten to solve for ''a'' in terms of ''b'':

{{attachment:b03.svg}}

Insert the second point into the original form.

{{attachment:b04.svg}}

Now additionally insert the solution for ''a'' in terms of ''b''.

{{attachment:b05.svg}}

Expand all terms to produce:

{{attachment:b06.svg}}

This can now be eliminated into:

{{attachment:b07.svg}}

Giving a solution for ''b'':

{{attachment:b08.svg}}

This solution is trivially rewritten as:

{{attachment:b09.svg}}

Expand the formula for correlation as:

{{attachment:b10.svg}}

This can now be eliminated into:

{{attachment:b11.svg}}

Finally, ''b'' can be eloquently written as:

{{attachment:b12.svg}}

Giving a generic formula for the regression line:

{{attachment:b13.svg}}
+Assumption 5 mainly affects the estimation of [[Statistics/StandardErrors|standard errors]]: the coefficient estimates remain unbiased without it, and there are alternative standard error estimators that are robust to heteroskedasticity.

Diff for "Statistics/OrdinaryLeastSquares"
