Differences between revisions 19 and 29 (spanning 10 versions)
Revision 19 as of 2024-06-05 22:01:56
Size: 1860
Comment: Rewrite 2
Revision 29 as of 2025-09-03 02:08:40
Size: 2065
Comment: Apparently multivariate regression ~= multiple regression
Deletions are prefixed with '-'. Additions are prefixed with '+'.
Line 3: Line 3:
- '''Ordinary Least Squares''' ('''OLS''') is a linear regression method. It minimizes root mean square errors.
+ '''Ordinary Least Squares''' ('''OLS''') is a linear regression method, and is effectively synonymous with the '''linear regression model'''.
Line 11: Line 11:
- == Univariate ==
+ == Description ==
Line 13: Line 13:
- Given one independent variable and one dependent (outcome) variable, the OLS model is specified as:
+ A linear model is expressed as either {{attachment:model.svg}} (univariate) or {{attachment:mmodel.svg}} (multivariate with ''k'' terms). Either way, a crucial assumption is that the expected value of the error term is 0, such that the [[Statistics/Moments|first moment]] is ''E[y,,i,,|x,,i,,] = α + βx,,i,,''.
Line 15: Line 15:
- {{attachment:model.svg}}
Line 17: Line 16:
- It is estimated as:
+ === Single Regression ===

+ In the case of a single predictor, the OLS regression is:
Line 21: Line 23:
- This model describes (1) the mean observation and (2) the marginal changes to the outcome per unit changes in the independent variable.
+ This formulation leaves the components explicit: the y-intercept term is the mean outcome at ''x=0'', and the slope term is marginal change to the outcome per a unit change in ''x''.
Line 23: Line 25:
- The proof can be seen [[Econometrics/OrdinaryLeastSquares/UnivariateProof|here]].

- ----
+ The derivation can be seen [[Statistics/OrdinaryLeastSquares/Single|here]].
Line 29: Line 29:
- == Multivariate ==
Line 31: Line 30:
- Given ''k'' independent variables, the OLS model is specified as:
+ === Multiple Regression ===
Line 33: Line 32:
- {{attachment:mmodel.svg}}

- It is estimated as:
+ In the case of multiple predictors, the regression is fit like:
Line 38: Line 35:

+ But conventionally, this OLS system is solved using [[LinearAlgebra|linear algebra]] as:

+ {{attachment:matrix.svg}}

+ Note that using a ''b'' here is [[Statistics/EconometricsNotation#Models|intentional]].

+ The derivation can be seen [[Statistics/OrdinaryLeastSquares/Multiple|here]].
Line 45: Line 50:
- If these assumptions can be made:
+ The '''Gauss-Markov theorem''' demonstrates that (with some assumptions) the OLS estimations are the '''best linear unbiased estimators''' ('''BLUE''') for the regression coefficients. The assumptions are:
Line 48: Line 53:
-  2. [[Econometrics/Exogeneity|Exogeneity]]
+  2. Exogeneity, i.e. predictors are independent of the outcome and the error term
Line 50: Line 55:
-  4. No perfect multicolinearity
-  5. [[Econometrics/Homoskedasticity|Homoskedasticity]]
+  4. No perfect [[LinearAlgebra/Basis|multicolinearity]]
+  5. Homoskedasticity, i.e. error terms are constant across observations
Line 53: Line 58:
- Then OLS is the best linear unbiased estimator ('''BLUE''') for regression coefficients.

- The variances for each coefficient are:

- {{attachment:homo1.svg}}

- Note that the standard deviation of the population's parameter is unknown, so it's estimated like:

- {{attachment:homo2.svg}}

- If the homoskedasticity assumption does not hold, then the estimators for each coefficient are actually:

- {{attachment:hetero1.svg}}

- Wherein, for example, ''r,,1j,,'' is the residual from regressing ''x,,1,,'' onto ''x,,2,,'', ... ''x,,k,,''.

- The variances for each coefficient can be estimated with the Eicker-White formula:

- {{attachment:hetero2.svg}}

- See [[https://www.youtube.com/@kuminoff|Nicolai Kuminoff's]] video lectures for the derivation of the robust estimators.
+ #5 mostly comes into the estimation of [[Statistics/StandardErrors|standard errors]], and there are alternative estimators that are robust to heteroskedasticity.

Ordinary Least Squares

Ordinary Least Squares (OLS) is a linear regression method, and is effectively synonymous with the linear regression model.


Description

A linear model is expressed as either model.svg (univariate) or mmodel.svg (multivariate with k terms). Either way, a crucial assumption is that the expected value of the error term is 0, such that the first moment is E[y_i | x_i] = α + βx_i.
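
The two attached equations (model.svg and mmodel.svg) are not reproduced here. Assuming the conventional notation implied by the first-moment statement above, they presumably read:

{{{
% Assumed forms of model.svg and mmodel.svg; not taken from the attachments themselves.
\begin{align*}
y_i &= \alpha + \beta x_i + \varepsilon_i
    && \text{(univariate)} \\
y_i &= \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i
    && \text{(multivariate, $k$ terms)} \\
E[\varepsilon_i \mid x_i] &= 0
    && \text{(zero-mean error term)}
\end{align*}
}}}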

Single Regression

In the case of a single predictor, the OLS regression is:

estimate.svg

This formulation leaves the components explicit: the y-intercept term is the mean outcome at x=0, and the slope term is the marginal change in the outcome per unit change in x.

The derivation can be seen here.
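
estimate.svg is likewise not reproduced above. The closed-form estimates it presumably shows are the standard textbook ones: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept is the mean of y minus the slope times the mean of x. A minimal NumPy sketch of that computation follows; the simulated data and variable names are illustrative, not taken from this page.

{{{
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from y = alpha + beta*x + e with known parameters.
alpha_true, beta_true = 2.0, 0.5
x = rng.normal(size=200)
y = alpha_true + beta_true * x + rng.normal(scale=0.3, size=200)

# Closed-form OLS estimates for a single predictor:
#   slope     b = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
#   intercept a = ybar - b * xbar
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

print(a, b)  # should recover values close to 2.0 and 0.5
}}}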

Multiple Regression

In the case of multiple predictors, the regression is fit like:

mestimate.svg

But conventionally, this OLS system is solved using linear algebra as:

matrix.svg

Note that using a b here is intentional.

The derivation can be seen here.
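
matrix.svg is presumably the familiar normal-equations solution, b = (X'X)^(-1) X'y. A short NumPy sketch of that computation on simulated data (the names and data are illustrative):

{{{
import numpy as np

rng = np.random.default_rng(1)

# Design matrix: a leading column of ones for the intercept plus k = 2 predictors.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, -2.0, 0.75])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Normal equations: b = (X'X)^(-1) X'y.  Solving the linear system is
# numerically preferable to forming the inverse explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)

print(b)                                      # close to [1.0, -2.0, 0.75]
print(np.linalg.lstsq(X, y, rcond=None)[0])   # same answer via least squares
}}}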


Estimated Coefficients

The Gauss-Markov theorem demonstrates that (with some assumptions) the OLS estimates are the best linear unbiased estimators (BLUE) for the regression coefficients. The assumptions are:

  1. Linearity
  2. Exogeneity, i.e. the predictors are uncorrelated with the error term (E[ε|X] = 0)
  3. Random sampling
  4. No perfect multicollinearity (see the sketch after this list)
  5. Homoskedasticity, i.e. the variance of the error term is constant across observations
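
As a brief illustration of assumption 4 (this example is not from the page): when one predictor is an exact linear combination of the others, X'X is singular and the OLS coefficients are not uniquely determined.

{{{
import numpy as np

rng = np.random.default_rng(2)

n = 100
x1 = rng.normal(size=n)
x2 = 3.0 * x1 - 1.0          # exact linear function of x1: perfect multicollinearity
X = np.column_stack([np.ones(n), x1, x2])

# X'X has rank 2 rather than 3, so the normal equations have no unique solution.
print(np.linalg.matrix_rank(X.T @ X))   # 2
print(np.linalg.cond(X.T @ X))          # enormous condition number
}}}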

#5 mostly comes into play in the estimation of standard errors, and there are alternative estimators that are robust to heteroskedasticity.
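
The earlier revision shown in the diff above cites the Eicker-White formula for such robust estimators. A sketch of heteroskedasticity-robust (HC0) standard errors alongside the classical homoskedastic ones, on illustrative simulated data:

{{{
import numpy as np

rng = np.random.default_rng(3)

n = 1000
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 + 0.8 * x)   # error variance grows with x

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b

# Classical variance estimate, valid under homoskedasticity: s^2 (X'X)^(-1).
s2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# Eicker-White (HC0) sandwich estimator: (X'X)^(-1) X' diag(e^2) X (X'X)^(-1).
meat = (X * resid[:, None] ** 2).T @ X
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(se_classical)   # classical standard errors
print(se_robust)      # heteroskedasticity-robust standard errors
}}}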


CategoryRicottone
