
Ordinary Least Squares

Ordinary Least Squares (OLS) is a linear regression method. It estimates coefficients by minimizing the sum of squared residuals.


Univariate

Given one independent variable and one dependent (outcome) variable, the OLS model is specified as:

y_i = β_0 + β_1 x_i + ε_i

It is estimated as:

ŷ_i = b_0 + b_1 x_i

This model describes (1) the mean observation, since the fitted line passes through the point of means, and (2) the marginal change in the outcome per unit change in the independent variable.

The derivation can be seen at Statistics/OrdinaryLeastSquares/Univariate.
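
As a sketch of the univariate estimate (using numpy and simulated data, neither of which is part of this page), the slope is the covariance of the two variables over the variance of the independent variable, and the intercept places the line through the point of means:

```python
import numpy as np

# Toy data with a known linear relation: y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 + 3 * x + rng.normal(scale=0.1, size=100)

# Closed-form univariate OLS: slope = cov(x, y) / var(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# The fitted line passes through (mean(x), mean(y))
b0 = y.mean() - b1 * x.mean()

# b0 and b1 should land close to the true 2 and 3
```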


Multivariate

Given k independent variables, the OLS model is specified as:

y_i = β_0 + β_1 x_i1 + ... + β_k x_ik + ε_i

It is estimated as:

ŷ_i = b_0 + b_1 x_i1 + ... + b_k x_ik

More conventionally, this is estimated with linear algebra as:

b = (X'X)⁻¹ X'y

The derivation can be seen at Statistics/OrdinaryLeastSquares/Multivariate.
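
A minimal sketch of the matrix estimation (using numpy and simulated data, not part of this page): the normal equations (X'X)b = X'y are solved directly rather than inverting X'X, which is the numerically preferred route to the same estimate.

```python
import numpy as np

# Toy data: y = 1 + 2*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
n = 200
# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.1, size=n)

# b = (X'X)^{-1} X'y, computed as a linear solve of the normal equations
b = np.linalg.solve(X.T @ X, X.T @ y)

# b should recover approximately [1, 2, -0.5]
```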


Estimated Coefficients

The Gauss-Markov theorem demonstrates that, given some assumptions, the OLS estimators are the best linear unbiased estimators (BLUE) for the regression coefficients. The assumptions are:

  1. Linearity
  2. Exogeneity
  3. Random sampling
  4. No perfect multicollinearity
  5. Homoskedasticity

Assumption 5 mostly matters for the estimation of standard errors, and there are alternative variance estimators that are robust to heteroskedasticity.
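
As an illustration (using numpy and simulated data, not part of this page), one such alternative is the White/HC0 "sandwich" estimator, (X'X)⁻¹ X' diag(e²) X (X'X)⁻¹, which replaces the classical constant-variance assumption with the squared residuals:

```python
import numpy as np

# Simulate heteroskedastic errors: noise grows with |x|
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
e = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))
y = X @ np.array([1.0, 2.0]) + e

b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)

# Classical variance assumes one constant error variance
var_classic = XtX_inv * (resid @ resid) / (n - X.shape[1])
# HC0 sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
var_hc0 = XtX_inv @ (X.T * resid**2) @ X @ XtX_inv

se_classic = np.sqrt(np.diag(var_classic))
se_hc0 = np.sqrt(np.diag(var_hc0))
```

With errors that grow in the independent variable, the robust standard error on the slope comes out larger than the classical one, which understates the uncertainty here.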


CategoryRicottone

Statistics/OrdinaryLeastSquares (last edited 2025-09-03 02:08:40 by DominicRicottone)