Ordinary Least Squares

Ordinary Least Squares (OLS) is a linear regression method. It chooses the coefficients that minimize the sum of squared residuals, which is equivalent to minimizing the root mean square error.


Univariate

The regression line passes through two points:

[ATTACH]

and

[ATTACH]

It can be proven that the slope of the regression line is equal to:

[ATTACH]

The generic formula for the regression line is:

[ATTACH]
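The attached formulas are not reproduced on this page, so the following is only a minimal sketch of the univariate case. It assumes the standard textbook form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept is chosen so that the line passes through the point of means. The function name fit_line and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # Intercept: forces the fitted line through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

Because the sample points lie exactly on a line, the fit should recover that line's intercept and slope up to floating-point error.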


Multivariate


Linear Model

The linear model can be expressed as:

model1.svg

If these assumptions can be made:

  1. Linearity
  2. Exogeneity
  3. Random sampling
  4. No perfect multicollinearity
  5. Homoskedasticity

Then OLS is the best linear unbiased estimator (BLUE) for these coefficients.

Using the computation above, the coefficients are estimated to produce:

model2.svg
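As a rough illustration of how the coefficients can be estimated with several regressors, the sketch below solves the normal equations (X'X)b = X'y in plain Python. This is one common way to compute OLS, not necessarily the computation the attachment shows. The helper names solve and ols and the sample data are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: perfect multicollinearity")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, ys):
    """Estimate coefficients (intercept first) from regressor rows and outcomes."""
    X = [[1.0] + list(r) for r in rows]  # prepend a constant column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


# Illustrative data generated without noise from y = 1 + 2*x1 - 0.5*x2.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
ys = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in rows]
beta = ols(rows, ys)
print(beta)
```

The singularity check is where the "no perfect multicollinearity" assumption shows up in practice: if one regressor is an exact linear combination of the others, the normal equations have no unique solution.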

The variances for each coefficient are:

homo1.svg

Note that the variance of the population error term is unknown, so it is estimated from the residuals as:

homo2.svg
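To make the homoskedastic case concrete, here is a minimal sketch for the special case of a single regressor. It assumes the usual estimate of the error variance, the sum of squared residuals divided by n - 2 (n - k - 1 with k = 1), and the usual slope variance, that estimate divided by the sum of squared deviations of x. The attachments may use different notation. The function name and data are hypothetical.

```python
import math


def slope_se_homoskedastic(xs, ys):
    """Return (slope, standard error) for a one-regressor OLS fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
    # Estimated error variance with the degrees-of-freedom correction n - 2.
    sigma2_hat = sum(u * u for u in residuals) / (n - 2)
    # Under homoskedasticity, Var(slope) = sigma^2 / sum of squared x-deviations.
    var_slope = sigma2_hat / sxx
    return slope, math.sqrt(var_slope)


# Illustrative data.
slope, se = slope_se_homoskedastic([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
print(slope, se)
```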

If the homoskedasticity assumption does not hold, then the estimators for each coefficient are actually:

hetero1.svg

Wherein, for example, r_1j is the residual from regressing x_1 onto x_2, ..., x_k.

The variances for each coefficient can be estimated with the Eicker-White formula:

hetero2.svg
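The sketch below shows a heteroskedasticity-robust variance for the single-regressor case. It assumes the basic Eicker-White form without small-sample corrections (often labelled HC0): each squared deviation of x is weighted by its own squared residual, and the sum is divided by the squared sum of squared deviations. With one regressor, the residual from regressing x on a constant is simply x minus its mean. The function name and data are hypothetical.

```python
def slope_var_robust(xs, ys):
    """Return (slope, Eicker-White HC0 variance) for a one-regressor OLS fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dev = [x - x_bar for x in xs]  # residuals of x regressed on a constant
    sxx = sum(d * d for d in dev)
    slope = sum(d * (y - y_bar) for d, y in zip(dev, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
    # Each observation contributes its own squared residual, so no common
    # error variance is assumed.
    meat = sum(d * d * u * u for d, u in zip(dev, residuals))
    return slope, meat / (sxx ** 2)


# Illustrative data.
slope, var_robust = slope_var_robust([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
print(slope, var_robust)
```

On the same data, the robust variance generally differs from the homoskedastic one because it does not pool the squared residuals into a single error-variance estimate.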

See Nicolai Kuminoff's video lectures (https://www.youtube.com/@kuminoff) for the derivation of the robust estimators.


CategoryRicottone

Statistics/OrdinaryLeastSquares (last edited 2025-01-10 14:33:38 by DominicRicottone)