= Ordinary Least Squares =

'''Ordinary Least Squares''' ('''OLS''') is a linear regression method, and is effectively synonymous with the '''linear regression model'''.

== Description ==

A linear model is expressed as either {{attachment:model.svg}} (univariate) or {{attachment:mmodel.svg}} (multivariate with ''k'' terms). Either way, a crucial assumption is that the expected value of the error term is 0, such that the [[Statistics/Moments|first moment]] is ''E[y,,i,,|x,,i,,] = α + βx,,i,,''.
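
Written out (assuming the attachments show the standard formulation), the two models and the zero-mean error assumption are:

{{{
y_i = \alpha + \beta x_i + \varepsilon_i
y_i = \alpha + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + \varepsilon_i
E[\varepsilon_i \mid x_i] = 0 \quad\Rightarrow\quad E[y_i \mid x_i] = \alpha + \beta x_i
}}}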

=== Univariate ===

In the univariate case, the OLS regression is:

{{attachment:estimate.svg}}

This formulation leaves the components explicit: the y-intercept term is the mean outcome at ''x = 0'', and the slope term is the marginal change in the outcome per unit change in ''x''.
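
Assuming ''estimate.svg'' shows the fitted line ''ŷ,,i,, = a + bx,,i,,'', the standard closed-form estimates of those two components are:

{{{
b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = r_{xy} \frac{s_y}{s_x}, \qquad a = \bar{y} - b \bar{x}
}}}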

The derivation can be seen [[Statistics/OrdinaryLeastSquares/Univariate|here]].

=== Multivariate ===

In the multivariate case, the regression is fit like:

{{attachment:mestimate.svg}}

But conventionally, multivariate OLS is solved using [[LinearAlgebra|linear algebra]] as:

{{attachment:matrix.svg}}

Note that using a ''b'' here is [[Statistics/EconometricsNotation#Models|intentional]].
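
Presumably ''matrix.svg'' shows the normal-equations solution; in LaTeX, with ''X'' the design matrix and ''y'' the outcome vector:

{{{
\mathbf{b} = (X^\top X)^{-1} X^\top \mathbf{y}
}}}
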
The derivation can be seen [[Statistics/OrdinaryLeastSquares/Multivariate|here]]. |
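
As a minimal sketch (made-up data, not part of this page's examples), the matrix solution can be computed directly with NumPy and cross-checked against `numpy.linalg.lstsq`:

{{{#!python
import numpy as np

# Fake data: n observations, two predictors, known coefficients plus noise.
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([2.0, -0.5]) + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), x])

# b = (X'X)^{-1} X'y, solved without forming the inverse explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against the library's least-squares routine.
b_check, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)        # approximately [1.0, 2.0, -0.5]
print(b_check)  # same values
}}}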

== Estimated Coefficients ==

The '''Gauss-Markov theorem''' demonstrates that (with some assumptions) the OLS estimates are the '''best linear unbiased estimators''' ('''BLUE''') for the regression coefficients. The assumptions are:

 1. Linearity
 1. Exogeneity, i.e. the predictors are independent of the error term
 1. Random sampling
 1. No perfect [[LinearAlgebra/Basis|multicollinearity]]
 1. Homoskedasticity, i.e. the variance of the error term is constant across observations
Line 88: | Line 58: |
{{attachment:model2.svg}} 3.#3 Random sampling 4. No perfect multicolinearity 5. Heteroskedasticity Then OLS is the best linear unbiased estimator ('''BLUE''') for these coefficients. Using the computation above, the coefficients are estimated to produce: {{attachment:model3.svg}} The variance for each coefficient is estimated as: {{attachment:model4.svg}} Where R^2^ is calculated as: {{attachment:model5.svg}} Note also that the standard deviation of the population's parameter is unknown, so it's estimated like: {{attachment:model6.svg}} |

Assumption #5 mostly comes into play in the estimation of [[Statistics/StandardErrors|standard errors]], and there are alternative estimators that are robust to heteroskedasticity.
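
For instance, a sketch (one possible approach, using the `statsmodels` library rather than anything prescribed here) that fits the same model with classical and heteroskedasticity-robust (HC1) standard errors:

{{{#!python
import numpy as np
import statsmodels.api as sm

# Made-up data with an error variance that grows with |x| (heteroskedastic).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1 + np.abs(x), size=200)

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()             # assumes homoskedastic errors
robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust SEs

print(classical.bse)  # classical standard errors
print(robust.bse)     # robust standard errors (typically larger here)
}}}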