Differences between revisions 4 and 9 (spanning 5 versions)
Revision 4 as of 2023-10-28 05:00:22
Size: 1918
Comment: Regression
Revision 9 as of 2024-06-07 14:58:51
Size: 2153
Comment: Rewrite 1
Line 5: Line 5:
Revision 4: == Data ==
Revision 9: == Observations and Measurements ==
Line 9: Line 9:
Revision 4: The outcome variable is ''y''. For observation ''i'', the outcome value is ''y,,i,,''.
Revision 9: The outcome variable is ''y''. The outcome measurement for observation ''i'' is ''y,,i,,''.
Line 11: Line 11:
Revision 4: The treatment variable is ''x,,1,,''. For observation ''i'', the treatment value is ''x,,1i,,''.
Revision 9: If there is a single predictor, it may be specified as ''x''; the measurement is ''x,,i,,''. More commonly, there is a set of predictors specified like ''x,,1,,'', ''x,,2,,'', and so on. The measurements are then ''x,,1i,,'', ''x,,2i,,'', and so on.
Line 13: Line 13:
Revision 4: The control variables are ''x,,2,,'' through ''x,,k,,'' (up to ''k'' - 1 control variables). For observation ''i'', a control value might be ''x,,2i,,''.
Revision 9: When expressing data with [[LinearAlgebra|linear algebra]], the outcome measurements are composed into vector ''y'' with size ''n'', and the predictor measurements are composed into matrix '''''X''''' of shape ''n'' by ''p''.

A very common exception: income is usually represented by ''Y'' or ''y''. In relevant literature, expect to see different letters.



== Error Terms ==

Error terms are variably represented by ''ε'', ''e'', ''u'', or ''v''. The error term for observation ''i'' would be represented like ''ε,,i,,''.



== Distributions ==

The [[Statistics/NormalDistribution|normal distribution]] is frequently expressed in econometrics. The typical notation is ''x,,i,, ~ N(μ, σ)''.

For multiple variables, pieces of [[LinearAlgebra|linear algebra]] notation are introduced. For example, the joint statement of [[Econometrics/Exogeneity|exogeneity]] and [[Econometrics/Homoskedasticity|homoskedasticity]] is:

{{attachment:exo.svg}}
Line 39: Line 57:
Based on [[Econometrics/OrdinaryLeastSquares|OLS regression]], the estimated outcome for observation ''i'' is:
Line 40: Line 59:
{{attachment:estimate.svg}}
Line 41: Line 61:
Revision 4: == Regression ==
Revision 9: No matter the regression method, the residual is:
Line 43: Line 63:
Revision 4: A regression line passes through two points:
Revision 9: {{attachment:residual.svg}}
Line 45: Line 65:
Revision 4: {{attachment:regression1.svg}}
Revision 9: And the coefficient of determination, a.k.a. the ''R^2^'', is:
Line 47: Line 67:
and

{{attachment:regression2.svg}}

Take the generic equation form of a line:

{{attachment:b01.svg}}

Insert the first point into this form.

{{attachment:b02.svg}}

This can be trivially rewritten to solve for ''a'' in terms of ''b'':

{{attachment:b03.svg}}

Insert the second point into the original form.

{{attachment:b04.svg}}

Now additionally insert the solution for ''a'' in terms of ''b''.

{{attachment:b05.svg}}

Expand all terms to produce:

{{attachment:b06.svg}}

This can now be eliminated into:

{{attachment:b07.svg}}

Giving a solution for ''b'':

{{attachment:b08.svg}}

This solution is trivially rewritten as:

{{attachment:b09.svg}}

Expand the formula for correlation as:

{{attachment:b10.svg}}

This can now be eliminated into:

{{attachment:b11.svg}}

Finally, ''b'' can be eloquently written as:

{{attachment:b12.svg}}

Giving a generic formula for the regression line:

{{attachment:b13.svg}}
Revision 9: {{attachment:rsquared.svg}}

Econometrics Notation

Observations and Measurements

The number of observations is n.

The outcome variable is y. The outcome measurement for observation i is y_i.

If there is a single predictor, it may be specified as x; the measurement is x_i. More commonly, there is a set of predictors specified like x_1, x_2, and so on. The measurements are then x_{1i}, x_{2i}, and so on.

When expressing data with linear algebra, the outcome measurements are composed into a vector y of length n, and the predictor measurements are composed into a matrix X of shape n by p, where p is the number of predictor columns.
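As a minimal sketch of this layout, plain Python lists can stand in for the vector and the matrix. The measurements below are made up for illustration, and the helper name `measurement` is hypothetical.

```python
# Hypothetical data: n = 4 observations and p = 2 predictors.
y = [3.1, 4.0, 5.2, 6.1]      # outcome vector: one entry per observation

X = [                          # predictor matrix: one row per observation,
    [1.0, 0.5],                # one column per predictor
    [2.0, 0.7],
    [3.0, 0.2],
    [4.0, 0.9],
]

n = len(y)                     # number of observations
p = len(X[0])                  # number of predictor columns


def measurement(X, j, i):
    """Return x_{ji}: predictor j measured on observation i (both 1-based)."""
    return X[i - 1][j - 1]


# Every row of X must describe one observation, so X has n rows of length p.
assert len(X) == n and all(len(row) == p for row in X)
```

Here `measurement(X, 1, 2)` reads x_{12}, the first predictor for the second observation, which sits in row 2, column 1 of X.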

One common exception: income is usually represented by Y or y. Literature that involves income may therefore use different letters for the outcome and predictors.

Error Terms

Error terms are variously represented by ε, e, u, or v. The error term for observation i would be written as ε_i.

Distributions

The normal distribution appears frequently in econometrics. The typical notation is x_i ~ N(μ, σ^2), where μ is the mean and σ^2 is the variance.
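Software does not always follow the written notation for the second parameter, so it is worth checking whether a library expects a standard deviation or a variance. A small sketch, assuming Python's standard-library `statistics.NormalDist`, which takes the standard deviation; the values of `mu` and `sigma` are made up.

```python
from statistics import NormalDist

mu = 10.0      # hypothetical mean
sigma = 2.0    # hypothetical standard deviation

# NormalDist is parameterized by the standard deviation, not the variance.
dist = NormalDist(mu=mu, sigma=sigma)

variance = dist.variance        # equals sigma squared
below_mean = dist.cdf(mu)       # probability mass below the mean
```

By symmetry, half of the probability mass lies below the mean, so `below_mean` should come out at 0.5.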

For multiple variables, pieces of linear algebra notation are introduced. For example, the joint statement of exogeneity and homoskedasticity is:

exo.svg
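
The image itself is not reproduced in this text. One common way to write such a joint statement is shown below. It is offered as an assumption rather than a transcription of exo.svg. It uses the error vector ε, the predictor matrix X, and the n by n identity matrix I_n.

```latex
\mathrm{E}[\varepsilon \mid X] = 0,
\qquad
\mathrm{Var}(\varepsilon \mid X) = \sigma^2 I_n
```

The first part states exogeneity (errors have conditional mean zero), and the second states homoskedasticity (errors share one variance and are uncorrelated across observations).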

Statistics

The average outcome is:

average.svg

The variance is:

variance.svg

The standard deviation is:

sd.svg
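
The three statistics above can be sketched in plain Python. This assumes the sample variance with an n - 1 denominator; the page's formula may instead divide by n. The data are made up.

```python
import math


def mean(values):
    """Arithmetic average of the measurements."""
    return sum(values) / len(values)


def variance(values):
    """Sample variance (assumption: n - 1 denominator)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


y = [2.0, 4.0, 4.0, 6.0]   # hypothetical outcome measurements

y_bar = mean(y)
y_var = variance(y)
y_sd = standard_deviation(y)
```

For these values the average is 4, the squared deviations sum to 8, and the sample variance is 8/3.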

The covariance between a predictor x and the outcome y is:

covariance.svg

The correlation between a predictor x and the outcome y is:

correlation.svg
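
Covariance and correlation can be sketched the same way. This again assumes sample statistics with an n - 1 denominator; the choice cancels out in the correlation. The variables `x` and `y` hold made-up measurements.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance (assumption: n - 1 denominator)."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    return cross / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance scaled by both standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))   # covariance of x with itself is its variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)


x = [1.0, 2.0, 3.0, 4.0]   # hypothetical predictor measurements
y = [2.0, 4.0, 5.0, 7.0]   # hypothetical outcome measurements

cov_xy = covariance(x, y)
corr_xy = correlation(x, y)
```

Correlation is unit-free and always lies between -1 and 1, while covariance carries the units of both variables.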

Based on OLS regression, the estimated outcome for observation i is:

estimate.svg

No matter the regression method, the residual is:

residual.svg

The coefficient of determination, also known as R^2, is:

rsquared.svg
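
A minimal sketch ties the estimate, the residual, and R^2 together. It assumes a single predictor with an intercept, fitted by OLS. The names `intercept` and `slope` and the data are made up for illustration and do not come from the page's images.

```python
def mean(values):
    return sum(values) / len(values)


def fit_ols(xs, ys):
    """Simple OLS with one predictor: returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


x = [1.0, 2.0, 3.0, 4.0]   # hypothetical predictor measurements
y = [2.0, 4.0, 5.0, 7.0]   # hypothetical outcome measurements

intercept, slope = fit_ols(x, y)

# Estimated outcome for each observation i.
y_hat = [intercept + slope * a for a in x]

# Residual: observed outcome minus estimated outcome.
residuals = [b - f for b, f in zip(y, y_hat)]

# R^2: one minus the residual sum of squares over the total sum of squares.
y_bar = mean(y)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((b - y_bar) ** 2 for b in y)
r_squared = 1 - ss_res / ss_tot
```

With an intercept in the model, the OLS residuals sum to zero, and for a single predictor R^2 equals the squared correlation between x and y.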



Statistics/EconometricsNotation (last edited 2025-01-10 14:15:50 by DominicRicottone)