Projections
When a vector does not lie in a column space, the best approximation of it within that column space is called a projection.
Vectors
Given two vectors a and b, we can project b onto a to get the best possible estimate of the former as a multiple of the latter. This projection p has an error term e.
The factor which converts a into an estimate is notated as x̂, so that p = ax̂. The error term can be characterized by e = b - p or e = b - ax̂.
a is orthogonal to e. Therefore, aᵀ(b - ax̂) = 0. This simplifies to x̂ = (aᵀb)/(aᵀa). Altogether, the projection is characterized as p = a(aᵀb)/(aᵀa).
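The formulas above can be checked numerically. The following is a minimal sketch in plain Python; the helper names (dot, project_onto_vector) and the example vectors are assumptions chosen for illustration, and exact fractions are used so that the orthogonality check is not blurred by rounding.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product uᵀv of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def project_onto_vector(a, b):
    """Return (x_hat, p, e) for the projection of b onto the line through a."""
    x_hat = Fraction(dot(a, b), dot(a, a))  # x̂ = (aᵀb)/(aᵀa)
    p = [ai * x_hat for ai in a]            # p = a x̂
    e = [bi - pi for bi, pi in zip(b, p)]   # e = b - p
    return x_hat, p, e

# Hypothetical example vectors.
a = [1, 2, 2]
b = [1, 1, 1]
x_hat, p, e = project_onto_vector(a, b)

print(x_hat)      # 5/9
print(dot(a, e))  # 0, so a is orthogonal to e
```

Here aᵀb = 5 and aᵀa = 9, so x̂ = 5/9 and p = (5/9, 10/9, 10/9).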
A matrix P can be defined such that p = Pb. The projection matrix is P = (aaᵀ)/(aᵀa). The column space of P (a.k.a. C(P)) is the line through a, and its rank is 1.
Incidentally, P is symmetric (i.e. Pᵀ = P) and idempotent (i.e. P² = P).
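Both properties can be confirmed on a small example. This is only a sketch: the helpers (projection_matrix, matmul, transpose) and the vector a are hypothetical, and matrices are stored as plain lists of rows.

```python
from fractions import Fraction

def projection_matrix(a):
    """Build P = (a aᵀ)/(aᵀa) as a list of rows."""
    ata = sum(ai * ai for ai in a)
    return [[Fraction(ai * aj, ata) for aj in a] for ai in a]

def transpose(X):
    """Swap rows and columns."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

a = [1, 2, 2]  # hypothetical example vector
P = projection_matrix(a)

print(P == transpose(P))  # True: P is symmetric
print(matmul(P, P) == P)  # True: P is idempotent
```

Every row of this P is a multiple of (1, 2, 2), which is the rank 1 structure described above.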
Matrices
For systems of equations like Ax = b where there is no solution for x, because b does not lie in the column space of A, we can instead solve Ax̂ = p, where p estimates b with an error term e.
The error term can be characterized as e = b - p or e = b - Ax̂.
e is orthogonal to the column space of A, because p is the closest point in that space to b and so e has no component along any column of A. The projection is more easily worked with in terms of Aᵀ: the columns of A are the rows of Aᵀ, so e is orthogonal to the row space of Aᵀ, a.k.a. e is in the null space of Aᵀ. Therefore, Aᵀ(b - Ax̂) = 0.
Expanding this gives the system of normal equations AᵀAx̂ = Aᵀb. When the columns of A are linearly independent, AᵀA is invertible and this simplifies to x̂ = (AᵀA)⁻¹Aᵀb. Altogether, the projection is characterized as p = A(AᵀA)⁻¹Aᵀb.
A matrix P can be defined such that p = Pb. The projection matrix is P = A(AᵀA)⁻¹Aᵀ.
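The normal equations can be solved directly on a small system. The sketch below assumes a hypothetical 3-by-2 matrix A with independent columns and uses a hand-rolled Gauss-Jordan solver (solve) on exact fractions; none of these names come from a library.

```python
from fractions import Fraction

def transpose(X):
    """Swap rows and columns."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    """Product of a matrix and a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Hypothetical example: b is not in the column space of A.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]

At = transpose(A)
x_hat = solve(matmul(At, A), matvec(At, b))  # AᵀA x̂ = Aᵀb
p = matvec(A, x_hat)                         # p = A x̂
e = [bi - pi for bi, pi in zip(b, p)]        # e = b - p

print([str(v) for v in x_hat])           # ['5', '-3']
print([str(v) for v in matvec(At, e)])   # ['0', '0']
```

For this A and b, the normal equations are 3x + 3y = 6 and 3x + 5y = 0, giving x̂ = (5, -3), p = (5, 2, -1) and e = (1, -2, 1). The final line confirms Aᵀe = 0.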
b can also be projected onto the null space of Aᵀ, which geometrically means extracting the error component e. Algebraically, if one projection matrix has been computed as P, then the projection matrix for going the other way is I - P, so that e = (I - P)b.
As above, P is symmetric (i.e. Pᵀ = P) and idempotent (i.e. P² = P).
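These properties, and the complementary projection I - P, can be checked on a concrete matrix. The sketch below assumes a hypothetical two-column A, so that AᵀA is 2-by-2 and can be inverted with the closed-form formula (inverse_2x2 is an illustrative helper, not a library function).

```python
from fractions import Fraction

def transpose(X):
    """Swap rows and columns."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Closed-form inverse of a 2-by-2 integer matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[1, 0], [1, 1], [1, 2]]  # hypothetical example with independent columns
At = transpose(A)

P = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)  # P = A(AᵀA)⁻¹Aᵀ
n = len(P)
Q = [[int(i == j) - P[i][j] for j in range(n)] for i in range(n)]  # Q = I - P

print(P == transpose(P))  # True: P is symmetric
print(matmul(P, P) == P)  # True: P is idempotent
print(matmul(Q, Q) == Q)  # True: I - P is idempotent as well
```

For this A, P works out to (1/6) times the rows (5, 2, -1), (2, 2, 2), (-1, 2, 5).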
This should look familiar. A projection is inherently the minimization of the error term: x̂ is the choice that makes the length of e = b - Ax̂ as small as possible.
Some notes:
If A were a square invertible matrix, most of the above equations would cancel out: (AᵀA)⁻¹ = A⁻¹(Aᵀ)⁻¹, so P = AA⁻¹(Aᵀ)⁻¹Aᵀ = I. But we cannot make that assumption.
If b were in the column space of A, then Pb = b. P acts like the identity matrix on b, even though P itself is generally not the identity.
If b were orthogonal to the column space of A, then necessarily b is in the null space of Aᵀ. For that reason, projecting b onto the null space of Aᵀ leaves it unchanged, i.e. (I - P)b = b. In that case, Pb = 0 and b = e.
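The last two notes can be illustrated with a deliberately simple case. The sketch below assumes a hypothetical A whose columns are orthonormal, so that AᵀA = I and the projection matrix reduces to P = AAᵀ; the two test vectors are likewise chosen for illustration.

```python
def matvec(M, v):
    """Product of a matrix and a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

# Assumed example: orthonormal columns, so AᵀA = I and P = A Aᵀ.
A = [[1, 0], [0, 1], [0, 0]]
P = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(3)]
     for i in range(3)]
I_minus_P = [[int(i == j) - P[i][j] for j in range(3)] for i in range(3)]

b_in = [3, 4, 0]    # lies in the column space of A
b_perp = [0, 0, 5]  # orthogonal to the column space of A

print(matvec(P, b_in))            # [3, 4, 0]
print(matvec(P, b_perp))          # [0, 0, 0]
print(matvec(I_minus_P, b_perp))  # [0, 0, 5]
```

Here P has diagonal entries 1, 1, 0, so it is not the identity matrix, yet it still returns b_in unchanged.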