Error in linear regression


In linear regression we have $Ax=b$. Since the equality is only approximate, an error vector is introduced: $Ax+e=b$. We know that using the least-squares method (minimizing the sum of squares of the entries of $e$), the best $x$ is given by $x=A^+b$, where the plus sign denotes the pseudoinverse: $A^+=(A^TA)^{-1}A^T$. Depending on $A$ and $b$, the error should generally be nonzero, since linear regression performs an imperfect curve fit. However, $e=b-Ax=b-AA^+b$, and since $AA^+=I$ always holds, the error is always zero: $e=(I-AA^+)b=(I-I)b=0$. Why is that? I think the error vector should not be zero regardless of $A$ and $b$. Can someone explain this to me? Thank you!
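For concreteness, here is the setup in NumPy (a small sketch; the matrix $A$ and vector $b$ are made-up example values), computing $x=A^+b$ and the residual $e=b-Ax$:

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns (example values are made up)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Least-squares solution x = A^+ b via the Moore-Penrose pseudoinverse
x = np.linalg.pinv(A) @ b

# Residual e = b - A x
e = b - A @ x
print(e)
```

Running this prints a nonzero residual, which is what prompted the question.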


No.

In linear regression the matrix $A$ has more rows than columns, and hence (see the "Definition" section of the Wikipedia article on the Moore–Penrose pseudoinverse) $AA^+$ is not the identity matrix, but rather the projection matrix onto the column space of $A$. (What does hold, when $A$ has full column rank, is $A^+A=I$ — but that is the product in the other order.)

The normal equations, which are folded into the formula $x=A^+b$, force the fitted error vector $e=b-AA^+b$ to be perpendicular to the column space of $A$, so the analysis-of-variance decomposition (the Pythagorean theorem) $\|b\|^2=\|Ax\|^2+\|e\|^2$ holds.
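A quick numerical check of these claims (a sketch in NumPy; the particular $A$ and $b$ are made-up example values):

```python
import numpy as np

# Tall matrix: more rows than columns (example values are made up)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

P = A @ np.linalg.pinv(A)          # P = A A^+
print(np.allclose(P, np.eye(3)))   # False: P is NOT the identity
print(np.allclose(P @ P, P))       # True: P is idempotent, i.e. a projection

x = np.linalg.pinv(A) @ b
e = b - A @ x
print(np.allclose(A.T @ e, 0))     # True: e is perpendicular to col(A)

# Pythagorean identity ||b||^2 = ||Ax||^2 + ||e||^2
print(np.isclose(b @ b, (A @ x) @ (A @ x) + e @ e))  # True
```

So $AA^+$ projects $b$ onto the column space of $A$, and the residual $e=(I-AA^+)b$ is the (generally nonzero) component of $b$ orthogonal to that space.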