I am trying to get a somewhat intuitive understanding of the least-squares estimator and ran into the following problem.
I start with a matrix A, a target point/vector b, and a vector x of coefficients that forms a linear combination of the columns of A.
Then comes the argument that Ax is closest to b when the residual $ b - Ax $ is orthogonal to the column space of A (if the column space is indeed the right thing for it to be orthogonal to).
I.e. $$ A^\intercal (b - Ax) = 0 $$
Leading straightforwardly to $$ A^\intercal b = A^\intercal Ax $$
Now, $ A^\intercal A $ is a square symmetric matrix whose $(i, j)$ entry is the inner product of column $a_i$ of A with column $a_j$, with the self inner products along the diagonal, so: $$ A^\intercal Ax = \begin{bmatrix} a_1 \cdot a_1 & a_1\cdot a_2 & \cdots\\ a_2 \cdot a_1 & a_2 \cdot a_2 & \cdots\\ \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \end{bmatrix} $$
Each entry of $ A^\intercal b $ must equal the corresponding entry of $ A^\intercal A x $; for the first row: $$ a_1 \cdot a_1 x_1 + a_1 \cdot a_2 x_2+ ... + a_1 \cdot a_n x_n = a_1 \cdot (a_1 x_1 + a_2 x_2+ ... + a_n x_n) = a_1 \cdot b $$ Therefore $$ b = a_1 x_1 + a_2 x_2+ ... + a_n x_n $$
This would mean that b lies in the column space of A, i.e., that it is a linear combination of the columns of A. But I believe least squares is used precisely when b does not lie in the column space of A. Was it not allowed to factor out the dot product? What's wrong here?
Thanks.
Your conclusion 'Therefore' simply doesn't follow: you cannot cancel $a_1$ from both sides of $a_1 \cdot (a_1 x_1 + \dots + a_n x_n) = a_1 \cdot b$, because taking the dot product with a fixed vector is not injective.
The equations only say that $a_i\cdot b=a_i\cdot (Ax)$ for each column $a_i$ of $A$, and this is not enough to determine $b$ uniquely unless the vectors $a_i$ span the whole space: any component of $b$ orthogonal to all the $a_i$ is invisible to these equations.
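A quick numerical sketch of this point (using NumPy, with a made-up $3\times 2$ matrix, so the two columns cannot span $\mathbb{R}^3$): the normal equations $A^\intercal(b - Ax) = 0$ are satisfied exactly, yet $b \ne Ax$.

```python
import numpy as np

# Hypothetical example: the columns of A span only the x-y plane of R^3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])  # third component lies outside col(A)

# Solve the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# a_i . (b - Ax) = 0 for every column a_i of A ...
print(A.T @ (b - A @ x))  # -> [0. 0.]

# ... yet b != Ax: the residual is the component of b orthogonal to col(A)
print(b - A @ x)          # -> [0. 0. 3.]
```

Here $Ax$ is the orthogonal projection of $b$ onto the column space; the residual $[0, 0, 3]$ is exactly the part of $b$ that the dot products with the columns cannot see.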