Is there a direct interpretation of the least squares solution based on composition of linear operators?

40 Views Asked by At

The least squares solution to the problem

$$\min_x \|Ax - b\|_2^2$$

is $x = (A^\top A)^{-1}A^\top b$.

Is there an interpretation of this solution by directly interpretating $x$ as the output of a composition of linear operators?

For example, $A^\top b$ means that $b$ is mapped to the image of $A^\top$, then we apply $(A^\top A)^{-1}$ which means that $A^\top b$ is then mapped to the image of $(A^\top A)^{-1}$.

I am not quite sure how this series of composition implies that $x$ is the optimal solution, can anyone help?

2

There are 2 best solutions below

0
On BEST ANSWER

$Ax$ is the orthogonal projection of $b$ onto the column space of $A,$ as may be seen by thinking about what the phrase "least squares" means.

A highly respected and somewhat well known (e.g. some of his research findings were reported in the New York Times) mathematician once looked at the expression $$ H = A(A^\top A)^{-1} A^\top $$ and told me it's the identity matrix, and that is an instance of "pure" mathematicians assuming matrices are square. But of course we're talking about a tall and skinny matrix: $A$ has more rows than columns. And the expression $(A^\top A)^{-1}$ makes no sense unless the columns of $A$ are linearly independent.

It is easy to see that if $b$ is orthogonal to the column space of $A$ then $Hb=0,$ since $Hb = A(A^\top A)^{-1} \Big(A^\top b\Big)$ and $A^\top b=0.$

But if $b$ is in the column space of $A$ then $Hb=b,$ as is seen by saying $b = Ac$ and then simplifying the expression $Hb=HAc.$

Therefore $Hb$ is the orthogonal projection of $b$ onto the column space of $A.$

If we have $$ A(A^\top A)^{-1} A^\top b = Ax, $$ then we can conclude that $x = (A^\top A)^{-1} A^\top b$ provided we can cancel $A$ on the left. The matrix $A,$ having more rows than columns, cannot have a two-sided inverse, but it has a left inverse since its columns are linearly independent. In fact it has infinitely many left inverses, and one of those is $(A^\top A)^{-1} A^\top.$ So we can do the cancellation.

0
On

I think it is most convenient to consider the orthogonal projection of $\hat{b}$ of $b$ onto $\mathrm{range}(A)$: $$ \hat b = A (A^\top A)^{-1}) A^\top b. $$ This $\hat{b}$ is the vector in $\mathrm{range}(A)$ that is closest to $b$.

The argument goes as follows. If $\hat b$ is the orthogonal projection of $b$ onto $\mathrm{range}(A)$, then $b-\hat{b}$ is orthogonal to every column of $A$, so $a_j^\top (b-\hat{b})= 0$ for all $j$. Alternatively, this means $$ A^\top(b-\hat{b})=0. $$ Since $\hat{b} \in \mathrm{range}(A)$, there is an $x$ with $Ax = \hat{b}$, so $A^\top(b-Ax)=0$. If $A$ has full rank, $A^\top A$ is invertible and $$ x = (A^\top A)^{-1} A^\top b. $$ Then $\hat{b}= A x = A (A^\top A)^{-1} A^\top b $.