The least squares solution to the problem
$$\min_x \|Ax - b\|_2^2$$
is $x = (A^\top A)^{-1}A^\top b$.
Is there an interpretation of this solution obtained by directly interpreting $x$ as the output of a composition of linear operators?
For example, $A^\top b$ means that $b$ is mapped into the image of $A^\top$, and then applying $(A^\top A)^{-1}$ maps $A^\top b$ into the image of $(A^\top A)^{-1}$.
I am not quite sure how this composition of operators implies that $x$ is the optimal solution; can anyone help?
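For concreteness, the closed-form expression can be checked numerically against a library least-squares solver. A minimal sketch with NumPy (the particular matrix and vector below are arbitrary choices, assuming only that $A$ is tall with linearly independent columns):

```python
import numpy as np

# A tall matrix with linearly independent columns (values are arbitrary).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])

# Closed-form normal-equations solution x = (A^T A)^{-1} A^T b.
x_closed = np.linalg.inv(A.T @ A) @ A.T @ b

# Reference solution from NumPy's least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_closed, x_lstsq))
```

(In practice one would call the solver directly rather than form $(A^\top A)^{-1}$, which is numerically less stable; the explicit inverse is used here only to mirror the formula.)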
$Ax$ is the orthogonal projection of $b$ onto the column space of $A,$ as may be seen by thinking about what the phrase "least squares" means.
A highly respected and somewhat well known (e.g. some of his research findings were reported in the New York Times) mathematician once looked at the expression $$ H = A(A^\top A)^{-1} A^\top $$ and told me it's the identity matrix, and that is an instance of "pure" mathematicians assuming matrices are square. But of course we're talking about a tall and skinny matrix: $A$ has more rows than columns. And the expression $(A^\top A)^{-1}$ makes no sense unless the columns of $A$ are linearly independent.
If $b$ is orthogonal to the column space of $A,$ then $A^\top b = 0,$ since each entry of $A^\top b$ is the inner product of a column of $A$ with $b.$ Hence $Hb = A(A^\top A)^{-1} \Big(A^\top b\Big) = 0.$
But if $b$ is in the column space of $A,$ say $b = Ac,$ then $$ Hb = A(A^\top A)^{-1} A^\top A c = Ac = b. $$
Therefore $Hb$ is the orthogonal projection of $b$ onto the column space of $A.$
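Both defining properties of the projection, and the orthogonality of the residual $b - Hb$ to the columns of $A,$ can be verified numerically. A sketch with NumPy, using an arbitrary tall matrix with independent columns:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
H = A @ np.linalg.inv(A.T @ A) @ A.T  # the projection matrix

# b in the column space of A: H acts as the identity on it.
b_in = A @ np.array([2.0, -1.0])
assert np.allclose(H @ b_in, b_in)

# b orthogonal to the column space (so A^T b = 0): H annihilates it.
b_perp = np.array([1.0, -2.0, 1.0])
assert np.allclose(A.T @ b_perp, 0)
assert np.allclose(H @ b_perp, 0)

# For a general b, the residual b - Hb is orthogonal to every column of A.
b = np.array([1.0, 0.0, 2.0])
assert np.allclose(A.T @ (b - H @ b), 0)
```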
If we have $$ A(A^\top A)^{-1} A^\top b = Ax, $$ then we can conclude that $x = (A^\top A)^{-1} A^\top b$ provided we can cancel $A$ on the left. The matrix $A,$ having more rows than columns, cannot have a two-sided inverse, but it has a left inverse since its columns are linearly independent. In fact it has infinitely many left inverses, and one of those is $(A^\top A)^{-1} A^\top.$ So we can do the cancellation.
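The left-inverse claim, and the anecdote about $H$ not being the identity, can both be seen in a short NumPy sketch (same arbitrary tall matrix as assumed above):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
L = np.linalg.inv(A.T @ A) @ A.T  # one left inverse of A

# L A = I (2x2), so L cancels A from the left...
assert np.allclose(L @ A, np.eye(2))

# ...but A L = H is a 3x3 projection, not the 3x3 identity.
H = A @ L
assert not np.allclose(H, np.eye(3))
assert np.allclose(H @ H, H)  # idempotent, as a projection must be
```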