This is an excerpt from Gilbert Strang's Linear Algebra. Suppose I have a collection of points $(t,b) \in R^2$, $\{(1,1), (2,2), (3,2)\}$. I want to find the least-squares line $b = C + Dt$ that best fits these points.
Of course, I may set up a system of equations: \begin{align*} C + D &= 1 \\ C + 2D &= 2 \\ C + 3D &= 2, \end{align*} which is equivalent to:
$\begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \begin{pmatrix}C \\ D\end{pmatrix} = \begin{pmatrix}1 \\ 2 \\ 2\end{pmatrix}$,
and find $C, D$ such that $A\begin{pmatrix}C \\ D\end{pmatrix}$ is the orthogonal projection of $(1,2,2)$ onto the span of the two columns. Let $U := \operatorname{span}\{(1,1,1), (1,2,3)\}$.
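As a quick numerical check of this setup (a sketch using NumPy; the matrix and right-hand side are the ones above), solving the normal equations $A^TA\hat x = A^Tb$ gives $C = 2/3$, $D = 1/2$, and the residual $b - A\hat x$ is orthogonal to both columns:

```python
import numpy as np

# Columns of A are (1,1,1) and (1,2,3); right-hand side b = (1,2,2).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations A^T A x = A^T b give the least-squares solution.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)        # [C, D] = [2/3, 1/2]

# p = A x_hat is the orthogonal projection of b onto U = col(A);
# the residual e = b - p is orthogonal to the column space.
p = A @ x_hat
e = b - p
print(A.T @ e)      # ~ [0, 0]
```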
I am confused about why such a projection minimizes the least-squares error of the line. Our original problem seems to involve projecting 2-dimensional points onto a 1-dimensional subspace, and our question is to find that 1-dimensional subspace. Now we have another problem, which involves projecting a vector $v \in R^3$ onto a 2-dimensional subspace $U$. What is the connection between the least-squares line and this projection? Why are they equivalent problems?
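To make the comparison concrete (a sketch; NumPy assumed), here are the two quantities side by side: the sum of squared vertical errors of the line in the $(t,b)$-plane, and the squared distance $\|Ax - b\|^2$ in $R^3$. For any choice of $(C, D)$ they are the same number, which is why minimizing one minimizes the other:

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 2.0])
A = np.column_stack([np.ones_like(t), t])

def vertical_error_sq(C, D):
    # Sum of squared vertical distances from the data points
    # to the line b = C + D t, measured in the (t, b)-plane.
    return np.sum((C + D * t - b) ** 2)

def distance_sq(C, D):
    # Squared distance from the vector b to A(C, D) in R^3.
    x = np.array([C, D])
    return np.sum((A @ x - b) ** 2)

# The two quantities agree for every (C, D), optimal or not:
for C, D in [(0.0, 1.0), (1.0, 0.5), (2/3, 1/2)]:
    print(vertical_error_sq(C, D), distance_sq(C, D))
```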
Furthermore, if there are $n$ points of the form $(t, b)$, then I may set up a system of linear equations $Tx = y$, where $T \in L(R^2, R^n)$. I can define the adjoint $T' \in L(R^n, R^2)$ characterized by $\langle T\vec x, \vec y\rangle_n = \langle \vec x, T'\vec y\rangle_2$. Strang uses the transpose matrix to define the orthogonal projection, and I am trying to reinterpret everything he does in terms of inner products (for example, what he writes as $x^TA^T \vec b$ is just the inner product $\langle b, Ax\rangle$). After all, a transpose matrix is just the matrix representation of the adjoint with respect to some orthonormal basis.
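The defining property of the adjoint can be checked numerically (a sketch: a random $T$, with the standard inner products on $R^2$ and $R^n$, under which the matrix of $T'$ is the transpose):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
T = rng.standard_normal((n, 2))   # T : R^2 -> R^n, as a matrix
x = rng.standard_normal(2)
y = rng.standard_normal(n)

# <T x, y>_n should equal <x, T' y>_2 with T' = T^T
lhs = (T @ x) @ y
rhs = x @ (T.T @ y)
print(np.isclose(lhs, rhs))      # True
```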