I'm trying to study from this pdf: http://math.mit.edu/~gs/linearalgebra/ila0403.pdf
I was just wondering if there were any established proofs for the following formulas:
$\hat{y} = Zw$, where $\hat{y}, w$ are column vectors and $Z$ is an $N \times (M+1)$ data matrix
$l(w) = ||Zw - t||^2$, $t$ being a column vector and where $||v||^2 = \sum_{n}{v_{n}^2}$
While searching for answers, I came across this post: Proof of least squares approximation lemma
Would it be fair to say that this proves the second formula? How is the first formula proven?
These two results are actually definitions. Imagine you have a dataset of the form $\{y_i,(x_1,x_2,\cdots,x_M)_i\}_{i=1}^N$. That is, you have $N$ observations $y_i$ of a variable that depends on $M$ independent features ${\bf x}_i = (x_1,x_2,\cdots,x_M)_i$. Imagine there exist $M+1$ numbers $(w_0,w_1,w_2,\cdots, w_M)$ such that each observation can be explained as a linear combination of the features with these weights, that is
\begin{eqnarray} y_1 &=& w_0 + w_1 x_{1,1} + \cdots w_Mx_{M,1} +\epsilon_1\\ y_2 &=& w_0 + w_1 x_{1,2} + \cdots w_Mx_{M,2} +\epsilon_2\\ &\vdots& \\ y_N &=& w_0 + w_1 x_{1,N} + \cdots w_Mx_{M,N} +\epsilon_N\\ \end{eqnarray}
which can be represented in matrix form as
$$ \left(\begin{array}{c} y_1 \\ \vdots \\ y_N \end{array}\right) = \left(\begin{array}{cccc} 1 & x_{1,1} & \cdots & x_{M,1} \\ 1 & x_{1,2} & \cdots & x_{M,2} \\ & & \vdots & \\ 1 & x_{1,N} & \cdots & x_{M,N} \\ \end{array}\right) \left(\begin{array}{c} w_0 \\ w_1 \\ \vdots \\ w_M \end{array}\right) + \left(\begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_N \end{array}\right) $$
or equivalently
$$ {\bf y} = {\bf Z}{\bf w} + {\bf \epsilon} = \hat{\bf y} + {\bf \epsilon} $$
with $\hat{\bf y}={\bf Z}{\bf w}$, where ${\bf Z}\in \mathbb{R}^{N\times(M+1)}$, ${\bf y}, {\bf \epsilon}\in \mathbb{R}^N$ and ${\bf w} \in \mathbb{R}^{M+1}$.
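The construction of ${\bf Z}$ and the prediction $\hat{\bf y}={\bf Z}{\bf w}$ can be sketched in NumPy. The data and weights below are made-up illustrations, not from the question:

```python
import numpy as np

# Hypothetical small dataset: N = 4 observations, M = 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])   # shape (N, M)

# Design matrix Z: prepend a column of ones for the intercept w_0,
# giving shape (N, M+1).
Z = np.column_stack([np.ones(X.shape[0]), X])

# Arbitrary illustrative weights (w_0, w_1, w_2).
w = np.array([0.5, 2.0, -1.0])

# Prediction \hat{y} = Zw, one value per observation.
y_hat = Z @ w
print(y_hat)  # → [0.5 4.  5.  5.5]
```

The leading column of ones is what turns the intercept $w_0$ into an ordinary weight, so the whole model is a single matrix-vector product.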
As for the second part, the idea is to find the weights ${\bf w}$ such that the prediction $\hat{\bf y}$ is as close as possible to the real observation ${\bf y}$. That is why you define the distance between these two vectors as
$$ l({\bf w}) = ||{\bf Z}{\bf w} - {\bf y}||^2 $$
and try to minimize it over ${\bf w}$. Setting the gradient $\nabla_{\bf w}\, l({\bf w}) = 2{\bf Z}^\top({\bf Z}{\bf w}-{\bf y})$ to zero yields the normal equations ${\bf Z}^\top{\bf Z}\,{\bf w} = {\bf Z}^\top{\bf y}$, whose solution is the least-squares estimate.
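The minimization can be checked numerically. The sketch below uses synthetic data (the true weights and noise level are arbitrary choices for illustration) and compares NumPy's least-squares solver against solving the normal equations directly:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 2
X = rng.normal(size=(N, M))
Z = np.column_stack([np.ones(N), X])        # (N, M+1) design matrix

# Hypothetical true weights; y is generated with small Gaussian noise.
w_true = np.array([1.0, 2.0, -3.0])
y = Z @ w_true + 0.1 * rng.normal(size=N)

# Minimize l(w) = ||Zw - y||^2 with the built-in least-squares solver.
w_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Equivalently, solve the normal equations Z^T Z w = Z^T y.
w_ne = np.linalg.solve(Z.T @ Z, Z.T @ y)

assert np.allclose(w_hat, w_ne)  # both routes give the same minimizer
```

In practice `lstsq` (based on an orthogonal factorization) is preferred over forming ${\bf Z}^\top{\bf Z}$ explicitly, which squares the condition number of the problem.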