I'm trying to show the following. Let $Pa$ be the approximation of $y$. I want to show that finding the minimum over $a$ of the function $$f(a,y)=\|Pa-y\|_2^2$$ is equivalent to solving the normal equations of the least squares method, $$P^TPa=P^Ty$$
What I have so far: I can rewrite the normal equations as $$P^TPa=P^Ty\implies P^T(Pa-y)=0$$ and at a minimum I need $$\frac{\partial f}{\partial a}=0$$ Differentiating, I get $$(\sqrt{(Pa-y)^2}^2)'=0 \implies 2(Pa-y)P=0$$
Now I don't see why $$2(Pa-y)P=P^T(Pa-y)$$
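$P$ is a matrix and $a$ and $y$ are column vectors, so $(Pa-y)P$ is not well defined: the sizes don't match up. Writing the squared norm as an inner product and applying the product rule gives $$ \begin{align} \frac{\mathrm{d}}{\mathrm{d}a}\|Pa-y\|^2 &=\frac{\mathrm{d}}{\mathrm{d}a}\langle Pa-y,Pa-y\rangle\\ &=\langle Pa-y,P\rangle+\langle P,Pa-y\rangle\\ &=2\langle Pa-y,P\rangle\\ &=2P^T(Pa-y) \end{align} $$ where $\langle Pa-y,P\rangle$ is shorthand for the column vector of inner products of $Pa-y$ with the columns of $P$. So the derivative is $2P^T(Pa-y)$, not $(Pa-y)P$.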
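As a cross-check that avoids the inner-product shorthand, one can expand the squared norm directly. This is only a sketch, assuming $P$ is a real $m\times n$ matrix, $a\in\mathbb{R}^n$ and $y\in\mathbb{R}^m$ (the sizes $m$ and $n$ are not named in the question): $$ \begin{align} \|Pa-y\|_2^2 &=(Pa-y)^T(Pa-y)\\ &=a^TP^TPa-2y^TPa+y^Ty\\ \nabla_a\|Pa-y\|_2^2 &=2P^TPa-2P^Ty\\ &=2P^T(Pa-y) \end{align} $$ Setting the gradient to zero gives $P^TPa=P^Ty$. The Hessian $2P^TP$ is positive semidefinite, so the function is convex in $a$ and every stationary point is a global minimizer. That is why minimizing $\|Pa-y\|_2^2$ and solving the normal equations are equivalent.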
Since $\frac{\mathrm{d}}{\mathrm{d}a}\sqrt{f(a)}=\frac{f'(a)}{2\sqrt{f(a)}}$, taking $f(a)=\|Pa-y\|^2$ we get, wherever $Pa\ne y$, $$ \frac{\mathrm{d}}{\mathrm{d}a}\|Pa-y\|=\frac{P^T(Pa-y)}{\|Pa-y\|} $$
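If it helps to see this numerically, here is a minimal NumPy sketch that compares the gradient formula $2P^T(Pa-y)$ with a central finite difference and checks that solving the normal equations agrees with a library least squares solver. The random $6\times 3$ matrix, the seed, the step size `h`, and the helper names `f`, `grad_f` and `numerical_grad` are all assumptions made for illustration; none of them come from the question.

```python
import numpy as np

def f(P, a, y):
    """Squared residual norm ||Pa - y||_2^2."""
    r = P @ a - y
    return float(r @ r)

def grad_f(P, a, y):
    """Gradient from the derivation above: 2 P^T (Pa - y)."""
    return 2.0 * P.T @ (P @ a - y)

def numerical_grad(P, a, y, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(a)
    for i in range(a.size):
        e = np.zeros_like(a)
        e[i] = h
        g[i] = (f(P, a + e, y) - f(P, a - e, y)) / (2.0 * h)
    return g

# Illustrative data (assumed): a random full-column-rank 6x3 system.
rng = np.random.default_rng(0)
P = rng.standard_normal((6, 3))
y = rng.standard_normal(6)
a = rng.standard_normal(3)

# Largest entrywise gap between the formula and the finite difference.
grad_error = np.max(np.abs(grad_f(P, a, y) - numerical_grad(P, a, y)))

# Solve the normal equations P^T P a = P^T y and compare with lstsq.
a_normal = np.linalg.solve(P.T @ P, P.T @ y)
a_lstsq = np.linalg.lstsq(P, y, rcond=None)[0]
solution_gap = np.max(np.abs(a_normal - a_lstsq))

print("gradient error:", grad_error)
print("solution gap:", solution_gap)
```

Both printed numbers should be close to zero. Forming $P^TP$ explicitly is fine for a small well-conditioned example like this one, but it squares the condition number, so for real problems a QR- or SVD-based solver such as `np.linalg.lstsq` is usually the safer choice.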