I am reading about the solution of the linear least squares problem. Given the function $$f(\theta) = \frac{1}{2} \lVert y - \Phi \theta \rVert _{2}^{2}$$ we need to compute the gradient of $f$ to find the minimizer. The text says that $$\nabla f(\theta^{*}) = 0 \iff \Phi^{T}\Phi\theta^{*} - \Phi^{T}y = 0$$
Can someone explain how to compute the gradient of $f$? I don't understand why $$\nabla f(\theta^{*}) = \Phi^{T}\Phi\theta^{*} - \Phi^{T}y$$
The squared magnitude of the vector $y-\Phi\theta$ is
$$\lVert y - \Phi \theta \rVert^2=(y-\Phi\theta)^T(y-\Phi\theta)=(y^T-\theta^T\Phi^T)(y-\Phi\theta)=y^Ty-y^T\Phi\theta-\theta^T\Phi^Ty+\theta^T\Phi^T\Phi\theta$$
But $y^T\Phi\theta$ is a scalar, so it equals its own transpose: $y^T\Phi\theta=(y^T\Phi\theta)^T=\theta^T\Phi^Ty$. The two middle terms therefore combine, and the right-hand side is
$$y^Ty-2\theta^T\Phi^Ty+\theta^T\Phi^T\Phi\theta$$
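Now differentiate with respect to $\theta$ term by term. The first term does not depend on $\theta$. For the linear term, $\nabla_\theta(\theta^Tb)=b$ with $b=\Phi^Ty$. For the quadratic term, $\nabla_\theta(\theta^TA\theta)=(A+A^T)\theta$, which equals $2A\theta$ here because $A=\Phi^T\Phi$ is symmetric. Together this gives $-2\Phi^Ty+2\Phi^T\Phi\theta$, and scaling by $1/2$ gives $\nabla f(\theta)=\Phi^T\Phi\theta-\Phi^Ty$, which is the expression in the question.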
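If it helps to see the formula confirmed numerically, here is a minimal sketch in plain Python that compares $\Phi^T\Phi\theta-\Phi^Ty$ with a central finite-difference approximation of the gradient. The small matrix `Phi`, the vector `y`, the test point `theta`, and all helper names (`matvec`, `transpose`, `numerical_grad`) are made up for illustration and do not come from the text being discussed.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def f(theta, Phi, y):
    """f(theta) = 1/2 * ||y - Phi theta||^2."""
    r = [yi - pi for yi, pi in zip(y, matvec(Phi, theta))]
    return 0.5 * sum(ri * ri for ri in r)

def grad_f(theta, Phi, y):
    """Analytic gradient: Phi^T Phi theta - Phi^T y = Phi^T (Phi theta - y)."""
    residual = [pi - yi for pi, yi in zip(matvec(Phi, theta), y)]
    return matvec(transpose(Phi), residual)

def numerical_grad(theta, Phi, y, h=1e-5):
    """Central finite differences, one coordinate at a time."""
    g = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        g.append((f(up, Phi, y) - f(down, Phi, y)) / (2 * h))
    return g

# Illustrative data (an assumption, not from the original text).
Phi = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 2.0, 4.0]
theta = [0.5, -1.0]

analytic = grad_f(theta, Phi, y)
numeric = numerical_grad(theta, Phi, y)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))

# Solve the 2x2 normal equations Phi^T Phi theta* = Phi^T y by Cramer's rule.
PhiT = transpose(Phi)
A = [matvec(PhiT, col) for col in PhiT]  # Phi^T Phi (symmetric)
b = matvec(PhiT, y)                      # Phi^T y
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
theta_star = [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
              (A[0][0] * b[1] - A[1][0] * b[0]) / det]
grad_at_star = grad_f(theta_star, Phi, y)

print("analytic gradient:", analytic)
print("numeric gradient: ", numeric)
print("gradient at theta*:", grad_at_star)
```

The two gradients should agree up to rounding error, and the gradient at `theta_star` should vanish up to rounding, which is exactly the condition $\Phi^T\Phi\theta^*-\Phi^Ty=0$. Cramer's rule is used only because the example has two parameters; for larger problems a proper linear solver would be the usual choice.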