The expression
$$\tag 1 \frac{1}{2}{(y-X\omega)}^T(y-X\omega),$$ where
$y$ is $N \times 1$,
$X$ is $N \times p$, and
$\omega$ is $p \times 1$,
is the vectorized way to write
$$\tag 2 \frac{1}{2}\sum_{i=1}^N \left(y^{(i)}-{x^{(i)}}^T\omega\right)^2,$$
where ${x^{(i)}}^T$ is the $i$-th row of $X$.
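A quick numerical sanity check that $(1)$ and $(2)$ are the same quantity: a minimal NumPy sketch, assuming random data of hypothetical sizes $N=50$, $p=3$. Here `y` is stored as a 1-D array of length $N$ rather than an explicit $N\times 1$ column, which NumPy treats equivalently for these products.

```python
import numpy as np

# Hypothetical random data, for illustration only.
rng = np.random.default_rng(0)
N, p = 50, 3
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)        # y as a length-N vector
omega = rng.standard_normal(p)    # omega as a length-p vector

def loss_vectorized(X, y, omega):
    """Equation (1): 0.5 * (y - X omega)^T (y - X omega)."""
    r = y - X @ omega
    return 0.5 * r @ r

def loss_sum(X, y, omega):
    """Equation (2): 0.5 * sum_i (y^(i) - x^(i)^T omega)^2, x^(i)^T = row i of X."""
    total = 0.0
    for i in range(X.shape[0]):
        total += (y[i] - X[i] @ omega) ** 2
    return 0.5 * total

print(np.isclose(loss_vectorized(X, y, omega), loss_sum(X, y, omega)))  # True
```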
I would like to differentiate $(1)$ with respect to $\omega$ and set the derivative equal to $0$. This is easy to do for $(2)$, but I'm wondering whether it is also possible directly in vectorized notation, which I am new to.
I would expect the result to be very similar to what this procedure yields for $(2)$, namely something like:
$$X^T(y-X\omega)=0$$
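For comparison, here is the componentwise route through $(2)$, sketched with ${x^{(i)}}^T$ denoting the $i$-th row of $X$ and $x^{(i)}_j$ its $j$-th entry (this entry notation is introduced here for illustration). Differentiating with respect to a single component $\omega_j$:
$$\eqalign{ \frac{\partial}{\partial\omega_j}\,\frac{1}{2}\sum_{i=1}^N \left(y^{(i)}-{x^{(i)}}^T\omega\right)^2 &= -\sum_{i=1}^N \left(y^{(i)}-{x^{(i)}}^T\omega\right)x^{(i)}_j \cr &= -\left[X^T(y-X\omega)\right]_j }$$
Setting every partial derivative $j=1,\dots,p$ to zero at once gives $X^T(y-X\omega)=0$.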
Is there something I'm missing here or is this hard to do? Is this standard or generally avoided? My attempts seem to lead to unintuitive answers. If this is a bad question, I apologize. Any resources or help would be appreciated, thanks.
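Let $z=Xw-y$ and use a colon to denote the trace/Frobenius product, i.e. $A:B={\rm tr}(A^TB)$. Then you can write the function in a form where finding the differential and gradient is easy: $$\eqalign{ \phi &= \frac{1}{2}z:z \cr d\phi &= z:dz = z:X\,dw = X^Tz:dw \cr \frac{\partial\phi}{\partial w} &= X^Tz = X^T(Xw-y) }$$ Here $dz = X\,dw$ because $y$ is constant, and the step $z:X\,dw = X^Tz:dw$ is just ${\rm tr}(z^TX\,dw)={\rm tr}\big((X^Tz)^T dw\big)$.

Since $X^T(Xw-y) = -X^T(y-Xw)$, setting the gradient to zero gives exactly the equation you expected. Solving it, assuming $X^TX$ is invertible (i.e. $X$ has full column rank): $$\eqalign{ X^TXw &= X^Ty \cr w &= (X^TX)^{-1}X^Ty }$$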
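To check the result numerically, here is a minimal NumPy sketch, assuming random data of hypothetical sizes with $X$ of full column rank. It compares the gradient $X^T(Xw-y)$ against central finite differences, then solves the normal equations and compares with NumPy's least-squares solver.

```python
import numpy as np

# Hypothetical random data; X has full column rank with probability 1.
rng = np.random.default_rng(1)
N, p = 100, 4
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

def phi(w):
    """phi = 0.5 * z:z with z = X w - y."""
    z = X @ w - y
    return 0.5 * z @ z

def grad_phi(w):
    """Gradient derived above: X^T z = X^T (X w - y)."""
    return X.T @ (X @ w - y)

# Central finite differences along each coordinate direction at a random point.
w0 = rng.standard_normal(p)
h = 1e-6
fd = np.array([(phi(w0 + h * e) - phi(w0 - h * e)) / (2 * h) for e in np.eye(p)])
print(np.allclose(fd, grad_phi(w0), atol=1e-5))  # True

# Normal equations X^T X w = X^T y, solved without forming the inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_hat, w_ref))                     # True
print(np.allclose(grad_phi(w_hat), 0.0, atol=1e-8))  # True
```

Using `np.linalg.solve` on $X^TXw = X^Ty$ instead of computing $(X^TX)^{-1}$ explicitly is the usual numerical choice; `np.linalg.lstsq` works directly on $X$ and also handles the rank-deficient case.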