Is it possible in this case to calculate the derivative with matrix notation?


$$\tag 1 \frac{1}{2}{(y-X\omega)}^T(y-X\omega)$$ where:

$y$ is $N \times 1$

$X$ is $N \times p$

$\omega$ is $p \times 1$

Is the vectorized way to write

$$\tag 2 \frac{1}{2}\sum_{i=1}^N (y^{(i)}-x^{{(i)}^T}\omega)^2$$

I would like to differentiate $(1)$ with respect to $\omega$ and set the derivative equal to 0. This is easy to do for (2) but I'm wondering if this sort of thing is possible for vectorized notation which I am new to.

I would expect a result very similar to what you get by carrying out this procedure on (2), something like:

$$X^T(y-X\omega)=0$$

Is there something I'm missing here or is this hard to do? Is this standard or generally avoided? My attempts seem to lead to unintuitive answers. If this is a bad question, I apologize. Any resources or help would be appreciated, thanks.


There are 1 best solutions below

On BEST ANSWER

Let $z=Xw-y$ (here $w$ is the question's $\omega$) and use a colon to denote the trace/Frobenius product, i.e. $A:B={\rm tr}(A^TB)$. Then you can write the function in a form that makes the differential and gradient easy to find. Since $y$ is constant, $dz = X\,dw$, so

$$\eqalign{ \phi &= \frac{1}{2}z:z \cr d\phi &= z:dz = z:X\,dw = X^Tz:dw \cr \frac{\partial\phi}{\partial w} &= X^Tz = X^T(Xw-y) }$$

Now set the gradient to zero and solve, assuming $X^TX$ is invertible (i.e. $X$ has full column rank):

$$\eqalign{ X^TXw &= X^Ty \cr w &= (X^TX)^{-1}X^Ty }$$
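If you want to convince yourself numerically, here is a minimal sketch in Python with NumPy. It is not part of the derivation: the random data, the sizes `N` and `p`, and the helper names `phi` and `grad_phi` are all assumptions made up for illustration. It compares the gradient $X^T(Xw-y)$ against central finite differences and checks that the normal-equation solution makes that gradient vanish.

```python
import numpy as np

# Illustrative random problem; N, p, and the data are arbitrary assumptions.
rng = np.random.default_rng(0)
N, p = 50, 3
X = rng.normal(size=(N, p))
y = rng.normal(size=N)

def phi(w):
    """Objective (1): 0.5 * (Xw - y)^T (Xw - y)."""
    z = X @ w - y
    return 0.5 * z @ z

def grad_phi(w):
    """Gradient from the differential argument: X^T (Xw - y)."""
    return X.T @ (X @ w - y)

# Central finite differences along each coordinate direction.
w0 = rng.normal(size=p)
eps = 1e-6
fd = np.array([(phi(w0 + eps * e) - phi(w0 - eps * e)) / (2 * eps)
               for e in np.eye(p)])
grad_matches = np.allclose(fd, grad_phi(w0), atol=1e-5)

# Solve the normal equations X^T X w = X^T y (assumes X has full column rank).
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver.
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print("finite differences agree with X^T(Xw - y):", grad_matches)
print("gradient norm at w_star:", np.linalg.norm(grad_phi(w_star)))
print("matches np.linalg.lstsq:", np.allclose(w_star, w_lstsq))
```

The sketch uses `np.linalg.solve` on the normal equations rather than forming $(X^TX)^{-1}$ explicitly, which is usually the better-conditioned choice; in practice `np.linalg.lstsq` applied to $X$ directly is often preferred.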