Derivative of an $l_2$ norm with respect to a matrix


I have been working on a problem for a review homework for a class on Machine Learning. The problem statement is: minimize $\sum_{j=0}^n \|W^T x_j - y_j\|_{2}^{2} + \lambda \|W\|_F^2$ with respect to $W$.

I have made some progress with setting the gradient to zero and solving for a closed-form solution. I have found that $\nabla_{W}\, \lambda \|W\|_F^2 = 2\lambda W$, and I have worked out that: $$\sum_{j=0}^n \|W^T x_j - y_j\|_2^2 = \sum_{j=0}^n (W^T x_j - y_j)^T (W^T x_j - y_j) = \sum_{j=0}^n (x_j^T W - y_j^T)(W^T x_j - y_j) = \sum_{j=0}^n (x_j^T WW^T x_j - x_j^T Wy_j - y_j^T W^T x_j + y_j^T y_j)$$

But I can't seem to make any progress beyond that expansion. The gradient of the last term obviously vanishes, since $y_j^T y_j$ does not depend on $W$, but for the remaining terms I'm stuck. Could anyone point me in the right direction? Thanks!
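(For reference, one way to finish this expansion directly, using the standard matrix-calculus identities $\nabla_W\,(x^T W W^T x) = 2\,x x^T W$ and $\nabla_W\,(x^T W y) = x\,y^T$, and noting that the two cross terms are equal scalars: $$\nabla_W \sum_{j=0}^n (x_j^T WW^T x_j - 2\,x_j^T Wy_j + y_j^T y_j) + \nabla_W\,\lambda\|W\|_F^2 = \sum_{j=0}^n \left( 2\,x_j x_j^T W - 2\,x_j y_j^T \right) + 2\lambda W,$$ which matches the matrix form derived in the answer below once the $x_j$ and $y_j$ are collected into matrices.)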


Instead of indexed vectors, write the function in terms of the matrix $X$ whose columns are the $x_k$ vectors. Likewise, the vectors $y_k$ are the columns of the matrix $Y$.

For further convenience, define a new matrix variable $$\eqalign{ Z &= X^TW-Y^T \cr }$$ Write the function in terms of the Frobenius inner product (denoted by a colon, $A:B = \operatorname{tr}(A^TB)$) and these new variables. Then it is a snap to find the differential and gradient. $$\eqalign{ f &= Z:Z + \lambda W:W \cr\cr df &= 2Z:dZ + 2\lambda W:dW \cr &= 2Z:X^TdW + 2\lambda W:dW \cr &= 2(XZ + \lambda W):dW \cr &= 2(XX^TW - XY^T + \lambda W):dW \cr\cr \frac{\partial f}{\partial W} &= 2(XX^TW - XY^T + \lambda W) \cr\cr }$$ Setting the gradient to zero yields $$\eqalign{ (XX^T + \lambda I)\,W &= XY^T \cr W &= (XX^T + \lambda I)^{-1}XY^T \cr }$$
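As a sanity check, the closed form is easy to verify numerically. The sketch below (with illustrative dimensions and random data, all chosen here for the example) builds $W = (XX^T + \lambda I)^{-1}XY^T$, confirms that the gradient $2(XX^TW - XY^T + \lambda W)$ vanishes there, and checks that perturbing $W$ increases the objective:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n, lam = 4, 3, 50, 0.1      # illustrative sizes and ridge weight
X = rng.standard_normal((d, n))   # columns are the x_j vectors
Y = rng.standard_normal((m, n))   # columns are the y_j vectors

def f(W):
    """Objective: sum_j ||W^T x_j - y_j||^2 + lam * ||W||_F^2."""
    R = W.T @ X - Y               # residual matrix, columns W^T x_j - y_j
    return np.sum(R**2) + lam * np.sum(W**2)

# Closed-form minimizer from the derivation above
W_star = np.linalg.solve(X @ X.T + lam * np.eye(d), X @ Y.T)

# The gradient 2(X X^T W - X Y^T + lam W) should be ~0 at W_star
grad = 2 * (X @ X.T @ W_star - X @ Y.T + lam * W_star)
print(np.max(np.abs(grad)))       # numerically zero

# Since lam > 0 the objective is strictly convex, so any
# perturbation of W_star must increase it
W_pert = W_star + 0.01 * rng.standard_normal((d, m))
print(f(W_star) < f(W_pert))
```

Note that `np.linalg.solve` is preferred over forming the inverse explicitly; it solves $(XX^T + \lambda I)\,W = XY^T$ directly and is better conditioned.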