I'm trying to solve the following convex optimization problem: $\min_W g(W) + h(W)$, where $g$ is convex and differentiable and $h$ is convex and non-smooth. Here $g(W)=\|Y-WX\|_F^2$ is the squared loss.
Note that $X$, $W$, and $Y$ are all matrices. $h(W)$ is a non-smooth function with a known proximal operator (soft-thresholding, i.e. shrinkage). I use a fixed step size ($t_k=0.02$). I tried to do a backtracking line search, but I was not sure how to extend it to the case where $Y$ and $W$ are matrices (when they are both vectors, it's pretty easy to do).
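For reference, this is the prox I use, assuming $h(W)=\lambda\|W\|_1$ (a minimal NumPy sketch; the name `soft_threshold` and the parameter `tau` $= t_k\lambda$ are mine):

```python
import numpy as np

def soft_threshold(W, tau):
    """Proximal operator of tau * ||W||_1 (entrywise soft-thresholding).

    Acts elementwise, so it works identically for vectors and matrices.
    """
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)
```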
Unfortunately, when I use momentum (Nesterov's acceleration approach), my algorithm does not converge. Here is how I compute the momentum:
$W^{(0)}=W^{(-1)} \in \mathbb{R}^{m \times n}$, and we repeat:
$v = W^{(k-1)} + \frac{k-2}{k-1}\left(W^{(k-1)} - W^{(k-2)}\right)$
$W^{(k)} = \mathrm{prox}_{t_k h}\left(v - t_k \nabla g(v)\right)$
for $k = 1, 2, 3, \dots$
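To make the loop concrete, here is a minimal NumPy sketch of what I believe the accelerated proximal gradient iteration should look like, assuming $h(W)=\lambda\|W\|_1$ (so the prox is soft-thresholding) and using the standard FISTA momentum coefficient $\frac{k-2}{k+1}$; the function names and the gradient $\nabla g(W) = -2(Y - WX)X^T$ follow from the squared-loss definition above:

```python
import numpy as np

def soft_threshold(W, tau):
    # prox of tau * ||W||_1, applied entrywise
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

def fista(Y, X, lam=0.1, t=0.02, iters=500):
    """Accelerated proximal gradient for min_W ||Y - W X||_F^2 + lam * ||W||_1.

    Gradient of g(W) = ||Y - W X||_F^2 is -2 (Y - W X) X^T.
    Uses the standard momentum coefficient (k - 2) / (k + 1).
    """
    W = np.zeros((Y.shape[0], X.shape[0]))
    W_prev = W.copy()
    for k in range(1, iters + 1):
        beta = (k - 2) / (k + 1)            # momentum weight
        V = W + beta * (W - W_prev)         # extrapolation step
        grad = -2.0 * (Y - V @ X) @ X.T     # gradient of g at V
        W_prev = W
        W = soft_threshold(V - t * grad, t * lam)
    return W
```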
Does anybody have hints on computing the momentum term for the accelerated proximal gradient descent method? Second, is it possible to adapt backtracking line search to the case where $Y$ and $W$ are matrices?
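For the second question, my understanding is that backtracking carries over unchanged if the vector inner product is replaced by the trace inner product $\langle A, B\rangle = \operatorname{tr}(A^T B) = \sum_{ij} A_{ij}B_{ij}$ and the Euclidean norm by the Frobenius norm. A sketch under that assumption (the function name `backtracking_prox_step` and the sufficient-decrease condition $g(W^+) \le g(V) + \langle \nabla g(V), W^+ - V\rangle + \frac{1}{2t}\|W^+ - V\|_F^2$ are the standard ones, not taken from my code):

```python
import numpy as np

def backtracking_prox_step(V, g, grad_g, prox, t0=1.0, beta=0.5):
    """Shrink t until the sufficient-decrease condition holds:

        g(W+) <= g(V) + <grad_g(V), W+ - V> + ||W+ - V||_F^2 / (2 t),

    where W+ = prox(V - t * grad_g(V), t). All inner products and norms
    are Frobenius (elementwise), so V may be a matrix.
    """
    t = t0
    G = grad_g(V)
    while True:
        W_plus = prox(V - t * G, t)
        D = W_plus - V
        if g(W_plus) <= g(V) + np.sum(G * D) + np.sum(D * D) / (2 * t):
            return t, W_plus
        t *= beta  # step too large: shrink and retry
```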