Requirements for Proximal Gradient Descent Algorithm to Converge When Using Momentum (Accelerated Descent)


I'm trying to solve the following convex optimization problem: $\min_W g(W) + h(W)$, where $g$ is convex and differentiable and $h$ is convex and non-smooth. Here $g(W)=\|Y-WX\|_F^2$ is the squared loss.

Note that $X$, $W$, $Y$ are all matrices. $h(W)$ is a non-smooth function with a known proximal operator (soft-thresholding/shrinkage). I use a fixed step size $t_k=0.02$. I tried to do backtracking line search, but I was not sure how to extend it to the case where $Y$ and $W$ are matrices (when they are both vectors, it's pretty easy to do).
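For what it's worth, here is a minimal numpy sketch of how I'd expect backtracking to carry over to matrices: the only change from the vector case is that the inner product $\langle a,b\rangle$ becomes the Frobenius (trace) inner product, i.e. `np.sum(a * b)`. The function names and the $\ell_1$ regularizer with weight `lam` are just my illustrative assumptions:

```python
import numpy as np

def g(W, X, Y):
    # Squared loss g(W) = ||Y - WX||_F^2.
    return np.linalg.norm(Y - W @ X, 'fro')**2

def grad_g(W, X, Y):
    # Gradient of the squared loss with respect to W.
    return -2.0 * (Y - W @ X) @ X.T

def soft_threshold(V, tau):
    # Proximal operator of tau*||.||_1 (entrywise shrinkage).
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def backtracking_step(W, X, Y, lam, t=1.0, beta=0.5):
    # Proximal-gradient backtracking: shrink t until the sufficient-
    # decrease condition holds. The vector inner product is replaced by
    # the Frobenius inner product np.sum(a * b); nothing else changes
    # when W and Y are matrices.
    G = grad_g(W, X, Y)
    while True:
        W_new = soft_threshold(W - t * G, t * lam)
        D = W_new - W
        if g(W_new, X, Y) <= (g(W, X, Y) + np.sum(G * D)
                              + np.sum(D**2) / (2.0 * t)):
            return W_new, t
        t *= beta
```

With this condition satisfied, each step is guaranteed not to increase the composite objective $g(W)+\lambda\|W\|_1$.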

Unfortunately, when I use momentum (Nesterov's acceleration), my algorithm does not converge. Here is how I compute the momentum step:

$W^{(0)}=W^{(-1)} \in \mathbb{R}^{m \times n}$, then repeat:

$v=W^{(k-1)} + \frac{k-2}{k-1}\left(W^{(k-1)}-W^{(k-2)}\right)$

$W^{(k)}=\operatorname{prox}_{t_k h}\left(v-t_k\nabla g(v)\right)$

for $k=1,2,3,\dots$
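To make the iteration concrete, here is a self-contained numpy sketch of the loop above. Two things differ from my description on purpose, and both are assumptions on my part: I use the standard momentum coefficient $\frac{k-2}{k+1}$, and instead of a fixed $t_k=0.02$ I set the step from the Lipschitz constant of $\nabla g$, $L = 2\|XX^T\|_2$, since a fixed step larger than $1/L$ can diverge:

```python
import numpy as np

def soft_threshold(V, tau):
    # Proximal operator of tau*||.||_1 (entrywise shrinkage).
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def accel_prox_grad(X, Y, lam, n_iter=500):
    # Accelerated proximal gradient for min_W ||Y - WX||_F^2 + lam*||W||_1.
    # Step size t = 1/L with L = 2*||X X^T||_2 (the Lipschitz constant
    # of grad g), which guarantees convergence; momentum uses the
    # standard (k-2)/(k+1) coefficient.
    L = 2.0 * np.linalg.norm(X @ X.T, 2)
    t = 1.0 / L
    W = W_prev = np.zeros((Y.shape[0], X.shape[0]))
    for k in range(1, n_iter + 1):
        v = W + (k - 2) / (k + 1) * (W - W_prev)   # momentum extrapolation
        grad = -2.0 * (Y - v @ X) @ X.T            # gradient of g at v
        W_prev, W = W, soft_threshold(v - t * grad, t * lam)
    return W
```

With $W^{(0)}=W^{(-1)}$ the extrapolation at $k=1$ is a no-op, so the first step is a plain proximal gradient step.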

Does anybody have some hints on computing the momentum term for the accelerated proximal gradient method? Second, is it possible to adapt backtracking line search to the case where $Y$ and $W$ are matrices?