I'm trying to implement the subgradient method and the proximal gradient method, both with a constant step size, for the lasso problem, but the results of the two methods are almost identical: the two convergence trajectories overlap so closely that they look like a single line in the plot.
Looking at the details of every iteration, I realized that most of the change in both methods comes from the gradient step, which means the prox step is insignificant. Am I right? How can I set up a lasso problem that produces a big difference between the subgradient method and the proximal gradient method, like the plots that commonly show up when searching for these two methods?
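For concreteness, here is a minimal sketch of the two updates I mean, on a toy lasso instance f(x) = 0.5‖Ax − b‖² + λ‖x‖₁ (the data, step size, and iteration count below are illustrative choices, not my actual problem):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam, t, iters = 20, 50, 2.0, 3e-3, 1000
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def obj(x):
    # lasso objective: smooth least-squares part plus l1 penalty
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def soft_threshold(v, tau):
    # prox of tau*||.||_1: shrinks entries toward zero, zeroing the small ones
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# subgradient method: gradient of the smooth part plus lam*sign(x),
# a subgradient of the l1 term
x_sub = np.zeros(n)
for _ in range(iters):
    g = A.T @ (A @ x_sub - b) + lam * np.sign(x_sub)
    x_sub = x_sub - t * g

# proximal gradient: gradient step on the smooth part only,
# then soft-thresholding
x_prox = np.zeros(n)
for _ in range(iters):
    x_prox = soft_threshold(x_prox - t * A.T @ (A @ x_prox - b), t * lam)
```

One difference is visible even without a plot: the proximal iterate is exactly sparse (the soft-threshold zeroes coordinates), while the subgradient iterate is generically dense, since the update never lands exactly on zero.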
I tried increasing the regularization parameter, but it doesn't help; the subgradient method still does as well as the proximal gradient method. By the way, FISTA also produces a smooth, monotonically decreasing convergence trajectory, which is not what I expected.
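For reference, this is the FISTA variant I have in mind, on the same kind of toy instance (again an illustrative setup, not my actual data); the momentum extrapolation is what usually makes the objective non-monotone:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, lam, t, iters = 20, 50, 2.0, 3e-3, 300
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def obj(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

x = np.zeros(n)
y = x.copy()     # extrapolated point
s = 1.0          # momentum parameter
history = []
for _ in range(iters):
    # proximal gradient step taken at the extrapolated point y
    x_new = soft_threshold(y - t * A.T @ (A @ y - b), t * lam)
    s_new = (1.0 + np.sqrt(1.0 + 4.0 * s * s)) / 2.0
    # momentum extrapolation; this overshoot can push the objective back up
    y = x_new + ((s - 1.0) / s_new) * (x_new - x)
    x, s = x_new, s_new
    history.append(obj(x))
```

Plotting `history` on a log scale is how I'd check whether the trajectory ripples or decreases monotonically.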
So my questions are:
- Is my inference correct?
- How can I construct a problem that creates a big difference between the subgradient and proximal gradient methods, and that shows the 'volatile' (non-monotone) convergence trajectory typical of FISTA? An example of such a convergence trajectory would be appreciated.