Thr Proximal Operator for $ {L}_{1} $ Regularized Least squares Problem

239 Views Asked by At

I'm supposed implementing certain optimization algorithms (ISTA, FISTA) to minimize: $$\frac12 ||Ax-(Ax_0+z)||_2^2 + \lambda ||x||_1.$$

$A$ is a matrix, $x$ is a vector, $z$ is some noise filled with random data from a certain distribution. $\lambda$ is to be chosen so as to "yield a sparse solution". Ok.

My notes tell me I need to start at some $x$ and progress by: $$x^{(k+1)} = S_{\alpha \lambda}\left(x^{(k)} - \alpha A^T\left(Ax^{(k)} - \left(Ax_0 + z\right)\right)\right)$$ where $S$ is the proximal operator. I'm using $$S_a(b) = (|b| > a) * (b-a \cdot \operatorname{sign}(b))$$

which I think is the proximal operator for the $l_1$ norm, which is in the original equation. But I'm getting divergent results (ie. $\frac{||x^{(k+1)}-x^{(k)}||_2}{||x^{(k)}||_2}$ is growing ~logarithmicly, instead of converging to zero).

Am I using the wrong proximal operator here? I've tried multiple values of $\alpha$ and $\lambda$, and have looked for bugs in my code, but maybe I missed something in the math.

1

There are 1 best solutions below

0
On

First, as @MichaelGrant commented, you should compute $a$ properly. Here is a MATLAB code snippet:

Hess = @(x) A'*(A*x);
eigsOpt.issym = 1; eigsOpt.tol = 1e-5;
Lf = eigs(Hess, n, 1, 'LM', eigsOpt);
alpha = 0.95/Lf

Alternatively, you need a line search method to compute a different step length at each iteration.

Your soft-thresholding operator is correct (I assume that $*$ stands for the element-wise multiplication). An alternative way to write it is

$$S_{a\lambda}(x) = \mathrm{prox}_{a\lambda\|\cdot\|_1}(x) = \mathrm{sign}(x)*(|x|-a\lambda)_+,$$

but it's exactly the same.

Maybe your problem, however, is very badly conditioned in which case ISTA may converge after many iterations or may, in practice, not converge. Maybe first run the above code to determine the correct step length $\alpha$ and check whether it is too small. Then you can either do some simple preconditioning, or use FISTA which is somewhat less sensitive to conditioning, or try some other algorithm altogether.