$\ell_1$ norm replaced with reweighted $\ell_2$


In this paper (section 2, page 2), the $\ell_1$ norm is replaced with a reweighted $\ell_2$ norm in an optimization problem. I don't understand how $\lVert x\rVert_1$ is replaced with $x^TWx$, or how the solution changes to its weighted version.

$$\min_x \lVert x\rVert_1 \quad \text{subject to} \quad Ax=b $$ $$\Downarrow$$ $$\min_x\; x^TWx \quad \text{subject to} \quad Ax=b$$ $$\Downarrow$$ $$x^{k+1}=(W^k)^{-1}A^T(A(W^k)^{-1}A^T)^{-1}b$$


It appears that they are taking $W$ to be a diagonal matrix with diagonal entries $W_{ii} = |x_i|^{-1}$ (more precisely, they take a sequential approach where the matrix $W$ at the $k$'th iteration has diagonal entries $|x^{k-1}_i|^{-1}$, where $x^{k-1}$ is the result of the $(k-1)$'th iteration). Assuming everything converges to a solution with all nonzero entries, this works nicely. I'm not sure what would happen if the optimal solution has some entries equal to $0$.
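A minimal NumPy sketch of this reweighting scheme, assuming the update formula from the question; the damping constant `eps` is my own addition to guard against division by zero when iterates approach the zero entries mentioned above:

```python
import numpy as np

def irls_l1(A, b, iters=200, eps=1e-8):
    """Iteratively reweighted least squares for min ||x||_1 s.t. Ax = b.

    W is diagonal with W_ii = 1/|x_i| taken from the previous iterate;
    eps (my addition) keeps the weights finite when entries shrink to 0.
    Assumes A has full row rank.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]  # min-norm start, i.e. W = I
    for _ in range(iters):
        w_inv = np.abs(x) + eps               # diagonal of W^{-1}
        AW_inv = A * w_inv                    # A W^{-1} via broadcasting
        # x^{k+1} = W^{-1} A^T (A W^{-1} A^T)^{-1} b
        x = w_inv * (A.T @ np.linalg.solve(AW_inv @ A.T, b))
    return x
```

Each iterate satisfies $Ax = b$ exactly (up to floating point), so only the distribution of magnitude among the entries changes from step to step.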

EDIT: Going from step 2 to step 3 is just the standard "normal equations" formula for solving a weighted linear least-squares problem in the case where $A$ has full row rank. Let's try a toy example: $$A = \pmatrix{1 & 8 & 12},\ b = 4 $$ for which it's not hard to see the optimal solution of the $\ell_1$ problem is $$ x = \pmatrix{0\cr 0\cr 1/3\cr}$$ Start with $W = I$. The optimal solution to the least-squares problem

minimize $x^T W x$ subject to $Ax = b$

is

$$x = W^{-1} A^T (AW^{-1} A^T)^{-1} b = \pmatrix{4/209\cr 32/209\cr 48/209} \approx \pmatrix{0.01913875598\cr 0.1531100478\cr 0.2296650718}$$

Now we take $W$ to be the diagonal matrix with diagonal entries $209/4, 209/32, 209/48$, and get

$$ x = W^{-1} A^T (AW^{-1} A^T)^{-1} b =\pmatrix{4/2241 \cr 256/2241 \cr 64/249}\approx \pmatrix{0.001784917448 \cr 0.1142347166\cr 0.2570281124\cr}$$

The next few iterations are approximately

$$ \pmatrix{0.0001610759876\cr 0.08247090565\cr 0.2783393066},\ \pmatrix{0.00001420449501\cr 0.05818161157\cr 0.2945444086},\ \pmatrix{0.000001231478183\cr 0.04035307711\cr 0.3064311793}$$

and it does look plausible that this is converging to the optimal solution of the $\ell_1$ problem.
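The exact fractions in the first two iterates can be double-checked with rational arithmetic; here is a short Python sketch using the standard `fractions` module, where the `step` function is my own direct transcription of the update formula for this $1\times 3$ example:

```python
from fractions import Fraction as F

A = [F(1), F(8), F(12)]   # the 1x3 row A from the toy example
b = F(4)

def step(x):
    """One reweighted step: W_ii = 1/|x_i|, so W^{-1} has diagonal |x_i|."""
    w_inv = [abs(xi) for xi in x]
    s = sum(a * wi * a for a, wi in zip(A, w_inv))    # A W^{-1} A^T (a scalar here)
    lam = b / s                                       # (A W^{-1} A^T)^{-1} b
    return [wi * a * lam for a, wi in zip(A, w_inv)]  # W^{-1} A^T (A W^{-1} A^T)^{-1} b

x = [F(4, 209), F(32, 209), F(48, 209)]  # the W = I solution above
x = step(x)
assert x == [F(4, 2241), F(256, 2241), F(64, 249)]   # matches the second iterate
```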


The formula for $x^{k+1}$ comes from constrained optimization, in particular Lagrange multipliers: given a problem of the form

$$ \min_x f(x)\quad \text{s.t.}\quad g(x)=0 $$ introduce the Lagrangian $\mathcal{L}(x,\lambda)=f(x)+\lambda g(x)$. Then a solution to the minimization problem is a stationary point of the Lagrangian, that is,

$$ \partial_x \mathcal{L}(x,\lambda)=\partial_x f(x)+\lambda \partial_x g(x) =0\\ \partial_\lambda \mathcal{L}(x,\lambda)= g(x)=0. $$

In your case, the Lagrangian is $\mathcal{L}(x,\lambda)=\frac{1}{2}x^TWx+\lambda^T(Ax-b)$ (the factor 1/2 does not change the point where the minimum is achieved, but simplifies calculations later), which leads to the system

$$ W^Tx+A^T\lambda=0\\ Ax=b. $$ Now the first equation can be solved for $x$ (note $W^T=W$, since $W$ is diagonal), giving $x=-W^{-1}A^T\lambda$, which, substituted into the second, gives $-AW^{-1}A^T\lambda=b$, i.e. $\lambda=-(AW^{-1}A^T)^{-1}b$. Plug this back into the first equation and you get

$$ W^Tx-A^T(AW^{-1}A^T)^{-1}b=0\Rightarrow x=(W^T)^{-1}A^T(AW^{-1}A^T)^{-1}b, $$ which is the formula you see.
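A quick numerical sanity check that this closed form satisfies both stationarity conditions; the random full-row-rank $A$ and diagonal positive $W$ below are my own choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))        # full row rank (generically)
b = rng.standard_normal(3)
d = rng.uniform(1.0, 2.0, size=6)
W = np.diag(d)                         # diagonal, hence W^T = W

W_inv = np.diag(1.0 / d)
x = W_inv @ A.T @ np.linalg.solve(A @ W_inv @ A.T, b)

# feasibility: the constraint Ax = b holds
assert np.allclose(A @ x, b)

# stationarity: Wx + A^T lambda = 0 with lambda = -(A W^{-1} A^T)^{-1} b
lam = -np.linalg.solve(A @ W_inv @ A.T, b)
assert np.allclose(W @ x + A.T @ lam, 0)
```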