Understanding weighted $l_1$ norm


When an $l_1$-norm is added as a penalty to a quadratic cost function, the weights are pulled toward zero, producing a sparse solution vector. However, the shrinkage term ($\lambda\|w\|_1$) forces all weights toward zero uniformly. That is, the new cost function (the Lasso) does not distinguish between weights that are actually zero and those that are not, so the estimates of the nonzero weights one wishes to recover become biased.

To overcome this, people have proposed a weighted $l_1$-norm that applies the shrinkage selectively rather than uniformly. A sample cost function could be:

$$ J(k)=\|e(k)\|^2+\lambda'\sum\limits_{i=1}^{N}\log\left(1+\frac{|w_i|}{\zeta'}\right) $$

Could someone explain the logarithmic shrinkage term here? Why does it push the weights to zero selectively, compared to the $l_1$-norm? What is the role of the constant $\zeta'$ in the shrinkage?

Thank you

Best answer:

In most cases, the problem one actually wants to solve is $$ \underset{x}{\text{minimize}}\ \|y-Ax\|_2^2+\lambda\|x\|_0 $$ However, with the $0$-"norm" the problem is not convex and we face a combinatorial search. To avoid this, the $0$-"norm" is often approximated by the $1$-norm, resulting in the Lasso problem you describe. As the $1$-norm is the closest convex approximation to the $0$-"norm", we can solve the Lasso very efficiently using the powerful tools of convex optimization.

Now one could ask whether there is a better approximation. To get closer to the original $0$-"norm", it seems we have to sacrifice the convexity of the problem. However, with the penalty $\sum_{i=1}^N \log(1+\frac{|x_i|}{\xi'})$ we can actually get a better approximation and still exploit convexity, by solving the problem iteratively.

Start by solving the ordinary Lasso, yielding a solution $x^{(0)}$. Next, linearize the $\log$-penalty around the latest value using a first-order Taylor expansion, $$ \arg \min_{u_{i}}\ \sum_{i=1}^N\left(\log\left(1+\frac{u^{(0)}_i}{\xi'}\right)+\frac{1}{\xi'+u_i^{(0)}}(u_i-u_i^{(0)})\right), $$ and drop the terms that do not depend on $u_i$; this leaves the weighted penalty $$ \arg \min\ \sum_{i=1}^N \frac{|x_i|}{\xi'+|x_i^{(0)}|} $$ (here I used $u_i$ in place of $|x_i|$ to avoid the non-differentiability at $x_i=0$; one has to do some more work to make this rigorous, but it serves as intuition).

So now we have an iterative approach: solve the weighted Lasso using the latest solution, update the weights $\left(\frac{1}{\xi'+|x_i^{(k)}|}\right)$, and solve again, where $(\cdot)^{(k)}$ denotes the $k$th iteration. In each iteration, the problem is convex. The constant $\xi'$ is added to avoid numerical problems (infinite weights) when $x_i^{(k)}=0$.
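Here is a minimal numerical sketch of this reweighting loop, using a basic ISTA (proximal-gradient) solver for the inner weighted Lasso. The problem sizes, $\lambda$, and $\xi'$ values below are illustrative choices of mine, not taken from the answer:

```python
import numpy as np

def weighted_lasso_ista(A, y, lam, w, n_iter=1000):
    """Minimize 0.5*||y - A x||^2 + lam * sum_i w_i*|x_i| via ISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L      # gradient step on the quadratic term
        t = lam * w / L                    # per-coordinate soft thresholds
        x = np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    return x

def reweighted_l1(A, y, lam=0.1, xi=1e-3, n_outer=5):
    w = np.ones(A.shape[1])                # first pass: ordinary Lasso
    for _ in range(n_outer):
        x = weighted_lasso_ista(A, y, lam, w)
        w = 1.0 / (xi + np.abs(x))         # small entries -> large weight -> strong shrinkage
    return x

# Sparse recovery demo: 3 nonzeros, 30 random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[[5, 20, 47]] = [1.5, -2.0, 1.0]
y = A @ x_true
x_hat = reweighted_l1(A, y)
print(np.flatnonzero(np.abs(x_hat) > 0.5))
```

Note how the weight update is exactly the rule above: coordinates that came out near zero in the previous pass get a weight close to $1/\xi'$ and are shrunk hard, while large coordinates get small weights and are left almost unpenalized.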

Another intuition comes from noting that as $k$ tends to infinity, $|x_i^{(k+1)}-x_i^{(k)}|\rightarrow 0$, which yields $$ \sum_{i=1}^N \frac{|x_i^{(k+1)}|}{\xi'+|x_i^{(k)}|}\approx \|x\|_0 $$ since, for small $\xi'$, each nonzero coordinate contributes approximately $1$ and each zero coordinate contributes $0$. Hope it helps!
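This limit is easy to check numerically: for a fixed sparse $x$ (so $x^{(k+1)}=x^{(k)}=x$), the weighted sum essentially counts the nonzeros once $\xi'$ is small. The vector and $\xi'$ below are arbitrary illustrative values:

```python
import numpy as np

xi = 1e-6                                          # small xi' constant
x = np.array([0.0, 2.5, 0.0, -1.2, 0.0, 0.3])
approx_l0 = np.sum(np.abs(x) / (xi + np.abs(x)))   # each nonzero contributes ~1
print(approx_l0, np.count_nonzero(x))              # approx_l0 is close to 3
```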