Stability of the Solution of $ {L}_{1} $ Regularized Least Squares (LASSO) Against Inclusion of Redundant Elements

71 Views Asked by At

The problem of finding $$ \substack{{\rm min}\\x}\left( \|Ax-b\|^2_2+\lambda \|x\|_1\right),$$ where $\|\cdot\|_2$ and $\|\cdot\|_1$ are the $L_2$ and $L_1$ norms, respectively, is usually called the LASSO. $A$ is a matrix, $x$ and $b$ are vectors.

I believe this problem is robust against introducing some redundancy in the matrix $A$, by introducing for example in $A$ an extra column which is a linear combination of the original columns. Since the solution to LASSO is sparse, I expect this procedure does not change it much, i.e. the solution to $$\substack{{\rm min}\\y}\left( \|\tilde{A}y-b\|^2_2+\lambda \|y\|_1\right),$$ should be very close to the original one.

Can this be proved? Can I find an upper bound to how much the solution changes?

2

There are 2 best solutions below

2
On

It's enlightening to think about what happens if $A$ contains two identical columns, $a_i$ and $a_j$. In that case, the objective function cannot distinguish between a solution with $x_i = 1, x_j = 0$ and a solution with $x_i = \theta, x_j = 1 - \theta$ (where $0 < \theta < 1$). It seems most likely that if either $x_i$ or $x_j$ is active in the solution found by your LASSO solver, then both will be active. In other words, the solution you find will not be as sparse as it could have been. For that reason, it can sometimes be helpful to use a non-convex sparsity-inducing regularizer, such as $\sum_i |x_i|^{1/2}$.

2
On

Take $A=I$, $b = \mathbf{1}$ and $\lambda = 1$. For sake of simplicity, let $A$ be a 3-by-3 matrix and $\bar{A} = [A | \mathbf{1}]$. The solution to

$$\min_{x} (||Ax-b||_2^2 + ||x||_1)$$

can shown to be $x = [0.5; 0.5; 0.5]$. Of course, if you increase $\lambda$ it will tend toward the zero vector. Now, check

$$\min_{y} (||\bar{A}y-b||_2^2 + ||y||_1).$$

Especially compare the cases $y = [0.5; 0.5; 0.5; 0]$ to $y = [0; 0; 0; 1]$. The latter returns a much smaller value than the former ($1$ ve $2.25$). And I wouldn't say, $y(1:3) = [0; 0; 0]$ is close to $x_{\min} = [0.5; 0.5; 0.5]$. You can always crank up $\lambda$ to make them closer, but at that point, you are solving the wrong problem.