When trying to find sparse solutions to $Ax=b$, why is $\|x\|_1$ the "best convex under-estimator" of the number of non-zero entries?


So I understand the usual intuition that $L_1$ tends to lead to sparseness: moving from $0$ to $0.1$ costs the same as moving from $5$ to $5.1$, whereas $L_2$ penalizes small values only weakly, so it has little incentive to push them all the way to zero. But I came across a passage in Lee's Linear Optimization book where he says that using the absolute value as a sparsity penalty is justified because it is the "best convex function under-estimator" of the indicator-function penalty $\mathbf{1}[x \neq 0]$.

So... I'm having trouble understanding what this is supposed to mean. Why would $x^2$ not also be a good convex under-estimator of that indicator function on $[-1, 1]$? Curious if I'm missing something.

Any thoughts appreciated.

Thanks.

L1 optimization

On BEST ANSWER

The meaning of "best" here is the convex envelope: the pointwise largest convex function that stays below the indicator penalty $\mathbf{1}[x \neq 0]$ on $[-1, 1]$. Your alternative $x^2$ is indeed a convex under-estimator, but since $x^2 \leq |x|$ on $[-1, 1]$, it lies everywhere below $|x|$ and is therefore a looser bound. Equivalently, it leaves a larger area between itself and the indicator: $\int_{-1}^{1}(1 - |x|)\,dx = 1$, versus $\int_{-1}^{1}(1 - x^2)\,dx = 4/3$.
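A quick numerical sketch of this comparison (my own illustration, not from the original answer): on a grid over $[-1, 1]$, both $|x|$ and $x^2$ stay below the indicator, but $|x|$ dominates $x^2$ pointwise, and its gap area is correspondingly smaller.

```python
# Compare |x| and x^2 as convex under-estimators of the indicator
# penalty 1[x != 0] on [-1, 1]. Grid points are built with integer
# arithmetic so that x = 0 is hit exactly.
n = 2000
dx = 2.0 / n
xs = [(2 * i - n) / n for i in range(n + 1)]

indicator = [0.0 if x == 0 else 1.0 for x in xs]
l1 = [abs(x) for x in xs]
sq = [x * x for x in xs]

assert all(a <= b for a, b in zip(l1, indicator))  # |x| under-estimates
assert all(a <= b for a, b in zip(sq, indicator))  # x^2 does too...
assert all(a <= b for a, b in zip(sq, l1))         # ...but lies below |x|

# Riemann sums of the gaps: a smaller gap means a tighter under-estimator.
gap_l1 = sum(i - v for i, v in zip(indicator, l1)) * dx
gap_sq = sum(i - v for i, v in zip(indicator, sq)) * dx
print(round(gap_l1, 3), round(gap_sq, 3))  # close to 1 and 4/3
```

The printed gaps approximate the exact values $1$ and $4/3$ from the answer above, confirming that $x^2$ is the worse under-estimator by either criterion.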