Relationship between $L^1$ norm and sparsity


I'm doing some research in the field of sparse representation and sparse modeling. I have two variables, and their $L^1$ norms are computed for comparison. As I understand it, the smaller the $L^1$ norm, the sparser the variable should be. Is that correct, and what is the logic behind it? My two variables are coefficient matrices $X_1$ and $X_2$ in equations of the form $Y = DX$, where the dictionary $D$ differs between the two. If the $L^1$ norm of $X_1$ is smaller, what can I conclude? Am I right to conclude that $X_1$ is sparser than $X_2$?

Best answer:

Consider the minimization problem $$\min_x \|x\|_{L^1} = \sum_i |x_i|$$ $$\text{s.t.} \qquad A x = b.$$

Then yes: if $A x = b$ and $A$ is orthogonal (so $A^T A = I$), the smaller $\sum_i |x_i|$ is, the sparser the representation of $b$ in the basis $A$ will be.

Note that in computer science what we are really interested in is finding the $x$ minimizing $\sum_i |x_i|^0$ (the number of non-zero entries, with the convention $0^0 = 0$) subject to $A x = b$: it is the prototypical lossless compression problem, with for example $A$ being an image wavelet matrix and $b$ an image to compress. In that case only the non-zero coefficients $x_i$ need to be stored, allowing a lossless reconstruction of $b$.
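The lossless-compression idea can be sketched numerically. A tiny 4-point orthonormal Haar-like basis below stands in for the image wavelet matrix; the piecewise-constant signal $b$ is a made-up toy value:

```python
import numpy as np

# Store only the non-zero coefficients x of b in an orthonormal basis A.
# A is a small hand-built Haar-like basis, not a real wavelet library.
s = np.sqrt(2.0)
A = np.column_stack([
    [0.5, 0.5, 0.5, 0.5],      # scaling (average) function
    [0.5, 0.5, -0.5, -0.5],    # coarse detail
    [1/s, -1/s, 0.0, 0.0],     # fine detail, left half
    [0.0, 0.0, 1/s, -1/s],     # fine detail, right half
])

b = np.array([3.0, 3.0, 1.0, 1.0])   # piecewise-constant "signal"
x = A.T @ b                          # analysis: x solves A x = b (A orthogonal)

l0 = np.count_nonzero(np.abs(x) > 1e-12)   # the "norm" sum_i |x_i|^0
print(l0)                                  # 2: only 2 of 4 coefficients needed

# keep just the non-zero entries and reconstruct b losslessly
idx = np.flatnonzero(np.abs(x) > 1e-12)
b_rec = A[:, idx] @ x[idx]
print(np.allclose(b_rec, b))               # True
```

Because $A$ is orthogonal, the analysis step is just $x = A^T b$, and storing the two non-zero coefficients suffices to rebuild $b$ exactly.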

But because in general that problem is very difficult to solve (it is combinatorial), we often choose instead to minimize $\sum_i |x_i|^\delta$ s.t. $Ax = b$ for some $\delta > 0$, which allows us to use gradient-descent methods to find a local minimum. As a last resort we choose $\delta = 1$, because in that case the objective function is convex, which lets us find the global minimum easily. This is not fully satisfactory, even though $\sum_i |x_i|$ is still, in some sense, a measure of sparsity:
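The convex $\delta = 1$ case can in fact be solved as a linear program: introduce slack variables $t_i \ge |x_i|$ and minimize $\sum_i t_i$. A minimal sketch with a made-up underdetermined $A$ and $b$, using `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

# Solve min ||x||_1 s.t. A x = b as an LP with stacked variables [x, t],
# where |x_i| <= t_i. The 2x3 system below is a toy example.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
n = A.shape[1]

c = np.concatenate([np.zeros(n), np.ones(n)])   # minimize sum_i t_i
A_ub = np.block([[np.eye(n), -np.eye(n)],       #  x_i - t_i <= 0
                 [-np.eye(n), -np.eye(n)]])     # -x_i - t_i <= 0
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((A.shape[0], n))])  # A x = b
bounds = [(None, None)] * n + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b, bounds=bounds)
x = res.x[:n]
print(np.round(x, 6))   # [0. 0. 1.]: the sparsest of the solutions x = [1-c, 1-c, c]
```

Among all solutions $x = (1-c,\, 1-c,\, c)$ of this system, the $L^1$ minimizer is the one-nonzero vector $(0, 0, 1)$, illustrating why the convex relaxation still promotes sparsity.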

When $A$ is orthogonal, the solution $x$ of $A x = b$ has the same $L^2$ norm (energy) as $b$, and among vectors of fixed energy, the more that energy is concentrated in a small number of coefficients, the lower $\sum_i |x_i|$ will be.
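This energy-concentration effect is easy to check directly: two toy coefficient vectors below have the same $L^2$ norm, but the concentrated one has the smaller $L^1$ norm.

```python
import numpy as np

# Same energy (L2 norm = 2), different sparsity.
concentrated = np.array([2.0, 0.0, 0.0, 0.0])   # all energy in one coefficient
spread = np.array([1.0, 1.0, 1.0, 1.0])         # energy spread over four

print(np.linalg.norm(concentrated), np.linalg.norm(spread))  # 2.0 2.0
print(np.abs(concentrated).sum())   # L1 = 2.0
print(np.abs(spread).sum())         # L1 = 4.0
```

At fixed energy, $\sum_i |x_i|$ ranges from $\|x\|_2$ (one non-zero entry) up to $\sqrt{n}\,\|x\|_2$ (energy spread evenly), which is why the $L^1$ norm acts as a sparsity proxy.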

Please also note that when some light loss is acceptable, the problem is often restated as

$$\min_x \sum_i |x_i| + \lambda \|A x - b\|_{L^2}^2,$$ where $\lambda > 0$ is the regularization coefficient controlling the trade-off between sparsity and loss ($Ax$ not being exactly equal to $b$): the larger $\lambda$, the closer the fit.
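This regularized problem can be minimized with ISTA (iterative soft-thresholding), a standard first-order method for $L^1$-regularized least squares. A minimal sketch, reusing a made-up toy $A$, $b$, and $\lambda$:

```python
import numpy as np

# ISTA for  min_x  sum_i |x_i| + lam * ||A x - b||_2^2.
# Each iteration: gradient step on the smooth term, then the proximal
# (soft-threshold) step for the L1 term.
def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
lam = 10.0

L = 2 * lam * np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient
t = 1.0 / L                               # step size
x = np.zeros(A.shape[1])
for _ in range(20000):
    grad = 2 * lam * A.T @ (A @ x - b)    # gradient of lam * ||Ax - b||^2
    x = soft_threshold(x - t * grad, t)   # prox step, threshold = step size

print(np.round(x, 3))   # approx [0, 0, 0.975]: sparse, with Ax close to b
```

The result is sparse but slightly shrunk toward zero (here $x_3 \approx 0.975$ instead of $1$), the price paid for the tolerated loss; increasing $\lambda$ moves the solution closer to the exact-fit one.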