In order to induce group sparsity, can we use $\left \| A \right \|_{1,1}$ instead of $\left \| A \right \|_{1,2}$?


Let's define $\left \| A \right \|_{p,q}$ as follows: $$\left \| A \right \|_{p,q} = \sum_{i=1}^n \left \| \alpha^i \right \|_q^p,$$

where $\alpha^i$ is the $i$-th row of the matrix $A$.

The above norm is used, for example, in the following paper to induce group sparsity, with $p = 1$ and $q = 2$:

Mairal, Julien, et al. "Non-local sparse models for image restoration." 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009.

With $p = 1$ and $q = 2$, this function first computes the $l_2$ norm of each row of $A$, collapsing the matrix into a vector, and then computes the $l_1$ norm of that vector.
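This two-step computation is easy to check numerically; here is a minimal NumPy sketch (the function name `mixed_norm` is ours, not from the paper):

```python
import numpy as np

def mixed_norm(A, p=1, q=2):
    # ||A||_{p,q} = sum_i ||alpha^i||_q^p, where alpha^i is the i-th row of A.
    row_norms = np.linalg.norm(A, ord=q, axis=1)  # l_q norm of each row
    return np.sum(row_norms ** p)

A = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
# ||A||_{1,2}: row l2 norms are (5, 0, 1), so the value is 6.
# ||A||_{1,1}: coincides with the entrywise l1 norm, here 8.
```

Note that with $p = q = 1$ the mixed norm reduces to the plain entrywise $l_1$ norm of the matrix, which is the case the question asks about.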

My question is: what are the consequences of using $\left \| A \right \|_{1,1}$ instead of $\left \| A \right \|_{1,2}$? For example, if we want to solve a problem like this: $$(U,A) = \operatorname{argmin}_{U,A} \left \| Y - UA \right \| + \lambda\left \| A \right \|_{1,2}$$

Does it make the optimization more difficult?
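To make the question concrete: with $U$ held fixed, the $A$-subproblem is convex and can be handled by proximal gradient descent. A minimal sketch, assuming a squared Frobenius data term $\tfrac12 \left\| Y - UA \right\|_F^2$ (the paper's exact formulation may differ, and `solve_A` is a hypothetical helper name):

```python
import numpy as np

def prox_l12(A, t):
    # Row-wise block soft-thresholding: the proximal map of t * ||A||_{1,2}.
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0) * A

def solve_A(Y, U, lam, steps=200):
    # Proximal gradient (ISTA) on 0.5 * ||Y - U A||_F^2 + lam * ||A||_{1,2},
    # with U fixed; constant step size 1/L, where L = ||U||_2^2 bounds the
    # Lipschitz constant of the gradient of the data term.
    L = np.linalg.norm(U, 2) ** 2
    A = np.zeros((U.shape[1], Y.shape[1]))
    for _ in range(steps):
        grad = U.T @ (U @ A - Y)          # gradient of the smooth part
        A = prox_l12(A - grad / L, lam / L)  # proximal step on the penalty
    return A
```

Swapping the penalty for $\left\| A \right\|_{1,1}$ only changes the proximal step (entrywise instead of row-wise soft-thresholding), which already suggests the answer below: the difficulty of the optimization is essentially unchanged.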

Best answer:

There shouldn't be any difference from the optimization perspective, as both $L_{1,2}$ and $L_{1,1}$ are convex, subdifferentiable functions. One difference, if you're using a proximal algorithm, might be the prox-map, but I think both have explicit formulas.

However, the question to ask is why one would use an $L_{1,1}$ regularizer at all. $L_{1,2}$ encourages group sparsity: all elements in a group are likely to be set to zero together, or all will be non-zero. $L_{1,1}$, however, doesn't have this property. It is as if you imposed an $l_1$ norm on the matrix viewed as one long vector, so it encourages "individual" sparsity, without considering groups.
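The difference in sparsity patterns is visible directly in the two prox-maps mentioned above, both of which have closed forms. A small NumPy illustration (the function names are ours):

```python
import numpy as np

def prox_l11(A, t):
    # Entrywise soft-thresholding: the prox-map of t * ||A||_{1,1}.
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def prox_l12(A, t):
    # Row-wise block soft-thresholding: the prox-map of t * ||A||_{1,2}.
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0) * A

A = np.array([[0.5, 3.0],
              [0.5, 0.5]])
# With t = 1, prox_l11 zeros individual small entries: [[0, 2], [0, 0]].
# prox_l12 acts on whole rows: row 1 (norm ~3.04) is shrunk but kept entirely,
# while row 2 (norm ~0.71 < 1) is set to zero as a group.
```

So $L_{1,1}$ scatters zeros anywhere, while $L_{1,2}$ kills or keeps entire rows, which is exactly the group-sparsity behavior.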