I am aware that $ \ell_{2,1} $-norm is being used for inducing a structure in the sparse matrix. I was told by someone that $ \ell_{\infty} $ and $ \ell_{2,1} $-norms give similar results. But how and why is that?
What are the differences, similarities, or advantages of $ \ell_{\infty} $-norm versus $ \ell_{2,1} $-norm in the case of norm minimization algorithms?
More specifically this question is in the context of Robust PCA (low-rank and sparse decomposition of a two dimensional matrix) where the mentioned norms are used to induce structured sparsity in the sparse part.
Robust PCA solves the matrix decomposition problem
$$L^*, S^* = \arg \min_{L, S} \|L\|_* + \|S\|_1 \quad \text{s.t. } L + S = X$$
as a surrogate for the actual problem
$$L^*, S^* = \arg \min_{L, S} rank(L) + \|S\|_0 \quad \text{s.t. } L + S = X,$$
To induce structured-sparsity in $S$ we can use the $\ell_{2,1}$ or $\ell_{\infty}$-norm. So the problem becomes:
$$L^*, S^* = \arg \min_{L, S} \|L\|_* + \|S\|_{2,1} \quad \text{s.t. } L + S = X$$
or
$$L^*, S^* = \arg \min_{L, S} \|L\|_* + \|S\|_{\infty} \quad \text{s.t. } L + S = X$$