Subgradient of the $\ell_0$ "norm"

3.2k Views Asked by At

I am trying to characterize the sub-gradient of $\ell_0$ "norm"

$$f(x) := \|x\|_0 := \sum_{i=1}^n 1\{{x_i \neq 0}\}$$

At first, since it satisfies the triangle inequality, I thought that the $\ell_0$ "norm" is convex and non-smooth. Then, I tried to separate the two cases 1) $x=0$ and 2) $x \neq 0$ and set the sub-gradient as the first derivative with respect to $x$ for the latter case, which is $0$ due to constant. However, applying this sub-gradient would lead to a counter-example of definition

$$f(0) < f(x) + 0(0-x), \qquad x \neq 0$$

which confused me a lot. Later, I figured our $\ell_0$ "norm" is a non-convex function since it does not satisfy the property of positive homogeneity. Hence, I am clueless now. Please kindly advise. I am not really experienced in optimization.

2

There are 2 best solutions below

2
On BEST ANSWER

I have long suspected that this practice of calling the cardinality function the "$\ell_0$ norm" would cause problems, and this post is evidence of that.

The so-called "$\ell_0$ norm" is not a norm, and it is not convex.

If Wikipedia is to be believed, the term "$\ell_0$ norm" was coined by David Donoho, in his work on using the $\ell_1$ norm (a true norm, and therefore convex) as a proxy for cardinality in convex statistical regressions. It was placed in quotes to make it clear that it is an abuse of notation. Alas, people have adopted this term without the quotes and quite often without any indication that it is incorrect terminology.

I frankly think that the usage of the term and the equally faulty "$\|\cdot\|_0$" notation needs to be resisted everywhere, including peer review and less formal forums like this. I believe we should insist that writers use terms like "cardinality function" and $\mathop{\mathrm{card}}(\cdot)$ instead.

To return to your specific question: because the cardinality function is not a norm, and because it is not convex, convex optimization methods simply cannot be used with it. Recall that the subdifferential $\partial f(x)$ of a function $f:\mathbb{R}^n\rightarrow\mathbb{R}$ at a point $x\in\mathbb{R}^n$ is the set of subgradients $$\partial f(x) \triangleq \{ v\in\mathbb{R}^n~|~f(x) + \langle v, y - x \rangle \leq f(y) ~ \forall y\in\mathbb{R}^n \}$$ Unlike a convex function---which has at least one subgradient everywhere on its domain---a nonconvex function often has no subgradient, even if one were to employ an alternate definition that allows for a "local" subgradient.

EDIT: So just what is the subdifferential of the cardinality function? That is, what is $$\partial \mathop{\mathrm{card}}(x) \triangleq \{ v\in\mathbb{R}^n~|~\mathop{\mathrm{card}}(x) + \langle v, y - x \rangle \leq \mathop{\mathrm{card}}(y) ~ \forall y\in\mathbb{R}^n \}$$It has a subgradient at exactly one point: the origin, where $\partial \mathop{\mathrm{card}}(0) = \{0\}.$ Everywhere else, $\partial \mathop{\mathrm{card}}(x) = \emptyset.$ It's not difficult to see why:

  • Since $\mathop{\mathrm{card}}(x)\leq n$, no non-zero vector $v$ can be a subgradient at any $x\in\mathbb{R}^n$, including the origin, because the secant inequality will fail: that is, $\mathop{\mathrm{card}}(x) + \langle v, y - x \rangle \not \leq n$ for at least some $y$, e.g. $y=x+(n+1)\|v\|_2^{-1}v.$
  • If $v=0$, then the secant inequality then becomes $\mathop{\mathrm{card}}(x) \leq \mathop{\mathrm{card}}(y)~\forall y\in\mathbb{R}^n$, which only holds at the origin.
0
On

Old post, and still a relevant question. Just to be complete, the analytical form of the generalized subdifferential of the cardinality function is known for several concepts of the generalized subdifferential (Fréchet, limiting, proximal, ...). The gist of it is that in finite dimensions, $$ ∂\text{card}(x) = \{v \mid v_i = 0 \text{ if } x_i \neq 0\}, $$ for any $x$.

Proving it is not an exercise! Details can be found in the paper https://doi.org/10.1007/s11590-012-0456-x.