Subgradient of the $\ell_0$ "norm"

Question

Subgradient of the $\ell_0$ "norm"

3.2k Views Asked by Bumbble Comm At 10 May 2026 - 8:02

I am trying to characterize the sub-gradient of $\ell_0$ "norm"

$$f(x) := \|x\|_0 := \sum_{i=1}^n 1\{{x_i \neq 0}\}$$

At first, since it satisfies the triangle inequality, I thought that the $\ell_0$ "norm" is convex and non-smooth. Then, I tried to separate the two cases 1) $x=0$ and 2) $x \neq 0$ and set the sub-gradient as the first derivative with respect to $x$ for the latter case, which is $0$ due to constant. However, applying this sub-gradient would lead to a counter-example of definition

$$f(0) < f(x) + 0(0-x), \qquad x \neq 0$$

which confused me a lot. Later, I figured our $\ell_0$ "norm" is a non-convex function since it does not satisfy the property of positive homogeneity. Hence, I am clueless now. Please kindly advise. I am not really experienced in optimization.

Original Q&A

There are 2 best solutions below

Bumbble Comm On 18 Dec 2021 - 4:22

Old post, and still a relevant question. Just to be complete, the analytical form of the generalized subdifferential of the cardinality function is known for several concepts of the generalized subdifferential (Fréchet, limiting, proximal, ...). The gist of it is that in finite dimensions, $$ ∂\text{card}(x) = \{v \mid v_i = 0 \text{ if } x_i \neq 0\}, $$ for any $x$.

Proving it is not an exercise! Details can be found in the paper https://doi.org/10.1007/s11590-012-0456-x.

**Bumbble Comm** · Accepted Answer

I have long suspected that this practice of calling the cardinality function the "$\ell_0$ norm" would cause problems, and this post is evidence of that.

The so-called "$\ell_0$ norm" is not a norm, and it is not convex.

If Wikipedia is to be believed, the term "$\ell_0$ norm" was coined by David Donoho, in his work on using the $\ell_1$ norm (a true norm, and therefore convex) as a proxy for cardinality in convex statistical regressions. It was placed in quotes to make it clear that it is an abuse of notation. Alas, people have adopted this term without the quotes and quite often without any indication that it is incorrect terminology.

I frankly think that the usage of the term and the equally faulty "$\|\cdot\|_0$" notation needs to be resisted everywhere, including peer review and less formal forums like this. I believe we should insist that writers use terms like "cardinality function" and $\mathop{\mathrm{card}}(\cdot)$ instead.

To return to your specific question: because the cardinality function is not a norm, and because it is not convex, convex optimization methods simply cannot be used with it. Recall that the subdifferential $\partial f(x)$ of a function $f:\mathbb{R}^n\rightarrow\mathbb{R}$ at a point $x\in\mathbb{R}^n$ is the set of subgradients $$\partial f(x) \triangleq \{ v\in\mathbb{R}^n~|~f(x) + \langle v, y - x \rangle \leq f(y) ~ \forall y\in\mathbb{R}^n \}$$ Unlike a convex function---which has at least one subgradient everywhere on its domain---a nonconvex function often has no subgradient, even if one were to employ an alternate definition that allows for a "local" subgradient.

EDIT: So just what is the subdifferential of the cardinality function? That is, what is $$\partial \mathop{\mathrm{card}}(x) \triangleq \{ v\in\mathbb{R}^n~|~\mathop{\mathrm{card}}(x) + \langle v, y - x \rangle \leq \mathop{\mathrm{card}}(y) ~ \forall y\in\mathbb{R}^n \}$$It has a subgradient at exactly one point: the origin, where $\partial \mathop{\mathrm{card}}(0) = \{0\}.$ Everywhere else, $\partial \mathop{\mathrm{card}}(x) = \emptyset.$ It's not difficult to see why:

Since $\mathop{\mathrm{card}}(x)\leq n$, no non-zero vector $v$ can be a subgradient at any $x\in\mathbb{R}^n$, including the origin, because the secant inequality will fail: that is, $\mathop{\mathrm{card}}(x) + \langle v, y - x \rangle \not \leq n$ for at least some $y$, e.g. $y=x+(n+1)\|v\|_2^{-1}v.$
If $v=0$, then the secant inequality then becomes $\mathop{\mathrm{card}}(x) \leq \mathop{\mathrm{card}}(y)~\forall y\in\mathbb{R}^n$, which only holds at the origin.

Subgradient of the $\ell_0$ "norm"

There are 2 best solutions below

Related Questions in OPTIMIZATION

Related Questions in SUBGRADIENT

Related Questions in NON-SMOOTH-ANALYSIS

Trending Questions

Popular # Hahtags

Popular Questions