Convex optimization with $\ell_0$ "norm"


I have an optimization problem of the form

$$\begin{align*}\text{minimize }\;&f(x)\\ \text{subject to }\;&||x||_0 \le t,\end{align*}$$

where $t$ is a given constant, $f:\mathbb{R}^d \to \mathbb{R}$ is a convex, differentiable function, and $x$ ranges over $\mathbb{R}^d$. Or, roughly equivalently, I'd like to solve

$$\text{minimize }\; f(x) + \lambda \cdot ||x||_0.$$

What techniques are available for this?


Techniques I've found in my reading so far:

  • $L_1$ norm optimization: Replace the $L_0$ norm with an $L_1$ norm, and then solve the resulting convex problem. In other words, solve

    $$\text{minimize }\; f(x) + \lambda \cdot ||x||_1.$$

    Optionally, follow this with iterative re-weighting: given a candidate solution $\hat{x}$, solve

    $$\text{minimize }\; f(x) + \lambda \sum_{i=1}^d \frac{1}{|\hat{x}_i|} |x_i|,$$

    and then replace $\hat{x}$ with the resulting solution; iterate until convergence. (In practice a small $\epsilon$ is added to the denominators to avoid division by zero.) Apparently, the $L_1$ norm encourages sparse solutions; I believe this is the idea behind the Lasso and compressed sensing.

  • Approximate the $L_0$ norm: Replace the $L_0$ norm with the $L_{1/2}$ quasi-norm, or more generally an $L_p$ quasi-norm for $p \in (0,1)$, or with a smooth surrogate $g(x)$ such as

    $$g(x) = \sum_{i=1}^d \log(1 + |x_i|/\alpha).$$
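To make the first technique concrete, here is a minimal sketch I put together for the special case $f(x) = \frac{1}{2}\|Ax-b\|_2^2$: proximal gradient (ISTA) for the weighted $L_1$ problem, wrapped in the reweighting loop. The step size, iteration counts, and the $\epsilon$ smoothing in the weights are my own arbitrary choices, not from any reference implementation:

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1: shrinks each entry toward zero.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def reweighted_l1(A, b, lam, n_reweight=5, n_inner=500, eps=1e-3):
    """Iteratively reweighted L1 for f(x) = 0.5*||Ax - b||^2 (a sketch).

    Each outer pass solves min_x f(x) + lam * sum_i w_i |x_i| by ISTA,
    then resets w_i = 1 / (|x_i| + eps); eps avoids division by zero.
    """
    d = A.shape[1]
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
    step = 1.0 / L
    w = np.ones(d)                          # first pass = plain L1
    x = np.zeros(d)
    for _ in range(n_reweight):
        for _ in range(n_inner):            # ISTA on the weighted problem
            grad = A.T @ (A @ x - b)
            x = soft_threshold(x - step * grad, step * lam * w)
        w = 1.0 / (np.abs(x) + eps)         # penalize small entries more
    return x
```

For a general smooth convex $f$, the same scheme works with `grad` replaced by $\nabla f(x)$ and `step` by $1/L_f$.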

Also, two other techniques that I haven't seen described anywhere, but seem like natural algorithms:

  • Forward greedy selection. Start by finding a one-element set $S_1$ that makes the following as small as possible:

    $$\begin{align*}\text{minimize }\;&f(x)\\ \text{subject to }\;&x_j = 0 \text{ for all } j \notin S_1;\end{align*}$$

    Then find a two-element set $S_2$ such that $S_1 \subset S_2$ and that makes the following as small as possible:

    $$\begin{align*}\text{minimize }\;&f(x)\\ \text{subject to }\;&x_j = 0 \text{ for all } j \notin S_2;\end{align*}$$

    Iterate, adding one coefficient to the set at each stage. Each stage can be carried out efficiently by trying all $d - |S|$ possibilities for the index added to $S$ and solving the restricted convex problem for each.

  • Backward greedy selection. Similar to the above, but start with $S_0=\{1,2,\dots,d\}$. In the $i$th iteration, look for $S_i$ such that $S_i \subset S_{i-1}$ and $|S_i|=|S_{i-1}|-1$ and that makes the following as small as possible:

    $$\begin{align*}\text{minimize }\;&f(x)\\ \text{subject to }\;&x_j = 0 \text{ for all } j \notin S_i;\end{align*}$$
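Here is a sketch of the forward greedy idea, again for the illustrative special case $f(x) = \frac{1}{2}\|Ax-b\|_2^2$, where each restricted minimization is just a least-squares solve on the selected columns (for a general smooth convex $f$, that inner solve would be replaced by any convex solver over the chosen support):

```python
import numpy as np

def forward_greedy(A, b, t):
    """Forward greedy selection for f(x) = 0.5*||Ax - b||^2 (a sketch).

    At each stage, try every index not yet in S, minimize f restricted
    to the support S + [j], and keep the index giving the smallest value.
    """
    d = A.shape[1]
    S = []
    x = np.zeros(d)
    for _ in range(t):
        best_j, best_val, best_coef = None, np.inf, None
        for j in range(d):
            if j in S:
                continue
            cols = S + [j]
            # Restricted problem: least squares on the selected columns.
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            val = 0.5 * np.linalg.norm(A[:, cols] @ coef - b) ** 2
            if val < best_val:
                best_j, best_val, best_coef = j, val, coef
        S.append(best_j)
        x = np.zeros(d)
        x[S] = best_coef
    return x, S
```

The cost is $O(d \cdot t)$ restricted solves; backward greedy is the same loop run downward from the full support.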

Are there any other techniques worth knowing about? Are there any "dominance" relationships between these (e.g., method X usually beats method Y, so don't bother with method Y)?


I know that the $L_0$ "norm" $||\cdot||_0$ isn't convex (in fact, it isn't even a norm), and that the optimization problem I'm trying to solve is NP-hard in the worst case, so we can't expect efficient algorithms that always produce the optimal answer. I'm interested in pragmatic heuristics that will often work well when $f$ is nice (smooth, etc.).


You should have a look at a nice package called Smoothed $L_0$ (SL0).

It approximates the $L_0$ "pseudo-norm" (which isn't convex) by a Gaussian kernel. The idea is to iterate on the parameter defining the kernel width, warm-starting each solve from the previous one.

It seems to work nicely and should fit your problem.
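To illustrate the idea (this is my own sketch of the Gaussian-smoothing principle, not the SL0 package's exact algorithm, which targets the constrained problem $Ax=b$ with projection steps): approximate $||x||_0 \approx \sum_i \left(1 - e^{-x_i^2/2\sigma^2}\right)$, which tends to $||x||_0$ as $\sigma \to 0$, and minimize $f(x) + \lambda \sum_i (1 - e^{-x_i^2/2\sigma^2})$ by gradient descent over a decreasing sequence of $\sigma$'s. The step size and $\sigma$ schedule below are arbitrary choices:

```python
import numpy as np

def smoothed_l0(grad_f, x0, lam, sigmas=(1.0, 0.5, 0.2, 0.1, 0.05),
                step=1e-2, n_steps=2000):
    """Gaussian-kernel smoothing of the L0 penalty (a sketch).

    Minimizes f(x) + lam * sum_i (1 - exp(-x_i^2 / (2 sigma^2))) by
    plain gradient descent, warm-starting each sigma from the previous
    solution (graduated non-convexity).
    """
    x = np.asarray(x0, dtype=float).copy()
    for sigma in sigmas:
        for _ in range(n_steps):
            # Gradient of the smoothed penalty term.
            g_pen = (x / sigma**2) * np.exp(-x**2 / (2 * sigma**2))
            x -= step * (grad_f(x) + lam * g_pen)
    return x
```

For small $\sigma$, large entries feel almost no penalty (the kernel is flat far from zero) while small entries are pushed toward zero, which is exactly the behavior one wants from an $L_0$ surrogate.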