Given a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, where $n$ is large and $f$ is non-convex, the following characterises a sparse minimizer of $f$:
$$ x^* = \arg \min_{x} f(x) + \|x\|_0, $$ where $\|x\|_0$ is the number of nonzero elements of $x$.
This problem is non-differentiable and hard to solve. I was wondering whether there are good references in which this problem is addressed and algorithms with proven convergence guarantees are given.
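For intuition about why the $\|x\|_0$ term makes the problem combinatorial, consider the special case $f(x) = \tfrac{1}{2}\|x - y\|_2^2$: the problem then separates across coordinates, and each coordinate is kept only when retaining it is cheaper than the unit cost charged by $\|x\|_0$,
$$ x_i^* = \begin{cases} y_i, & \tfrac{1}{2} y_i^2 > 1, \\ 0, & \text{otherwise}, \end{cases} $$
a hard-thresholding rule. For a general $f$ no such closed form is available.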
Motivation: I am asking because, in the field of neural network sparsification, I have found one recent paper introducing an algorithm for solving this problem; refer to this one. We can also see that, in the neural network sparsification field, L1 or L0 approximations are not necessarily used. For example, see these two papers: Optimal brain damage and Second order derivatives for network pruning.
If you have seen them, please share your knowledge with me.
The standard way to address this is to use L1 norm regularization (penalty) instead of L0 regularization. The L1 norm is convex, so it preserves the convexity of an objective function that is convex without the regularization term.
Although the resulting objective function is non-differentiable, depending on the rest of the objective function and the constraints, the optimization problem may admit a highly efficient and robust convex conic formulation and solution, for instance using CVX or CVXPY.
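For concreteness, here is a minimal CVXPY sketch of such a formulation, assuming a least-squares loss as a stand-in (your general non-convex $f$ would not be accepted by a convex modeling tool):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))      # underdetermined linear system
x_true = np.zeros(200)
x_true[:5] = rng.standard_normal(5)     # 5-sparse ground truth
b = A @ x_true

x = cp.Variable(200)
lam = 0.1
prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x)))
prob.solve()
print("nonzeros:", np.sum(np.abs(x.value) > 1e-6))
```

The solver handles the non-differentiable L1 term internally via the conic reformulation, so no smoothing is needed.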
In practice, L1 norm regularization seems to work well at inducing sparse solutions, but it is a compromise, made in the interest of computational tractability, versus using L0 norm regularization.
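Even when $f$ is non-convex (as in your setting) but smooth, the L1 penalty can still be handled with proximal gradient descent (ISTA-style), which alternates a gradient step on $f$ with a soft-thresholding step. Below is a minimal sketch under that assumption; the double-well $f$ used here is a hypothetical stand-in for illustration:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink every entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(grad_f, x0, lam, step, iters=2000):
    # Alternate a gradient step on the smooth f with the L1 prox step.
    x = x0.copy()
    for _ in range(iters):
        x = soft_threshold(x - step * grad_f(x), step * lam)
    return x

# Hypothetical non-convex smooth f, applied coordinate-wise:
# f(x) = sum(0.25*x**4 - 0.5*x**2 + 0.1*x), so grad f(x) = x**3 - x + 0.1.
grad_f = lambda x: x**3 - x + 0.1

x = ista(grad_f, x0=np.linspace(-2.0, 2.0, 9), lam=0.3, step=0.1)
print(np.round(x, 3))  # several coordinates land exactly at zero
```

The soft-thresholding step is what produces exact zeros rather than merely small entries, which is why L1 penalties induce sparsity where smooth penalties do not.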
There are many sources of information explaining this if you search for "L1 norm regularization".