Necessary condition for a sparse optimal point


In the paper Sparsity Constrained Nonlinear Optimization, the following problem is considered: $$ \min_{x \in \mathbb{R}^n, \|x\|_0\leq s} f(x) $$ where $\|x\|_0$ is the number of nonzero elements of $x$.
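To make the problem concrete, here is a small brute-force sketch. It assumes a least-squares objective $f(x)=\|Ax-b\|^2$, which is only an illustrative choice, and the function name is hypothetical. It enumerates every support of size $s$. This covers all feasible points, because a vector with fewer than $s$ nonzeros also lies in the span of some size-$s$ support. The approach is only practical for small $n$.

```python
import itertools
import numpy as np

def sparse_least_squares_bruteforce(A, b, s):
    """Hypothetical helper: solve min ||A x - b||^2 s.t. ||x||_0 <= s by enumeration.

    Visits all C(n, s) supports, so it is only usable for small n.
    """
    n = A.shape[1]
    s = min(s, n)  # supports larger than n do not exist
    best_x = np.zeros(n)                 # x = 0 is always feasible
    best_val = float(np.sum(b ** 2))     # f(0) = ||b||^2
    for support in itertools.combinations(range(n), s):
        cols = list(support)
        # Least squares over the chosen columns; all other entries stay zero.
        z, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
        x = np.zeros(n)
        x[cols] = z
        val = float(np.sum((A @ x - b) ** 2))
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Example: A = I, b = (3, -1, 0.5), s = 1 keeps only the largest-magnitude entry.
x_opt, f_opt = sparse_least_squares_bruteforce(np.eye(3), np.array([3.0, -1.0, 0.5]), 1)
```

With $A = I$, the optimum for $s=1$ is $x^* = (3, 0, 0)$ with objective value $1.25$.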

Theorem 2.1 states a necessary condition for optimality: if $x^*$ is an optimal point, then $\nabla f(x^*)=0$ when $\|x^*\|_0<s$, and $\nabla_{\operatorname{supp}(x^*)} f(x^*)=0$ when $\|x^*\|_0=s$. Here $\operatorname{supp}(x^*)=\{i : x^*_i \neq 0\}$, and $\nabla_{\operatorname{supp}(x^*)}$ denotes the restriction of the gradient to the indices in $\operatorname{supp}(x^*)$.
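The two cases of the condition can be checked numerically. The sketch below assumes the separable objective $f(x)=\tfrac12\|x-c\|^2$, so that $\nabla f(x)=x-c$. The function name and the tolerance are hypothetical choices for illustration. For $s=1$ the minimizer $x^*=(3,0,0)$ has exactly $s$ nonzeros. Its gradient vanishes on the support but not elsewhere.

```python
import numpy as np

def satisfies_necessary_condition(x, grad, s, tol=1e-8):
    """Hypothetical check of the condition as stated in Theorem 2.1 above."""
    support = np.flatnonzero(np.abs(x) > tol)
    if support.size > s:
        raise ValueError("x violates the sparsity constraint ||x||_0 <= s")
    if support.size < s:
        # Fewer than s nonzeros: the full gradient must vanish.
        return bool(np.all(np.abs(grad) <= tol))
    # Exactly s nonzeros: only the gradient entries on supp(x) must vanish.
    return bool(np.all(np.abs(grad[support]) <= tol))

# Assumed objective f(x) = 0.5 * ||x - c||^2 with gradient x - c.
c = np.array([3.0, -1.0, 0.5])
x_star = np.array([3.0, 0.0, 0.0])  # minimizer for s = 1: keep the largest |c_i|
grad = x_star - c                    # [0.0, 1.0, -0.5]: zero on the support only
```

The same point fails the condition for $s=2$, because it then has fewer than $s$ nonzeros and its full gradient is not zero. This matches the fact that $(3,-1,0)$ is the better choice when two nonzeros are allowed.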


Question:

Why does $\nabla_{\operatorname{supp}(x^*)} f(x^*)=0$ hold when $\|x^*\|_0=s$?


My understanding:

The proof that $\nabla f(x^*)=0$ when $\|x^*\|_0<s$ makes complete sense to me, but I don't see why, when the number of nonzero elements equals $s$, the condition changes to only the gradient over the support being zero.