I have the following optimization problem: $$\min_x f(x) \quad \mathrm{s.t.}\; x \ge 0,\ \|x\|_0 = 1,$$ where $x$ is $n \times 1$ and $f(x) \ge 0$ since it is a squared norm $\|\cdot\|^2$.
Meaning, I want to minimize $f(x)$ using only one nonzero coordinate of $x$. One way is to solve the optimization for each coordinate $x_i$ separately and keep the one with the smallest objective value.
But based on some intuition, for an initial point $x$ I check the set $$\{i : \nabla_{x_i} f(x) < 0\},$$ choose from it the index $i$ with the smallest (most negative) $\nabla_{x_i} f(x)$, and then solve the optimization only over that coordinate. I noticed that when $f(x)$ is convex the result is optimal.
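To make the two selection rules concrete, here is a small sketch assuming $f(x) = \|Ax - b\|^2$ (a squared norm, as stated above); the matrix $A$, the vector $b$, and the unit-column normalization are all illustrative assumptions, not part of the original problem.

```python
import numpy as np

# Assumed objective f(x) = ||A x - b||^2; A and b are made-up data.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
A /= np.linalg.norm(A, axis=0)      # unit columns (an extra assumption)
b = np.array([1.0, 0.0, 2.0])
n = A.shape[1]

def f(x):
    return np.sum((A @ x - b) ** 2)

# Exhaustive rule: solve min_{t >= 0} f(t * e_i) for every coordinate i
# (closed form for this quadratic) and keep the best index.
best_vals = []
for i in range(n):
    a = A[:, i]
    t = max(0.0, (a @ b) / (a @ a))   # 1-D nonnegative minimizer
    best_vals.append(f(t * np.eye(n)[i]))
i_exhaustive = int(np.argmin(best_vals))

# Gradient rule: at the initial point x = 0, pick the index with the
# most negative partial derivative of f.
grad0 = 2 * A.T @ (A @ np.zeros(n) - b)   # gradient of ||Ax - b||^2 at 0
i_gradient = int(np.argmin(grad0))
```

With equal column norms the two rules pick the same index here; in general they can disagree, since the exhaustive rule also accounts for the curvature $\|a_i\|^2$ of each one-dimensional restriction while the gradient rule does not.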
But I would really like to know the mathematical logic behind it:

1. Why should $\nabla_{x_i} f(x)$ be negative?

2. Why take the dimension with the smallest $\nabla_{x_i} f(x)$?

Is it because $f(x) \ge 0$, or also because I look for $x \ge 0$ solutions?
First of all, a minimum need not exist; only an infimum does. This is because the problem is separable, so we can make some component $x_i$ an arbitrarily small positive value while setting the other components to zero; the infimum would then be zero. Regarding your questions, I don't see why your method works in general, but I will try to answer generically.
I think you should also consider the set $\{i : -\nabla_{x_i} f(x) < 0\}$. For your first question: a negative partial derivative is what allows you to decrease the objective. Since your feasible set is $x \ge 0$, from a point such as $x = 0$ you can only move a coordinate upward, and increasing $x_i$ reduces $f$ (to first order) only when $\nabla_{x_i} f(x) < 0$. As in coordinate descent, a small step $\alpha$ along $e_i$ changes the objective by approximately $\alpha \nabla_{x_i} f(x)$, so the coordinate whose partial derivative is smallest (most negative) yields the largest decrease of the objective.
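The first-order argument can be checked numerically. The sketch below uses an assumed convex quadratic $f$ (not the asker's actual objective): starting from the feasible point $x = 0$, a small step $\alpha e_i$ changes $f$ by roughly $\alpha\,\nabla_{x_i} f(0)$, so the most negative partial derivative identifies the coordinate with the biggest decrease.

```python
import numpy as np

# Assumed convex quadratic f(x) = 0.5 x^T Q x + c^T x; Q, c are made up.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # positive definite => f is convex
c = np.array([-1.0, -3.0])

def f(x):
    return 0.5 * x @ Q @ x + c @ x

def grad(x):
    return Q @ x + c

x0 = np.zeros(2)                      # feasible starting point
g = grad(x0)                          # partial derivatives at x = 0
i_star = int(np.argmin(g))            # most negative partial derivative

# Compare the actual decrease from a small step along each coordinate.
alpha = 1e-3
decreases = [f(x0) - f(x0 + alpha * np.eye(2)[i]) for i in range(2)]
i_best = int(np.argmax(decreases))    # coordinate with biggest decrease
```

Here `i_best` agrees with `i_star`, matching the coordinate-descent intuition for small steps; for larger steps the second-order (curvature) term can change which coordinate wins.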