I'm trying to parse the solution to the following problem:
$$\begin{array}{ll} \text{minimize} & \displaystyle\sum_{i = 1}^r (\sigma_i x_i - b_i)^2 + \sum_{i = r + 1}^m b_i^2 \\ \text{subject to} & \displaystyle\sum_{i = 1}^n x_i^2 = \gamma \end{array}$$
*Convex Optimization* by Boyd offers this solution:
"Although the problem is not convex, it is clear that a necessary and sufficient condition for a feasible $x$ to be optimal is that either the gradient of the objective vanishes at $x$, or the gradient is normal to the sphere through $x$, and pointing toward the interior of the sphere. In other words, the optimality conditions are that $||x||_2^2 = \gamma$ and there exists a $\nu \geq 0$, such that $$(\sigma_i^2 + \nu)x_i = \sigma_i b_i, \, i = 1, \dots, r, \,\,\, \nu x_i = 0, \, i = r + 1, \dots, n$$"
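To make the quoted conditions concrete, here is a small numerical sketch (the values of $\sigma$, $b$, and $\gamma$ are made up for illustration). It assumes the case $\nu > 0$, where $x_i = 0$ for $i > r$ and $x_i = \sigma_i b_i / (\sigma_i^2 + \nu)$ for $i \leq r$; since $\|x(\nu)\|_2^2$ is strictly decreasing in $\nu$, a simple bisection finds the $\nu$ that makes $x$ feasible, and we can then verify the stated equations hold:

```python
import numpy as np

# Hypothetical small instance (sigma, b, gamma chosen arbitrarily for illustration).
sigma = np.array([3.0, 2.0, 1.0])   # sigma_1, ..., sigma_r with r = 3
b = np.array([1.0, 1.0, 1.0])
gamma = 0.5                          # chosen < sum (b_i/sigma_i)^2, so nu > 0

def norm_sq(nu):
    # ||x(nu)||_2^2 with x_i = sigma_i * b_i / (sigma_i^2 + nu)
    x = sigma * b / (sigma**2 + nu)
    return np.sum(x**2)

# norm_sq is strictly decreasing in nu: bisect on [lo, hi] for norm_sq(nu) = gamma.
lo, hi = 0.0, 1.0
while norm_sq(hi) > gamma:
    hi *= 2.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if norm_sq(mid) > gamma:
        lo = mid
    else:
        hi = mid
nu = 0.5 * (lo + hi)
x = sigma * b / (sigma**2 + nu)

print(nu)                                # the multiplier, nonnegative
print(np.sum(x**2))                      # approximately gamma: feasibility
print((sigma**2 + nu) * x - sigma * b)   # approximately 0: optimality condition
```

Note also that at this $x$, the gradient of the objective equals $-2\nu x$, i.e. it is a nonpositive multiple of the outward normal of the sphere, which is exactly the "pointing toward the interior" condition in the quote.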
I don't get why, if the problem is not convex, the optimal solution is where the gradient vanishes. How would you know it is not just a local minimum, or even a maximum, when the gradient vanishes?
Also, can anyone explain the intuition behind "normal to the sphere through $x$, and pointing toward the interior of the sphere"?