Is there a name for functions where the global minimum can be found by gradient descent from any point? (Not convexity)


I'm trying to find names/terms for a property that seems like it ought to be important.

In machine learning, a lot of approaches rely on gradient descent. Often we know very little about the stationary points of the function being optimized, and when a minimum is found, we have no idea whether it's a global minimum. We may pick several random starting points, run gradient descent from each, and keep the best value discovered. Occasionally we know that a function is convex; then we have the option of using other algorithms, and we also have an assurance that any minimum found is a global minimum.

But there's a class of functions for which I do not know the name: functions that are not necessarily convex, but whose only minimum is the global minimum, and where a gradient method with small enough step sizes will always arrive there eventually. An example is the 2D Rosenbrock function, which is non-convex because of a parabola-shaped valley. But from any starting point there exists a sequence of step sizes such that stepping in the (negative) gradient direction leads to the global minimum. Appealing to physical intuition: a drop of water running over the landscape always ends at the same sink (if we disregard the drop's momentum).
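To make the Rosenbrock example concrete, here is a small sketch (all function and variable names are my own) of gradient descent with a backtracking line search, which adaptively shrinks the step until it actually decreases the function value, playing the role of the "small enough step sizes" above:

```python
# A sketch of gradient descent on the 2D Rosenbrock function,
# f(x, y) = (1 - x)^2 + 100 (y - x^2)^2, whose only minimum is
# the global one at (1, 1), even though f is non-convex.

def rosenbrock(x, y):
    return (1 - x) ** 2 + 100 * (y - x * x) ** 2

def rosenbrock_grad(x, y):
    # Partial derivatives of f with respect to x and y.
    gx = -2 * (1 - x) - 400 * x * (y - x * x)
    gy = 200 * (y - x * x)
    return gx, gy

def descend(x, y, tol=1e-4, max_iter=200_000):
    """Gradient descent with a backtracking (Armijo) line search:
    the trial step is halved until it sufficiently decreases f."""
    for _ in range(max_iter):
        gx, gy = rosenbrock_grad(x, y)
        g2 = gx * gx + gy * gy
        if g2 < tol * tol:  # stop once the gradient is tiny
            break
        f0 = rosenbrock(x, y)
        t = 1.0
        while rosenbrock(x - t * gx, y - t * gy) > f0 - 0.5 * t * g2:
            t *= 0.5
        x, y = x - t * gx, y - t * gy
    return x, y
```

From starting points on either side of the valley, e.g. `descend(-1.5, 2.0)` or `descend(2.0, -1.0)`, the iterates should end up near the global minimum at (1, 1), albeit slowly along the curved valley floor.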

I think another way of describing the property is in terms of sublevel sets:

  • each sublevel set of the function is path-connected
  • for $a < b$, the sublevel set for $a$ is contained in the sublevel set for $b$: writing $L^-_a = \{x : f(x) \le a\}$ for the sublevel set at height $a$, $$a < b \Rightarrow L^-_a \subseteq L^-_b$$

For this reason, every step that decreases the function value restricts the search to a smaller sublevel set, upper-bounded by the best value found so far, and a gradient-descent path cannot get "trapped" in an isolated valley. Note, however, that this description allows sublevel sets that are highly non-convex, and in those cases the descent path may be circuitous.
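The sublevel-set description can be checked numerically. The sketch below (the function names, grid window, and thresholds are my own choices) samples the Rosenbrock function on a grid and counts the connected components of each sampled sublevel set with a simple flood fill:

```python
import numpy as np

def rosenbrock(x, y):
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def num_components(mask):
    """Count 4-connected components of True cells in a 2D boolean
    array, using an explicit-stack flood fill."""
    mask = mask.copy()
    rows, cols = mask.shape
    count = 0
    for i in range(rows):
        for j in range(cols):
            if not mask[i, j]:
                continue
            count += 1
            stack = [(i, j)]
            mask[i, j] = False
            while stack:
                a, b = stack.pop()
                for u, v in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= u < rows and 0 <= v < cols and mask[u, v]:
                        mask[u, v] = False
                        stack.append((u, v))
    return count

# Sample f on a window around the global minimum at (1, 1).
xs = np.linspace(-2, 2, 401)
ys = np.linspace(-1, 3, 401)
X, Y = np.meshgrid(xs, ys)
F = rosenbrock(X, Y)

for c in (1, 10, 100):
    print(c, num_components(F <= c))
```

On this grid each of the sampled sublevel sets should come out as a single connected region, and the nesting property holds by construction, since `F <= 1` implies `F <= 10`.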

It seems like this property ought to be important in mathematical optimization, because it tells us that gradient descent can always find its way to the global optimum, though perhaps slowly. Further, even for functions that lack this property globally, the region of the domain that "flows" to a given local minimum can be described by it (i.e. that region will not in general be convex either, but everything in it flows to that point, like water in a river basin). But I'm having difficulty finding explicit discussion of this property on its own; instead, I just find remarks noting that for convex functions, all local minima are global minima.