These slides give an overview of some results in nonconvex optimization with gradient descent (GD). They distinguish three types of results that are commonly proven about nonconvex GD:
1. Convergence to a local minimum, which can be guaranteed if there are no saddle points, for instance.
2. Local convergence to a global minimum, which can be guaranteed if the starting point is somehow "close" to the global minimum.
3. Global convergence to a global minimum, which can be guaranteed if all stationary points are global minima.
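To make the distinction concrete, here is a minimal sketch (my own toy example, with an arbitrarily chosen quartic and step size) of the phenomenon behind type (2): when suboptimal local minima exist, plain GD converges to whichever minimum's basin of attraction contains the starting point.

```python
# Toy objective: f(x) = x^4 - 2x^2 + 0.3x.
# It has a global minimum near x ≈ -1.04 and a suboptimal
# local minimum near x ≈ 0.96 (the 0.3x term tilts the double well).

def f(x):
    return x**4 - 2*x**2 + 0.3*x

def grad(x):
    return 4*x**3 - 4*x + 0.3

def gradient_descent(x0, lr=0.05, steps=500):
    """Plain gradient descent with a fixed step size."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Same algorithm, different initializations, different limit points:
x_left = gradient_descent(-0.5)   # basin of the global minimum
x_right = gradient_descent(0.5)   # basin of the suboptimal local minimum
print(x_left, f(x_left))   # global minimum, lower objective value
print(x_right, f(x_right)) # local minimum, higher objective value
```

Results of type (2) are exactly those that identify a region (here, roughly x < 0.08, the basin boundary at the local maximum) from which convergence to the global minimum is guaranteed.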
I have found plenty of results of forms (1) and (3) in the literature, but I can't find any example of type (2): a proof of convergence to the global minimum from a suitable initialization when suboptimal local minima exist. Can anyone give me a pointer to such a result, so I can get an idea of what techniques might be helpful in this case? The result does not need to be general; a specific application would also be interesting, just to see what techniques are used.