Gradient Descent Divergence


I am curious about the divergence conditions of the gradient descent method. In the reading I have done, I have only ever seen it mentioned that the method diverges when $\alpha$ is too large. It seems obvious to me, though, that the method could also diverge on a function such as $f(x) = -\frac{x^3}{3} + x$ for any initial guess $x > 1$. The derivative is $f'(x) = 1 - x^2$, which is negative for every $x > 1$, so the update $x \leftarrow x - \alpha f'(x)$ always increases the current guess. No matter what $\alpha$ you select, each step pushes you farther from the minimum at $x = -1$.
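A small numerical sketch of this behavior (my own illustration, not from any reference): iterating plain gradient descent on $f(x) = -\frac{x^3}{3} + x$ from a start above $1$ produces a strictly increasing sequence for every positive step size.

```python
# Gradient descent on f(x) = -x**3/3 + x, whose only local
# minimum is at x = -1.  Its derivative f'(x) = 1 - x**2 is
# negative for every x > 1, so each update x <- x - alpha*f'(x)
# makes x strictly larger: the iterates run off toward +infinity
# no matter how small alpha is chosen.

def grad(x):
    """Derivative of f(x) = -x**3/3 + x."""
    return 1 - x**2

def descend(x0, alpha, steps):
    """Run plain gradient descent and return the iterate history."""
    x = x0
    history = [x]
    for _ in range(steps):
        x = x - alpha * grad(x)
        history.append(x)
    return history

for alpha in (0.5, 0.05, 0.005):
    xs = descend(1.5, alpha, 10)
    # Every step increases x, regardless of the step size.
    assert all(b > a for a, b in zip(xs, xs[1:]))
    print(f"alpha={alpha}: x after 10 steps = {xs[-1]:.3g}")
```

Shrinking $\alpha$ only slows the escape; it never reverses the direction of the steps.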

I am curious about this in the context of minimizing the error function of a neural network. In that setting you usually cannot plot the error surface to guide your choice of initial guess. How can you tell whether your minimization is diverging because you have selected an $\alpha$ that is too large, or because your initial guess is in the wrong place?
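One heuristic I can imagine (a rough sketch of my own, not a standard routine) is to monitor the loss itself rather than the iterates: a too-large $\alpha$ typically makes the loss *increase* or overflow, while a bad starting point on an unbounded downhill slope can make the loss keep *decreasing* even as the iterate runs away. The threshold `1e6` below is an arbitrary cutoff for "runaway":

```python
def diagnose(loss, grad, x0, alpha, steps=100):
    """Crude divergence diagnostic (illustrative sketch only).
    Runs plain 1-D gradient descent and reports which classic
    failure mode, if any, it observes."""
    x = x0
    prev_loss = loss(x0)
    for _ in range(steps):
        x = x - alpha * grad(x)
        if abs(x) > 1e6:
            # Iterate ran away while the loss never rose: the
            # function heads downhill toward infinity from this
            # start, so the initial guess is the likely problem.
            return "runaway iterate"
        cur = loss(x)
        if cur > prev_loss:
            # Loss went up: classic symptom of a step size that
            # is too large -- retry with a smaller alpha.
            return "step size too large"
        prev_loss = cur
    return "ok"
```

For example, on $f(x) = x^2$ with $\alpha = 1.5$ the loss immediately rises ("step size too large"), while on $f(x) = -\frac{x^3}{3} + x$ from $x_0 = 1.5$ the loss falls monotonically as the iterate escapes ("runaway iterate").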