Proof that gradient descent algorithm escapes saddle points exponentially

41 Views Asked by At

https://arxiv.org/pdf/1705.10412.pdf

I was going through this paper, and understood the crux of it. But in appendix, the complete proof of it is given which was a bit tough for me mathematically.

So right now I am stuck at understanding remark A.2(page 13), was stuck while proving Theorem B.1(page17) and did not understand the bounds of partial derivative of $g(x_i, x_{i+1})$ in the lemma B.2.(page 18).

Please help me with all these and after that I want to continue main proof by myself.

1

There are 1 best solutions below

0
On

I mean your questions aren't easy to answer on an online forum (very open an vague), and there are a lot. Also the authors haven't been very particular about writing. Remark A.2 refers to a Theorem B.2 of which there is none, only a Lemma B.2, and they use the term connection region without defining what it means (so I assume they mean an open subset/region which contains the point of interest).

Remark A.2 refers to an $\epsilon$ second order stationary point which they define to mean your stationary point is second order concave (a minimum) within some $\epsilon$ tolerance. Thus Remark A.2 is simply saying that since there are bounds on the gradients, (they must be $ > \gamma\tau$) and thus there is a violation on the $\epsilon$ tolerance definition, since the gradients cannot be arbitrarily small in that region. Moreover they refer to the smallest eigenvalues of the Hessian and remember the Hessian tells the type of stationary point we have. If $\gamma$ is positive then $-2\gamma$ is negative, and so there is an implication that possibly the Hessian is negative definite (for example - I haven't read too closely into the paper). These are just things to notice (which is what we have when we have a remark).

"Stuck whilst proving Theorem B.1" is too open for me to answer.

"Did not understand the bounds in Lemma B.2" is hard for me to answer as well. What did you not understand? Tbh it seems a little forced of a Lemma. They say "there exists functions such that the following bounds are satisfied" and you need to see in the proof they pick very specific functions to satisfy these bounds - so if you only read the Lemma it would be very difficulty to see where those bounds come from without looking through the proof. They construct some $g$ based on some BCs.