Why does the gradient descent direction end up being the direction of the eigenvector corresponding to the smallest eigenvalue of the Hessian


I read that when optimising a function $f(w)$ with gradient descent, after many iterations the descent direction approaches the direction of the eigenvector of the Hessian $H$ corresponding to its smallest eigenvalue. How can we prove that this is the case?

Edit: For this to hold there is an assumption that the smallest eigenvalue is sufficiently separated from the second smallest.
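To see the claim numerically, here is a small sketch (my own, not from the question) using a quadratic $f(w) = \tfrac{1}{2} w^\top H w$, whose gradient is $Hw$. Each gradient step maps $w$ through $(I - \eta H)$, so the component along an eigenvalue $\lambda_i$ shrinks by a factor $|1 - \eta\lambda_i|$ per step; for a small step size the smallest-eigenvalue component decays slowest and eventually dominates the direction of $w$, and hence of the step $-\eta H w$. The matrix sizes, eigenvalues, and step size below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a symmetric positive-definite H with a clear gap between the two
# smallest eigenvalues (the assumption mentioned in the edit).
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthonormal basis
eigvals = np.array([0.1, 1.0, 2.0, 3.0, 4.0])      # smallest is well separated
H = Q @ np.diag(eigvals) @ Q.T

v_min = Q[:, 0]            # eigenvector of the smallest eigenvalue
w = rng.standard_normal(5) # random starting point
eta = 0.2                  # step size, chosen so eta < 2 / lambda_max

# Gradient descent on f(w) = 0.5 * w^T H w, i.e. w <- w - eta * H w.
for _ in range(200):
    w = w - eta * (H @ w)

# After many steps the direction of w (and of the step -eta*H*w, which is
# parallel to it in this quadratic case) aligns with v_min.
alignment = abs(w @ v_min) / np.linalg.norm(w)     # |cos(angle)| with v_min
print(alignment)
```

With these eigenvalues the per-step contraction factors are $0.98, 0.8, 0.6, 0.4, 0.2$, so after 200 steps every component except the one along $v_{\min}$ is negligible and the printed alignment is essentially 1.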