Convergence rate (to a local minimum) of gradient descent for a $C^3$ function

328 Views Asked by At

Suppose we have a $f \in C^{3} (U)$ where $U$ is an open subset (not necessarily convex) of $\mathbb R^n$ and for some $x_0 \in U$, the sublevel set $S = \{x: f(x) \le f(x_0)\}$ is compact. Further suppose the set of local minimizers of this function is not empty and every minimizer is locally strict, i.e., for any local minimizer $x^*$, there is an open neighborhood $V \subseteq U$ of $x$, such that $f(y) > f(x)$ for all $y \in V$ and $y \neq x$. Further assume the Hessian of all local minimizers is positive definite.

I am interested to know if we start at $x_0$, the convergence rate of the sequence $\{x_k\}$ generated by gradient descent with exact line search, i.e., \begin{align*} x_{k+1} = x_k - \gamma_k \nabla f(x_k), \qquad \gamma_k = \text{argmin}_{\gamma \ge 0} f(x_k - \gamma \nabla f(x_k) ). \end{align*} We only consider this as a thought experiment and so we assume in every iteration, the subproblem can be perfectly solved. This is a discussion here on the conditions to guarantee convergence for a differentiable convex function. I am putting stronger conditions and interested to know the convergence rate.

First, on the compact set $S$, $\|\cdot\| \circ \nabla^2 f: U \to \mathbb R$ is continuous and achieves maximum, so the gradient is Lipschitz. By results from convex optimization, the convergence rate should be at least $\frac{1}{k}$. Secondly, suppose $x_k \to x^*$ (might need to take a subsequence, but I think my assumptions should be enough to guarantee convergence) and by assumption every minimizer has the property $\nabla^2 f(x^*) \succ 0$. Let $F = \lambda_1 \circ \nabla^2 f$ where $\lambda_1$ is the function taking the smallest eigenvalue of a symmetric matrix. $F$ should be continuous and so there must be a neighborhood $W$ of $x^*$ s.t., $F(y) > \delta > 0$ for all $y \in W$. Then it seems like the restriction $f|_W$ should be convex and indeed strongly convex, and gradient descent should converge linearly in $W$.

Is my argument correct above? And is there more we can say about the convergence of sequence $\{x_k\}$ before it entering the open set $W$?