For what class of functions is steepest descent guaranteed to converge in a finite number of iterations?


Is there any classification of functions $f: \mathbb{R}^n \rightarrow \mathbb{R}$ with respect to finite convergence of steepest descent (and similar first-order methods, e.g., nonlinear conjugate gradients)?

By finite convergence I mean that the algorithm reaches a point $x_k$ with $\|\nabla f(x_k)\| = 0$ after a finite number of iterations, assuming that no round-off errors occur (and perhaps that the line search is exact).

It is known that these algorithms are globally convergent, i.e., $\lim_{k \rightarrow \infty} \|\nabla f(x_k)\| = 0$, but there is no guarantee that this happens for finite $k$.

For example, the conjugate gradient method is guaranteed to converge in at most $n$ steps for positive definite quadratic functions. Are there any similar results for steepest descent?
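(For reference, the $n$-step property is easy to check numerically; here is my own quick sketch with a textbook conjugate gradient implementation — the matrix, seed, and tolerance are arbitrary choices of mine:)

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Textbook CG for symmetric positive definite A; returns (x, steps)."""
    x = x0.copy()
    r = b - A @ x                    # residual = -gradient of 0.5 x'Ax - b'x
    p = r.copy()
    steps = 0
    while np.linalg.norm(r) > tol:
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)   # exact line search along p
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p         # next A-conjugate search direction
        r = r_new
        steps += 1
    return x, steps

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)
x, steps = conjugate_gradient(A, b, np.zeros(n))
print(steps)                         # at most n in exact arithmetic
```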

What about general (strongly) convex functions? I was only able to find results of the form \begin{equation} k = O\left(\frac{\|x_0 - x^*\|^2}{\epsilon}\right)\end{equation} for achieving $f(x_k) - f(x^*) \leq \epsilon$, where $f$ is smooth and convex and $x^*$ is a stationary point.

As far as I understand, this does not imply finite convergence, since the bound diverges as $\epsilon \rightarrow 0$. Is it known that finite convergence does not happen even for strongly convex functions?
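(A small numerical experiment I ran suggests this suspicion: steepest descent with exact line search, $\gamma_k = g_k^T g_k / g_k^T A g_k$, on a simple strongly convex quadratic shrinks the gradient geometrically but never reaches an exactly stationary point. The 2-D example below is my own arbitrary choice:)

```python
import numpy as np

A = np.diag([1.0, 10.0])             # strongly convex quadratic, kappa = 10
x = np.array([10.0, 1.0])            # generic start: equal gradient components
for _ in range(100):
    g = A @ x                        # gradient of f(x) = 0.5 x'Ax
    x = x - (g @ g) / (g @ A @ g) * g   # exact line search step
grad_norm = np.linalg.norm(A @ x)
print(grad_norm)                     # tiny but strictly positive
```

Each iteration multiplies the gradient norm by exactly $(\kappa - 1)/(\kappa + 1) = 9/11$ here, so it never vanishes in finitely many steps.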

There is 1 best solution below
Steepest descent method:

$$x_{k+1} := x_k - \gamma_k \nabla f(x_k)$$ with

$$\gamma_k := \arg \min_{\gamma \geq 0} \phi_k(\gamma)$$ and

$$\phi_k(\gamma) = f(x_k - \gamma \nabla f(x_k))$$

This method also terminates after finitely many steps for quadratic functions $f(x) := \frac{1}{2} x^T A x - b^T x$ with $A$ symmetric positive definite, given the following choice of $\gamma_k$:

$$\gamma_k := \frac{1}{\lambda_{k+1}} \quad (k = 0, 1, \ldots, n-1)$$ where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.
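This rule can be verified numerically; here is a minimal sketch (the test matrix, its eigenvalues, and the right-hand side are arbitrary illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lambdas = np.array([1.0, 2.0, 5.0, 10.0])   # prescribed eigenvalues
A = Q @ np.diag(lambdas) @ Q.T              # symmetric positive definite
b = rng.standard_normal(n)

x = np.zeros(n)
for k in range(n):
    g = A @ x - b                           # gradient of 0.5 x'Ax - b'x
    x = x - (1.0 / lambdas[k]) * g          # gamma_k = 1/lambda_{k+1}

grad_norm = np.linalg.norm(A @ x - b)
print(grad_norm)                            # zero up to round-off
```

The reason it works: in the eigenbasis of $A$, step $k$ multiplies the gradient component along the eigenvector of $\lambda_{k+1}$ by $1 - \lambda_{k+1}/\lambda_{k+1} = 0$, so after $n$ steps every component has been annihilated.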

However, this is not used in practice because computing the eigenvalues of $A$ is usually much more expensive than solving $Ax=b$ directly. (Solving $Ax=b$ amounts to setting the gradient of $f$ to zero and thus finding the critical points; since $A$ is positive definite, a critical point is the global minimizer.)

There are additional conditions under which steepest descent terminates in finitely many steps, which I will not list here. You can, however, find further information in

Langlois, W. E. "Conditions for termination of the method of steepest descent after a finite number of iterations." IBM Journal of Research and Development 10.1 (1966): 98-99.