It seems that most proofs of convergence for gradient descent algorithms rely on strong conditions on the first and second derivatives of the function, for instance that $$|f''(x)| \leq K$$ over the whole domain of the function. My question is: are there results for gradient-descent-type algorithms when we can only say something like $$|f^{(n)}(x)| \leq K$$ for some $n > 2$?
Convergence here can refer to several different sequences, e.g., convergence of the parameters $x_{k}$, of the function values $f(x_{k})$, or of the derivatives $f'(x_{k})$.
Likely, the reason these proofs bound the second derivative is that proofs of convergence for Newton's method require the derivative of the function to be Lipschitz continuous. An everywhere differentiable function $f:\mathbb{R}\rightarrow\mathbb{R}$ is Lipschitz continuous if and only if its derivative is bounded. Basically, if we need the derivative to be Lipschitz, we can guarantee that by bounding the second derivative.
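To spell out that last implication: if $|f''(x)| \leq K$ everywhere, then by the mean value theorem there is some $\xi$ between $x$ and $y$ with

$$ |f'(x) - f'(y)| = |f''(\xi)|\,|x - y| \leq K|x - y|, $$

so $f'$ is Lipschitz continuous with constant $\gamma = K$.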
Anyway, Lipschitz continuity of the derivative states, in a single dimension, $$ |f^\prime(x)-f^\prime(y)| \leq \gamma |x - y|. $$ From this, we can prove $$ |f(y)-f(x)-f^\prime(x)(y-x)|\leq\frac{\gamma(y-x)^2}{2}, $$ which is really what we need for proving that Newton's method converges, since it bounds how good the affine approximation $f(x)+f^\prime(x)(y-x)$ is to $f(y)$. Newton's method is really just a series of these affine approximations. In any case, the best place to see this and how the proof works is Dennis and Schnabel's book "Numerical Methods for Unconstrained Optimization and Nonlinear Equations". They give the proof for Newton's method in a single dimension as well as in multiple dimensions. Especially when going to multiple dimensions, it's much easier to work with Lipschitz continuity than with bounds on higher derivatives. Start on page 21 for the single-dimensional case.
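As a quick numerical sanity check of that quadratic bound, here is a small Python sketch using $f = \sin$ as an example (chosen because $|\sin''| = |\sin| \leq 1$, so $f'$ is Lipschitz with $\gamma = 1$); the function name `affine_error` is just illustrative:

```python
import math

def affine_error(f, fprime, x, y):
    """Error of the affine approximation f(x) + f'(x)(y - x), evaluated at y."""
    return abs(f(y) - f(x) - fprime(x) * (y - x))

# f(x) = sin(x): f'' = -sin is bounded by 1, so f' is Lipschitz with gamma = 1
gamma = 1.0
for x, y in [(0.0, 1.0), (-2.0, 0.5), (3.0, 3.7)]:
    err = affine_error(math.sin, math.cos, x, y)
    bound = gamma * (y - x) ** 2 / 2
    assert err <= bound  # the affine approximation error never exceeds gamma*(y-x)^2/2
```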
As a side note, if we use a line-search method, the Lipschitz continuity condition comes up again. Really, it all comes down to guaranteeing that our affine approximations are good enough. To see how it manifests, Nocedal and Wright's book "Numerical Optimization" runs through the proof on page 38.
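To make the line-search remark concrete, here is a minimal backtracking (Armijo sufficient-decrease) line-search sketch in Python; the constants `c` and `rho` and the function names are illustrative choices, not taken from either book:

```python
def backtracking_line_search(f, grad, x, c=1e-4, rho=0.5, t0=1.0):
    """Shrink the step t until the Armijo condition holds:
    f(x - t*g) <= f(x) - c*t*g^2, where g = grad(x) (single dimension)."""
    g = grad(x)
    t = t0
    while f(x - t * g) > f(x) - c * t * g * g:
        t *= rho
    return t

# Minimize f(x) = x^2 with gradient descent plus backtracking
f = lambda x: x * x
grad = lambda x: 2 * x
x = 5.0
for _ in range(50):
    t = backtracking_line_search(f, grad, x)
    x = x - t * grad(x)
assert abs(x) < 1e-6  # converges to the minimizer x = 0
```

The Armijo condition is exactly the "good enough affine approximation" guarantee in disguise: it only accepts steps whose actual decrease is at least a fixed fraction of the decrease the affine model predicts.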