Slow convergence of gradient descent for a strictly convex quadratic


Let $0 < \lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n$ and let $f: \mathbb{R}^n \to \mathbb{R}$ be defined by

$$ f(x) = \frac{1}{2}x^T M x, \qquad M = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}. $$

To use gradient descent with exact minimization, we define $t_k = \frac{\|\nabla f(x_k)\|^2}{\nabla f(x_k)^T M \nabla f(x_k)}$ and $x_{k+1} = x_k - t_k \nabla f(x_k)$.

Given $x_0 = [{\lambda_1}^{-1},0, \ldots ,0, {\lambda_n}^{-1}]^T$, I would like to show that $$ f(x_{k+1}) = \bigg(\frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}\bigg)^2 f(x_k). $$

I have tried both an induction argument and directly expanding $f(x_{k+1})$, but without success. If someone can see how to complete the proof, I would be very interested to see it.
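For concreteness, here is a small NumPy sketch of the iteration (the eigenvalues $(1, 2, 4)$ are an arbitrary illustrative choice), which does reproduce the claimed contraction factor numerically for this starting point:

```python
import numpy as np

def exact_step_gd(lams, x0, iters):
    """Gradient descent with exact line search for f(x) = x^T M x / 2, M = diag(lams)."""
    M = np.diag(np.asarray(lams, dtype=float))
    f = lambda x: 0.5 * x @ M @ x
    x = np.asarray(x0, dtype=float)
    vals = [f(x)]
    for _ in range(iters):
        g = M @ x                   # gradient of f at x
        t = (g @ g) / (g @ M @ g)   # exactly minimizing step length
        x = x - t * g
        vals.append(f(x))
    return vals

lam1, lamn = 1.0, 4.0
vals = exact_step_gd([lam1, 2.0, lamn], [1 / lam1, 0.0, 1 / lamn], 5)
factor = ((lamn - lam1) / (lamn + lam1)) ** 2  # claimed per-step contraction of f
ratios = [b / a for a, b in zip(vals, vals[1:])]
```

Here every ratio $f(x_{k+1})/f(x_k)$ agrees with `factor` to machine precision.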


There are 3 answers below.


Let $M$ be a symmetric positive definite matrix. If $f(x)=\frac{1}{2}x^{\intercal}Mx$, then $\nabla f(x)=Mx$. The gradient descent iteration for $f$ is $$ x_{k+1}\equiv x_{k}-t_{k}\nabla f(x_{k})=\left(I-t_{k}M\right)x_{k}. $$

At each step, we would like to pick $t_{k}$ such that $$ f(x_{k+1})=\frac{1}{2}\left(M^{\frac{1}{2}}\left(I-t_{k}M\right)x_{k}\right)^{\intercal}\left(M^{\frac{1}{2}}\left(I-t_{k}M\right)x_{k}\right) $$ is minimized. Differentiating, $$ \frac{\partial}{\partial t_{k}}\left[f(x_{k+1})\right]=t_{k}x_{k}^{\intercal}M^{3}x_{k}-x_{k}^{\intercal}M^{2}x_{k}. $$ Setting this to zero and solving for $t_{k}$, we get $$ t_{k}=\frac{x_{k}^{\intercal}M^{2}x_{k}}{x_{k}^{\intercal}M^{3}x_{k}}=\frac{\left(Mx_{k}\right)^{\intercal}\left(Mx_{k}\right)}{\left(Mx_{k}\right)^{\intercal}M\left(Mx_{k}\right)}=\frac{\left\Vert \nabla f(x_{k})\right\Vert ^{2}}{\nabla f(x_{k})^{\intercal}M\nabla f(x_{k})}, $$ which is exactly the expression in your original question. Note that the above is well-defined, since $x_{k}^{\intercal}M^{3}x_{k}=(Mx_{k})^{\intercal}M(Mx_{k})>0$ by the positive definiteness of $M$.

Substituting the expression for $t_{k}$ back into $f(x_{k+1})$, $$ f(x_{k+1})=\frac{1}{2}\left(x_{k}^{\intercal}Mx_{k}-\frac{\left(x_{k}^{\intercal}M^{2}x_{k}\right)^{2}}{x_{k}^{\intercal}M^{3}x_{k}}\right)=f(x_{k})-\frac{1}{2}\frac{\left(x_{k}^{\intercal}M^{2}x_{k}\right)^{2}}{x_{k}^{\intercal}M^{3}x_{k}}. $$ Or, more succinctly, $$ \boxed{f(x_{k+1}) = f(x_k) - \frac{\left(f(M^{\frac{1}{2}} x_k)\right)^2}{f(M x_k)}.} $$


Now, take $M=\text{diag}(2,3,5)$ and $x_{0}=(1/2, 1/3, 1/5)^\intercal$. Then, $f(x_{0})=31/60$ and $f(x_{1})=1/15$. However, $$\left(\frac{5-2}{5+2}\right)^{2} \frac{31}{60} \neq \frac{1}{15}.$$ Therefore, I don't think your claim is true. Perhaps you meant something else?
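The counterexample above can be checked in exact rational arithmetic with Python's `fractions` module (one gradient step applied by hand):

```python
from fractions import Fraction

# Counterexample data: M = diag(2, 3, 5), x0 = (1/2, 1/3, 1/5).
lams = [Fraction(2), Fraction(3), Fraction(5)]
x = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]

def f(v):
    """f(v) = v^T M v / 2 for the diagonal M above."""
    return sum(l * c * c for l, c in zip(lams, v)) / 2

g = [l * c for l, c in zip(lams, x)]  # gradient Mx = (1, 1, 1)
t = sum(gc * gc for gc in g) / sum(l * gc * gc for l, gc in zip(lams, g))
x1 = [c - t * gc for c, gc in zip(x, g)]

f0, f1 = f(x), f(x1)                       # 31/60 and 1/15
claimed = Fraction(5 - 2, 5 + 2) ** 2 * f0  # (3/7)^2 * f(x0), which differs from f1
```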


It's a standard result that $f(x_{k+1}) \leq \left( \frac{\lambda_{n}-\lambda_{1}}{\lambda_{n}+\lambda_{1}} \right)^{2} f(x_{k})$. However, this is an inequality, not an equation. What actually happens depends on exactly where you start the sequence.

The usual proof of this is based on the Kantorovich inequality. You can find it in many textbooks on optimization, e.g. Luenberger and Ye, Linear and Nonlinear Programming, 4th ed., Springer, 2015.
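The inequality is easy to probe numerically. The sketch below (eigenvalues and the number of random trials are arbitrary choices) checks that the Kantorovich-type bound holds at every step from random starting points:

```python
import numpy as np

# Check f(x_{k+1}) <= ((λn - λ1)/(λn + λ1))^2 f(x_k) for random starts.
rng = np.random.default_rng(0)
lams = np.array([1.0, 2.0, 3.0, 7.0])
M = np.diag(lams)
bound = ((lams[-1] - lams[0]) / (lams[-1] + lams[0])) ** 2

f = lambda x: 0.5 * x @ M @ x
ok = True
for _ in range(100):
    x = rng.standard_normal(4)
    for _ in range(20):
        g = M @ x                   # gradient
        t = (g @ g) / (g @ M @ g)   # exact line-search step
        x_new = x - t * g
        ok = ok and f(x_new) <= bound * f(x) + 1e-12
        x = x_new
```

For generic starting points the observed ratio is typically strictly below the bound; equality requires special starts such as the one in the question.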


Your desired claim is true if you choose $$ x_0 = [\lambda_1^{-1}, \mathbf{0, \ldots, 0}, \lambda_n^{-1}]^\top.$$ With this initial condition, $\nabla f(x_0) = (1, 0, \ldots, 0, 1)^\top$ and $t_0 = \frac{2}{\lambda_1 + \lambda_n}$, so a direct computation gives $$ x_1 = \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}\,[\lambda_1^{-1}, 0, \ldots, 0, -\lambda_n^{-1}]^\top. $$ Since $f$ is unchanged by flipping the sign of the last coordinate, each iterate is a scalar multiple of a vector of the same form, the same computation applies at every step, and the result follows by induction.

By the way, the proof of the upper bound via Kantorovich's lemma also shows that equality holds in this case of interest.
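The induction can be checked in exact arithmetic. The sketch below (the eigenvalues $(1, 2, 4)$ are an illustrative assumption) verifies that each iterate is exactly $c_k\,[\lambda_1^{-1}, 0, \ldots, 0, \pm\lambda_n^{-1}]^\top$ with $c_k = \rho^k$, $\rho = \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}$, and alternating sign on the last coordinate:

```python
from fractions import Fraction

lam = [Fraction(1), Fraction(2), Fraction(4)]
x = [1 / lam[0], Fraction(0), 1 / lam[-1]]
rho = (lam[-1] - lam[0]) / (lam[-1] + lam[0])

sign, c = 1, Fraction(1)
for k in range(5):
    g = [l * xi for l, xi in zip(lam, x)]  # gradient Mx
    t = sum(gi * gi for gi in g) / sum(l * gi * gi for l, gi in zip(lam, g))
    x = [xi - t * gi for xi, gi in zip(x, g)]
    c *= rho
    sign = -sign
    # Each iterate keeps the form c * (1/λ1, 0, ..., 0, ±1/λn).
    assert x == [c / lam[0], Fraction(0), sign * c / lam[-1]]
```

Since $f$ is quadratic, the scaling of $x$ by $\rho$ per step gives exactly the factor $\rho^2$ for $f$, as claimed.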