Let $f(x)=x^T A x$, where $A$ is a positive definite matrix ($A \succ 0$).
I need to show that there exists a step size $\eta$ such that $T = O(\log(1/\varepsilon))$ iterations of gradient descent are sufficient to reach an $\varepsilon$-optimal solution (i.e. $f(w_T)\leq\varepsilon$).
I have proven the following:
- $\nabla f(x)=2Ax$, so the gradient descent step is $w_{t+1}=w_t-\eta\cdot\nabla f(w_t)=w_t-2\eta Aw_t=(I-2\eta A)w_t$
- Let $B=(I-2\eta A)$
- Let $\lambda > 0$ be an eigenvalue of $A$ with corresponding eigenvector $v\neq0$. Then $Bv=(I-2\eta A)v=v-2\eta Av=v-2\eta \lambda v=(1-2\eta\lambda)v$. Hence, $1-2\eta\lambda$ is an eigenvalue of $B$ with eigenvector $v$.
- Choose $\eta$ such that $0<1-2\eta\lambda<1$, or equivalently $0<\eta < \frac{1}{2\lambda}$
- Notice that if $v$ is an eigenvector of $A$ with eigenvalue $\lambda$, then $f(v)=v^TAv=v^T(\lambda v) =\lambda \lVert v \rVert^2$
- $f(x)=x^TAx$ is convex and $f(0)=0$, so for any $0\leq c\leq 1$ we have $f(cx)=f(cx+(1-c)\cdot 0)\leq c f(x)+(1-c)f(0)=c f(x)$ (writing $c$ for the scaling factor to avoid clashing with the eigenvalue $\lambda$).
- Now, set the starting point $w_1$ of gradient descent to be an eigenvector of $A$ with corresponding eigenvalue $\lambda$.
- As shown above, if $w_1$ is an eigenvector of $A$ with eigenvalue $\lambda$, then $1-2\eta\lambda$ is an eigenvalue of $B$ with the same eigenvector, so inductively $w_{t+1}=Bw_t=B^t w_1=(1-2\eta\lambda)^t w_1$.
- Now, using all this information (the convexity step above with $c=(1-2\eta\lambda)^t\in(0,1)$), we get
$\begin{align*} f(w_{t+1})=f(Bw_t)=f((1-2\eta\lambda)^t w_1) \leq (1-2\eta\lambda)^t \cdot f(w_1)=(1-2\eta\lambda)^t \lambda \lVert w_1 \rVert^2 \end{align*}$
We want $f(w_{t+1}) \leq \varepsilon$, so $(1-2\eta\lambda)^t \lambda \lVert w_1 \rVert^2 \leq \varepsilon$
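As a quick numerical sanity check of the derivation so far, here is a minimal sketch (assuming `numpy`; the matrix $A$, the step size, and the horizon are my own arbitrary illustrative choices, not part of the problem) that verifies the closed form for the iterates and the bound above:

```python
import numpy as np

# Sanity check: starting from an eigenvector of A, gradient descent
# satisfies w_{t+1} = (1 - 2*eta*lambda)^t * w_1 and the bound
# f(w_{t+1}) <= (1 - 2*eta*lambda)^t * f(w_1).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])              # an arbitrary symmetric PD matrix
eigvals, eigvecs = np.linalg.eigh(A)
lam, w1 = eigvals[-1], eigvecs[:, -1]   # largest eigenvalue and its eigenvector,
                                        # so this eta contracts every eigendirection
eta = 0.9 / (2 * lam)                   # any eta with 0 < eta < 1/(2*lambda)

f = lambda x: x @ A @ x
w = w1.copy()
for t in range(1, 11):
    w = w - eta * 2 * A @ w             # w_{t+1} = (I - 2*eta*A) w_t
    assert np.allclose(w, (1 - 2 * eta * lam) ** t * w1)    # closed form
    assert f(w) <= (1 - 2 * eta * lam) ** t * f(w1) + 1e-12  # the bound
print("closed form and bound verified for t = 1..10")
```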
Now after solving for $t$, I got
$\begin{align*} t \geq \frac{1}{\ln{(1-2\eta \lambda)}}\cdot \left(-\ln{\left(\frac{\lambda \lVert w_1 \rVert^2}{\varepsilon}\right)}\right) \end{align*}$
The problem is that $0< 1-2\eta \lambda < 1$, so $\ln{(1-2\eta \lambda)}<0$, and this seems to break the inequality.
If anybody could help me find the mistake or solve this problem, I would appreciate it.
Note: I need to show this without using the Lipschitz property.
Thank you in advance.
This is solved. It occurred to me that
$$ t \geq \frac{-\ln{\left(\frac{\lambda \lVert w_1 \rVert^2}{\varepsilon}\right)}}{\ln{(1-2\eta \lambda)}}$$
On the RHS we have a positive number: $0 <1-2\eta \lambda < 1$ gives $\ln{(1-2\eta \lambda)}<0$, and as long as $\lambda \lVert w_1 \rVert^2 > \varepsilon$ the numerator $-\ln{(\lambda \lVert w_1 \rVert^2/\varepsilon)}$ is negative too, so the quotient is positive.
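Spelled out, the algebra behind this (including the inequality flip when dividing by the negative number $\ln(1-2\eta\lambda)$) is:

$$\begin{align*}
(1-2\eta\lambda)^t \,\lambda \lVert w_1 \rVert^2 \leq \varepsilon
&\iff t\ln(1-2\eta\lambda) \leq \ln\left(\frac{\varepsilon}{\lambda \lVert w_1 \rVert^2}\right)\\
&\iff t \geq \frac{\ln\left(\frac{\varepsilon}{\lambda \lVert w_1 \rVert^2}\right)}{\ln(1-2\eta\lambda)}
= \frac{-\ln\left(\frac{\lambda \lVert w_1 \rVert^2}{\varepsilon}\right)}{\ln(1-2\eta\lambda)}.
\end{align*}$$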
Since the RHS equals $\frac{1}{-\ln(1-2\eta\lambda)}\cdot\ln{\left(\frac{\lambda \lVert w_1 \rVert^2}{\varepsilon}\right)}$, which grows like $\ln(1/\varepsilon)$, it is enough to run $T=O(\log(1/\varepsilon))$ steps to reach $\varepsilon$-accuracy (and of course any additional steps only improve the bound).
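To see the bound in action, here is a second minimal sketch (same assumed setup as the check above; $\varepsilon$ is another arbitrary choice of mine) that computes $T$ from the formula and confirms gradient descent reaches $\varepsilon$-accuracy by step $T$:

```python
import numpy as np

# Compute T from the bound above and confirm f(w_T) <= eps.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])              # same arbitrary PD matrix as before
eigvals, eigvecs = np.linalg.eigh(A)
lam, w1 = eigvals[-1], eigvecs[:, -1]
eta = 0.9 / (2 * lam)
eps = 1e-8                              # arbitrary target accuracy

rate = 1 - 2 * eta * lam                # contraction factor in (0, 1)
T = int(np.ceil(np.log(lam * (w1 @ w1) / eps) / -np.log(rate)))

w = w1.copy()
for _ in range(T):
    w = w - eta * 2 * A @ w             # one gradient descent step
print(f"T = {T}, f(w_T) = {w @ A @ w:.3e}")
assert w @ A @ w <= eps
```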
Hope it helps someone.