Does the method of steepest descent always move in an orthogonal direction between iterations?


I understand everything about the method, I think, except for the result (or requirement) that successive steps are orthogonal to each other.

So, with the formula for this algorithm as:

$$\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n \nabla F(\mathbf{x}_n)$$

I understand the role of $\gamma_n$ (a factor that controls the step size, though does it also affect the direction?), and the idea of the gradient (whose negative gives the direction of steepest descent of the objective function), but not why the zig-zag pattern is so common, or whether it necessarily results from the algorithm.

I've seen it stated that $\gamma_n$ should be chosen so that $\nabla F(\mathbf{x}_{n+1})$ and $\nabla F(\mathbf{x}_n)$ are orthogonal, but I am unsure why.

Thanks.


Consider the Taylor series, up to second order, of
\begin{align} F\big(\boldsymbol x_{n+1} \big) &= F\big(\boldsymbol x_{n} - \gamma_n \nabla F(\boldsymbol x_{n} ) \big) \\ &= F(\boldsymbol x_{n} ) - \gamma_n \nabla F(\boldsymbol x_{n} )^T \nabla F(\boldsymbol x_{n} ) + \frac12 \gamma_n ^2 \nabla F(\boldsymbol x_{n} )^T \nabla^2 F(\boldsymbol x_{n} ) \nabla F(\boldsymbol x_{n} ) + \mathcal O \Big( \Vert \nabla F(\boldsymbol x_{n} ) \Vert^3 \Big). \end{align}

For fixed $\boldsymbol x_{n}$ this is a univariate, scalar function of $\gamma_n$. Neglecting the $\mathcal O \Big( \Vert \nabla F(\boldsymbol x_{n} ) \Vert^3 \Big)$ term leaves a quadratic function of $\gamma_n$,
$$ h(\gamma_n) := F(\boldsymbol x_{n} ) - \gamma_n \nabla F(\boldsymbol x_{n} )^T \nabla F(\boldsymbol x_{n} ) + \frac12 \gamma_n ^2 \nabla F(\boldsymbol x_{n} )^T \nabla^2 F(\boldsymbol x_{n} ) \nabla F(\boldsymbol x_{n} ), $$
with a unique minimum attained at the $\gamma_n^\star$ where $h'(\gamma_n^\star) = 0$ (recall that we perform gradient descent to minimize $F$; uniqueness of the minimum requires $\nabla F(\boldsymbol x_{n})^T \nabla^2 F(\boldsymbol x_{n}) \nabla F(\boldsymbol x_{n}) > 0$, which holds e.g. when the Hessian is positive definite). Introducing the shortcuts $\boldsymbol g_n := \nabla F(\boldsymbol x_{n})$ and $H_n := \nabla^2 F(\boldsymbol x_{n})$, one obtains
$$\gamma_n^\star = \frac{\boldsymbol g_n^T \boldsymbol g_n}{\boldsymbol g_n^T H_n \boldsymbol g_n}.$$
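To make this concrete, here is a small numerical sketch (Python with NumPy; the matrix $A$, vector $b$, and starting point are made-up illustration values) for a quadratic objective $F(\boldsymbol x) = \tfrac12 \boldsymbol x^T A \boldsymbol x - \boldsymbol b^T \boldsymbol x$, whose gradient is $A \boldsymbol x - \boldsymbol b$ and whose Hessian is the constant matrix $A$, so the quadratic model above is exact:

```python
import numpy as np

# Illustrative quadratic objective F(x) = 0.5 x^T A x - b^T x:
# gradient g(x) = A x - b, Hessian H = A (constant).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])        # symmetric positive definite
b = np.array([1.0, -1.0])

x = np.array([4.0, 4.0])          # arbitrary starting point x_n
g = A @ x - b                     # g_n = grad F(x_n)
gamma = (g @ g) / (g @ (A @ g))   # optimal step gamma_n^* = g^T g / (g^T H g)
x_next = x - gamma * g            # steepest-descent update x_{n+1}
```

For this quadratic the neglected $\mathcal O(\cdot)$ term vanishes, so $\gamma_n^\star$ is the exact minimizer of $F$ along the search ray.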

If you now go ahead and use this (up to second order) optimal $\gamma_n^\star$ in the update formula for $\boldsymbol x_{n+1}$,
$$\boldsymbol x_{n+1} = \boldsymbol x_{n} - \frac{\boldsymbol g_n^T \boldsymbol g_n}{\boldsymbol g_n^T H_n \boldsymbol g_n} \boldsymbol g_n,$$
you can again Taylor expand (this time only up to first order):
$$ \nabla F\big( \boldsymbol x_{n+1} \big) = \nabla F \Bigg( \boldsymbol x_{n} - \frac{\boldsymbol g_n^T \boldsymbol g_n}{\boldsymbol g_n^T H_n \boldsymbol g_n} \boldsymbol g_n \Bigg) \approx \nabla F\big( \boldsymbol x_n \big) - \frac{\boldsymbol g_n^T \boldsymbol g_n}{\boldsymbol g_n^T H_n \boldsymbol g_n} \nabla^2 F\big( \boldsymbol x_n \big) \boldsymbol g_n. $$
Then take a look at the scalar product:
\begin{align} \nabla F\big( \boldsymbol x_{n} \big)^T \nabla F\big( \boldsymbol x_{n+1} \big) &\approx \nabla F\big( \boldsymbol x_{n} \big)^T\Bigg[\nabla F\big( \boldsymbol x_n \big) - \frac{\boldsymbol g_n^T \boldsymbol g_n}{\boldsymbol g_n^T H_n \boldsymbol g_n} \nabla^2 F\big( \boldsymbol x_n \big) \boldsymbol g_n \Bigg] \\ &= \boldsymbol g_n^T \boldsymbol g_n - \frac{\boldsymbol g_n^T \boldsymbol g_n}{\boldsymbol g_n^T H_n \boldsymbol g_n} \boldsymbol g_n^T H_n \boldsymbol g_n = 0. \end{align}

To summarize: for the step length $\gamma_n^\star$ that is optimal for the quadratic approximation, the gradients of two successive steps are orthogonal (up to first order). In fact, with an exact line search (choosing $\gamma_n$ to minimize $\gamma \mapsto F(\boldsymbol x_n - \gamma \boldsymbol g_n)$ exactly), the orthogonality is exact: at the minimizing $\gamma_n$ the directional derivative $\frac{\mathrm d}{\mathrm d\gamma} F(\boldsymbol x_n - \gamma \boldsymbol g_n) = -\nabla F(\boldsymbol x_{n+1})^T \boldsymbol g_n$ vanishes. This is precisely the source of the zig-zag pattern: each step moves perpendicular to the previous one.
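As a numerical sanity check (a Python/NumPy sketch; the ill-conditioned diagonal Hessian is a made-up example, chosen deliberately to make the zig-zag pronounced), running steepest descent with this step length on a quadratic produces successive gradients that are orthogonal to machine precision:

```python
import numpy as np

# Illustrative quadratic F(x) = 0.5 x^T A x with an ill-conditioned
# Hessian A; exact line search then produces the classic zig-zag.
A = np.diag([10.0, 1.0])
x = np.array([1.0, 1.0])

grads = []
for _ in range(6):
    g = A @ x                        # g_n = grad F(x_n)
    grads.append(g)
    gamma = (g @ g) / (g @ (A @ g))  # optimal step gamma_n^*
    x = x - gamma * g                # steepest-descent update

# Dot products of consecutive gradients: all (numerically) zero.
dots = [abs(grads[i] @ grads[i + 1]) for i in range(len(grads) - 1)]
```

Each search direction is perpendicular to the previous one, while the iterates still converge toward the minimizer at the origin; the worse the conditioning of $A$, the more pronounced the zig-zag.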