OK, this is perhaps an easy question, but I'm stuck, so any help would be appreciated.
The gradient descent algorithm updates the weights as:
$$\textbf{w}_{t+1} = \textbf{w}_{t} - \eta\nabla E(\textbf{w}_{t}) $$
for a function $E(\textbf{w})$ to minimize.
I have read, but cannot prove, that one of the reasons (among others) this algorithm is inefficient is that it follows a zigzag descent path, because consecutive gradients are orthogonal: $\nabla E(\textbf{w}_{t+1})^{T}\nabla E(\textbf{w}_{t}) = 0$.
Why is this true?
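To convince myself, I tried a small numerical check. I am assuming the claim refers to steepest descent with an exact line search on a quadratic (with a fixed $\eta$ the dot product doesn't seem to be exactly zero); the matrix, starting point, and step-size formula below are just my own illustrative choices:

```python
import numpy as np

# Quadratic objective E(w) = 0.5 * w^T A w with an ill-conditioned A
# (A and the starting point are arbitrary illustrative choices).
A = np.diag([1.0, 10.0])

def grad(w):
    return A @ w  # gradient of the quadratic: nabla E(w) = A w

w = np.array([10.0, 1.0])
dots = []
for t in range(10):
    g = grad(w)
    # Exact line-search step for a quadratic: eta_t = g^T g / (g^T A g).
    # With this optimal step size, the next gradient is orthogonal to the
    # current one, which is what produces the zigzag path.
    eta = (g @ g) / (g @ A @ g)
    w = w - eta * g
    dots.append(g @ grad(w))  # inner product of consecutive gradients

print(dots)  # all entries are (numerically) zero
```

In this experiment the inner products indeed come out as zero to machine precision, so the zigzag seems tied to the exact line search rather than to gradient descent with an arbitrary fixed step.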
I think that if I can understand this, I will better understand the classical momentum and Nesterov acceleration techniques.
Thanks.