When deriving the gradient descent algorithm, the following function comes up: $$ f(x) + \nabla f(x)^T(z-x) + \frac{1}{2t}\| z-x\|^2_2 $$ Minimizing it over $z$ leads to the update rule $z = x - t \nabla f(x)$. Where does the term $\frac{1}{2t}\| z-x\|^2_2$ come from?
I think it comes from a Taylor expansion:
$$ f(z) = f(x) + \nabla f(x)^T(z-x) + \frac{1}{2}(z-x)^T\nabla^2 f(\theta (x-z) + z)(z-x), $$ where $0 \leq \theta \leq 1$, so the Hessian is evaluated at some point on the segment between $x$ and $z$. But for this expansion to match the previous expression, one would need $$ \nabla^2 f(\theta (x-z) + z) = \frac{1}{t} I, $$ where $I$ is the identity matrix.
You are indeed right about the origin of that term: the Hessian in the second-order Taylor term is replaced by the upper bound $\frac{1}{t}I$, giving a quadratic approximation of $f$ around $x$. See also the explanation in Section 5.1 of these notes:
http://www.stat.cmu.edu/~ryantibs/convexopt-S15/scribes/05-grad-descent-scribed.pdf
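To see the minimization step concretely: setting the gradient of the quadratic approximation with respect to $z$ to zero gives $\nabla f(x) + \frac{1}{t}(z - x) = 0$, i.e. $z = x - t\nabla f(x)$. A quick numerical sanity check of this claim (the function $f$, the point $x$, and the step size $t$ below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary smooth test function and its gradient.
def f(x):
    return np.sum(x**4 + x**2)

def grad_f(x):
    return 4 * x**3 + 2 * x

x = np.array([1.0, -0.5])  # current iterate
t = 0.1                    # step size
g = grad_f(x)

# The quadratic approximation of f around x discussed above.
def surrogate(z):
    d = z - x
    return f(x) + g @ d + d @ d / (2 * t)

# Claimed minimizer: the gradient descent update.
z_star = x - t * g

# z_star should beat any perturbed point; indeed one can show
# surrogate(z_star + p) - surrogate(z_star) = ||p||^2 / (2t) >= 0.
for _ in range(100):
    z = z_star + rng.normal(size=2)
    assert surrogate(z_star) <= surrogate(z)

print("minimizer of surrogate:", z_star)
```

Note that this only checks the algebra of the surrogate, not that $x - t\nabla f(x)$ decreases $f$ itself; that additionally requires $t$ small enough relative to the curvature of $f$ (e.g. $t \leq 1/L$ for an $L$-Lipschitz gradient).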