Normalized steepest descent


In machine learning, I learnt that the update rule of (unnormalized) gradient descent is defined as

$$x^{(t+1)} = x^{(t)} - \alpha \nabla f(x^{(t)})$$

If I want it normalized, I would simply divide the gradient vector by its norm:

$$x^{(t+1)} = x^{(t)} - \alpha \dfrac{\nabla f(x^{(t)})}{\|\nabla f(x^{(t)})\|}$$
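In code, the normalized update is a one-line change to plain gradient descent. A minimal sketch (the function name, step size, and test problem below are my own, for illustration):

```python
import numpy as np

def normalized_gradient_descent(grad_f, x0, alpha=0.05, steps=200):
    """Gradient descent with a unit-length step direction: -grad / ||grad||."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_f(x)
        norm = np.linalg.norm(g)
        if norm == 0.0:               # stationary point: nothing to do
            break
        x = x - alpha * g / norm      # every step moves exactly alpha
    return x

# Example: f(x) = ||x||^2 has gradient 2x and its minimum at the origin.
x_star = normalized_gradient_descent(lambda x: 2 * x, [3.0, 4.0])
```

Note that because every step has length exactly `alpha`, the iterates cannot converge to the minimizer more tightly than `alpha`; in practice one decays the step size over time.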

Now in convex optimization, I am reading that normalized steepest descent is defined as

$$x^{(t+1)} = x^{(t)} + \alpha \, \Delta x_{\text{nsd}}$$

where $\Delta x_{\text{nsd}} := \operatorname{argmin} \left\{ \nabla f(x)^T v \mid \|v\| = 1 \right\}$.

Question: If this is correct, then is

$$\dfrac{\nabla f(x^{(t)})}{\|\nabla f(x^{(t)})\|} = -\operatorname{argmin}\{\nabla f(x)^Tv \mid \|v\| = 1\}?$$

Does that mean the minimizing vector $v$ is simply $\nabla f(x^{(t)})$ divided by its norm (up to sign)?

3 Answers

BEST ANSWER

No. The optimization world and the machine learning world use different conventions for normalizing the steepest descent direction.

Moreover, in machine learning we usually use $\ell_2$ steepest descent (also known as gradient descent), while in optimization this is generalized to other norms.

Have a look at my PDF - Steepest Descent by Norms.
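As a sketch of that generalization, here is the normalized steepest descent direction $\operatorname{argmin}\{\nabla f(x)^T v : \|v\| = 1\}$ worked out for the $\ell_2$, $\ell_1$, and $\ell_\infty$ norms. The closed forms are standard; the function name and labels are mine:

```python
import numpy as np

def nsd_direction(g, norm="l2"):
    """Normalized steepest descent direction argmin{g^T v : ||v|| = 1}
    for a few common norms (a sketch, not a general-purpose solver)."""
    g = np.asarray(g, dtype=float)
    if norm == "l2":                     # v = -g / ||g||_2
        return -g / np.linalg.norm(g)
    if norm == "linf":                   # v = -sign(g): move every coordinate
        return -np.sign(g)
    if norm == "l1":                     # move only along the largest-|g_i| axis
        v = np.zeros_like(g)
        i = np.argmax(np.abs(g))
        v[i] = -np.sign(g[i])
        return v
    raise ValueError(f"unsupported norm: {norm}")

g = np.array([3.0, -4.0])
```

For `g = [3, -4]` this gives `[-0.6, 0.8]` under $\ell_2$, `[-1, 1]` under $\ell_\infty$, and `[0, 1]` under $\ell_1$: only the $\ell_2$ choice coincides with the normalized gradient.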


No, it does not imply that equality. There are many ways of updating the parameters that are motivated by gradient descent; here's a small list:

  • AdaGrad

  • AdaDelta

  • RMSprop (this one is interesting: it was proposed by Geoff Hinton in a lecture and became very popular afterwards)

  • Adam

  • NAdam
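To make the list concrete, here is a sketch of one of them, Adam, following the update rule published by Kingma and Ba; the helper name and the quadratic test problem are illustrative:

```python
import numpy as np

def adam_step(x, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running estimates of the first and
    second moments of the gradient; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Minimize f(x) = ||x||^2 (gradient 2x) from a distant start.
x = np.array([3.0, -4.0]); m = np.zeros(2); v = np.zeros(2)
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * x, m, v, t, alpha=0.01)
```

Note how the per-coordinate division by `sqrt(v_hat)` is itself a kind of normalization, just a different one from dividing the whole gradient by its $\ell_2$ norm.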


To minimize $\nabla f(x) \cdot v$ subject to the constraint $\| v \|_2 = 1$, we should take $$ v = - \nabla f(x) / \| \nabla f(x) \|_2. $$

Indeed, by the Cauchy–Schwarz inequality, $\nabla f(x) \cdot v \ge -\|\nabla f(x)\|_2 \|v\|_2 = -\|\nabla f(x)\|_2$ for every unit vector $v$, with equality exactly at this choice of $v$. That explains why your third equation is equivalent to your second equation when $\| \cdot \|$ is the $\ell_2$-norm.
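A quick numerical sanity check of this claim: sample random unit vectors and confirm that none achieves a smaller inner product with the gradient than $v = -\nabla f(x)/\|\nabla f(x)\|_2$ (the dimension and random seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=5)                     # stand-in for a gradient vector
v_star = -g / np.linalg.norm(g)            # claimed minimizer

best = g @ v_star                          # equals -||g||_2
for _ in range(10_000):
    v = rng.normal(size=5)
    v /= np.linalg.norm(v)                 # random unit vector
    assert g @ v >= best - 1e-12           # no unit vector does better
```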