Normalized steepest descent


In machine learning, I learnt that the update rule of (unnormalized) gradient descent is defined as

$$x^{(t+1)} = x^{(t)} - \alpha \nabla f(x^{(t)})$$

If I want it normalized, I would simply divide the gradient vector by its norm:

$$x^{(t+1)} = x^{(t)} - \alpha \dfrac{\nabla f(x^{(t)})}{\|\nabla f(x^{(t)})\|}$$
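In code, the normalized update is a one-line change to plain gradient descent. A minimal sketch (the function name, step size, and test problem below are my own, for illustration):

```python
import numpy as np

def normalized_gradient_descent(grad_f, x0, alpha=0.05, steps=200):
    """Gradient descent with a unit-length step direction: -grad / ||grad||."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_f(x)
        norm = np.linalg.norm(g)
        if norm == 0.0:               # stationary point: nothing to do
            break
        x = x - alpha * g / norm      # every step moves exactly alpha
    return x

# Example: f(x) = ||x||^2 has gradient 2x and its minimum at the origin.
x_star = normalized_gradient_descent(lambda x: 2 * x, [3.0, 4.0])
```

Note that because every step has length exactly `alpha`, the iterates cannot converge to the minimizer more tightly than `alpha`; in practice one decays the step size over time.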

Now in convex optimization, I am reading that normalized steepest descent is defined as

$$x^{(t+1)} = x^{(t)} + \alpha \, \Delta x_{\text{nsd}}$$

where $\Delta x_{\text{nsd}} := \operatorname{argmin} \left\{ \nabla f(x)^T v \mid \|v\| = 1 \right\}$.

Question: If this is correct, then is

$$\dfrac{\nabla f(x^{(t)})}{\|\nabla f(x^{(t)})\|} = -\operatorname{argmin}\{\nabla f(x)^Tv \mid \|v\| = 1\}?$$

Does that mean the minimizing vector $v$ is simply $\nabla f(x^{(t)})$ divided by its norm (up to sign)?

3 Answers

BEST ANSWER

No. The optimization world and the machine learning world use different conventions for normalizing the steepest descent direction.

Moreover, in machine learning we usually use $\ell_2$ steepest descent (also known as gradient descent), while in optimization this is generalized to other norms.

Have a look at my PDF - Steepest Descent by Norms.
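As a sketch of that generalization, here is the normalized steepest descent direction $\operatorname{argmin}\{\nabla f(x)^T v : \|v\| = 1\}$ worked out for the $\ell_2$, $\ell_1$, and $\ell_\infty$ norms. The closed forms are standard; the function name and labels are mine:

```python
import numpy as np

def nsd_direction(g, norm="l2"):
    """Normalized steepest descent direction argmin{g^T v : ||v|| = 1}
    for a few common norms (a sketch, not a general-purpose solver)."""
    g = np.asarray(g, dtype=float)
    if norm == "l2":                     # v = -g / ||g||_2
        return -g / np.linalg.norm(g)
    if norm == "linf":                   # v = -sign(g): move every coordinate
        return -np.sign(g)
    if norm == "l1":                     # move only along the largest-|g_i| axis
        v = np.zeros_like(g)
        i = np.argmax(np.abs(g))
        v[i] = -np.sign(g[i])
        return v
    raise ValueError(f"unsupported norm: {norm}")

g = np.array([3.0, -4.0])
```

For `g = [3, -4]` this gives `[-0.6, 0.8]` under $\ell_2$, `[-1, 1]` under $\ell_\infty$, and `[0, 1]` under $\ell_1$: only the $\ell_2$ choice coincides with the normalized gradient.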


No, it does not imply that equality. There are many ways of updating the parameters that are motivated by gradient descent; here's a small list:

  • AdaGrad

  • AdaDelta

  • RMSprop (this one is interesting: it was proposed by Geoff Hinton in a lecture and became very popular afterwards)

  • Adam

  • NAdam
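To make the list concrete, here is a sketch of one of them, Adam, following the update rule published by Kingma and Ba; the helper name and the quadratic test problem are illustrative:

```python
import numpy as np

def adam_step(x, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running estimates of the first and
    second moments of the gradient; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Minimize f(x) = ||x||^2 (gradient 2x) from a distant start.
x = np.array([3.0, -4.0]); m = np.zeros(2); v = np.zeros(2)
for t in range(1, 5001):
    x, m, v = adam_step(x, 2 * x, m, v, t, alpha=0.01)
```

Note how the per-coordinate division by `sqrt(v_hat)` is itself a kind of normalization, just a different one from dividing the whole gradient by its $\ell_2$ norm.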


To minimize $\nabla f(x) \cdot v$ subject to the constraint $\| v \|_2 = 1$, we should take $$ v = - \nabla f(x) / \| \nabla f(x) \|_2. $$

Indeed, by the Cauchy–Schwarz inequality, $\nabla f(x) \cdot v \ge -\|\nabla f(x)\|_2 \|v\|_2 = -\|\nabla f(x)\|_2$ for every unit vector $v$, with equality exactly at this choice of $v$. That explains why your third equation is equivalent to your second equation when $\| \cdot \|$ is the $\ell_2$-norm.
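A quick numerical sanity check of this claim: sample random unit vectors and confirm that none achieves a smaller inner product with the gradient than $v = -\nabla f(x)/\|\nabla f(x)\|_2$ (the dimension and random seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=5)                     # stand-in for a gradient vector
v_star = -g / np.linalg.norm(g)            # claimed minimizer

best = g @ v_star                          # equals -||g||_2
for _ in range(10_000):
    v = rng.normal(size=5)
    v /= np.linalg.norm(v)                 # random unit vector
    assert g @ v >= best - 1e-12           # no unit vector does better
```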