In machine learning, I learnt that the update rule of (unnormalized) gradient descent is defined as
$$x^{(t+1)} = x^{(t)} - \alpha \nabla f(x^{(t)})$$
If I want it normalized, I would simply divide the gradient vector by its norm:
$$x^{(t+1)} = x^{(t)} - \alpha \dfrac{\nabla f(x^{(t)})}{\|\nabla f(x^{(t)})\|}$$
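For concreteness, a single normalized step would look roughly like this (a minimal NumPy sketch; `grad_f`, `x`, and `alpha` below are placeholders I made up for illustration):

```python
import numpy as np

def grad_f(x):
    # Placeholder gradient, here of the toy objective f(x) = 0.5 * ||x||^2
    return x

alpha = 0.1                  # fixed step length (assumed)
x = np.array([3.0, 4.0])     # arbitrary starting point

g = grad_f(x)
# Normalized gradient descent: move a distance alpha along -g / ||g||
x = x - alpha * g / np.linalg.norm(g)
print(x)
```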
Now in convex optimization, I am reading that normalized steepest descent is defined as
$$x^{(t+1)} = x^{(t)} + \alpha \Delta_{\text{nsd}}$$
where $\Delta_{\text{nsd}} := \operatorname{argmin} \left\{ \nabla f(x)^T v \mid \|v\| = 1 \right\}$.
Question: If this is correct, then
$$\dfrac{\nabla f(x^{(t)})}{\|\nabla f(x^{(t)})\|} = \operatorname{argmin}\{\nabla f(x)^T v \mid \|v\| = 1\} $$
Does that simply mean that the vector $v$ is $1$ divided by the norm of $\nabla f(x^{(t)})$?
No. Indeed, the Optimization world and the Machine Learning world use different approaches to normalize the direction of Steepest Descent.
Moreover, in the Machine Learning world we usually use $L_2$ Steepest Descent (also known as Gradient Descent), while in Optimization this can be extended to other norms.
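Indeed, for the Euclidean norm the two notions coincide, up to the minus sign from the minimization convention above. A short check using Cauchy-Schwarz:
$$\nabla f(x)^T v \ge -\|\nabla f(x)\|_2 \, \|v\|_2 = -\|\nabla f(x)\|_2 \qquad \text{for all } \|v\|_2 = 1,$$
with equality exactly at $v = -\dfrac{\nabla f(x)}{\|\nabla f(x)\|_2}$, hence
$$\Delta_{\text{nsd}} = \operatorname{argmin}\{\nabla f(x)^T v \mid \|v\|_2 = 1\} = -\dfrac{\nabla f(x)}{\|\nabla f(x)\|_2}.$$
For other norms (e.g. $L_1$, or a quadratic norm $\|v\|_P = \sqrt{v^T P v}$) the minimizer points in a different direction, which is where the two conventions diverge.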
Have a look at my PDF - Steepest Descent by Norms.
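To make the norm dependence concrete, here is a small numerical sketch (my own toy example, not taken from the PDF): it computes the normalized steepest descent direction under the $L_2$ and $L_1$ norms at one point, and only the $L_2$ direction matches the negative normalized gradient.

```python
import numpy as np

def grad_f(x):
    # Gradient of the toy quadratic f(x) = 0.5 * (x1^2 + 10 * x2^2)
    return np.array([1.0, 10.0]) * x

x = np.array([2.0, 1.0])
g = grad_f(x)

# L2 (Euclidean) normalized steepest descent:
# argmin of g^T v over ||v||_2 = 1 is -g / ||g||_2, i.e. normalized gradient descent.
v_l2 = -g / np.linalg.norm(g, 2)

# L1 normalized steepest descent:
# argmin of g^T v over ||v||_1 = 1 puts all weight on the coordinate with the
# largest |partial derivative| (a coordinate-descent-like step).
v_l1 = np.zeros_like(g)
i = np.argmax(np.abs(g))
v_l1[i] = -np.sign(g[i])

print("gradient           :", g)      # [ 2. 10.]
print("L2 steepest descent:", v_l2)   # -g / ||g||_2
print("L1 steepest descent:", v_l1)   # [ 0. -1.]
```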