When minimizing a function $f$ numerically, we may use the gradient descent method and update the current guess $x$ by replacing it with $$y=x-\gamma\,\nabla f(x).\tag1$$ Now, in my application, $\nabla f(x)$ is quite small, and it is hard for me to come up with a suitable choice for the step size $\gamma$. What can or should I do?
In particular, is it a stupid idea to replace $(1)$ with $$y=x-\gamma\,\frac{\nabla f(x)}{\|\nabla f(x)\|}?\tag2$$ With this update, the gradient only specifies the direction, and we can precisely control the magnitude of each step.
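For concreteness, here is a minimal sketch of update $(2)$ in NumPy; the function name, the test objective $f(x)=\tfrac12\|x\|^2$, and the `eps` guard against a vanishing gradient are my own additions for illustration, not part of the question:

```python
import numpy as np

def normalized_gradient_step(x, grad, gamma, eps=1e-12):
    """One step of update (2): move a fixed distance gamma along
    the negative gradient direction.

    The eps guard (an assumption added here, not in the question)
    skips the update when the gradient is numerically zero, since
    grad / ||grad|| is undefined at a stationary point."""
    norm = np.linalg.norm(grad)
    if norm < eps:
        return x  # (numerically) at a stationary point; do nothing
    return x - gamma * grad / norm

# Example objective: f(x) = ||x||^2 / 2, so grad f(x) = x.
x = np.array([3.0, 4.0])
for _ in range(100):
    x = normalized_gradient_step(x, x, gamma=0.05)
print(np.linalg.norm(x))  # each step shrinks the norm by exactly gamma
```

One thing I already notice with a constant $\gamma$: every step has length exactly $\gamma$, so the iterates can never settle closer than about $\gamma$ to the minimizer and will oscillate around it unless $\gamma$ is decreased over time.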