Why, when performing gradient descent, do we multiply the gradient vector by the step size instead of using only the descent direction?


When we perform gradient descent, we adjust the parameters in parameter space by multiplying the gradient of the function by the step size. My question is: why do we need to multiply the gradient by the learning rate/step size, rather than using only the unit vector in the descent direction? Why don't we divide the gradient vector by its magnitude, so every step has the same length?