In the method of gradient descent, in each iteration we reduce the problem to a single-variable optimization problem in order to find the step size $\alpha$ (called the learning rate), using the update rule:
$$\mathbf{x}_{t+1} = \mathbf{x}_t-\alpha\nabla f(\mathbf{x}_t)$$
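A minimal sketch of this update rule with a fixed learning rate (the quadratic objective, step size, and iteration count below are assumptions chosen only for illustration):

```python
import numpy as np

# Hypothetical objective f(x) = x1^2 + 10*x2^2, an assumed example.
def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

def gradient_descent(x0, alpha=0.05, iters=200):
    """Fixed-step gradient descent: x_{t+1} = x_t - alpha * grad f(x_t)."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * grad_f(x)
    return x

x_star = gradient_descent([3.0, 2.0])  # approaches the minimizer (0, 0)
```

Note that $\alpha$ is fixed in advance here; too large a value diverges on this objective, too small a value converges slowly.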
Drawbacks:
Computational: we must evaluate the gradient, or at least estimate it using finite differences.
Order of convergence: with exact line search, consecutive gradients satisfy $$\nabla f (\mathbf{x}_{t+1})^T\nabla f(\mathbf{x}_t) = 0,$$ so the iterates follow a "zig-zag" pattern, which increases the number of iterations.
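The orthogonality of consecutive gradients under exact line search can be checked numerically. For a quadratic $f(\mathbf{x})=\tfrac12\mathbf{x}^TA\mathbf{x}$ the exact step has the closed form $\alpha = \mathbf{g}^T\mathbf{g}/\mathbf{g}^TA\mathbf{g}$; the matrix $A$ and starting point below are assumptions for illustration:

```python
import numpy as np

# Assumed ill-conditioned quadratic f(x) = 0.5 * x^T A x.
A = np.array([[1.0, 0.0],
              [0.0, 10.0]])

def grad(x):
    return A @ x

x = np.array([3.0, 1.0])
grads = []
for _ in range(5):
    g = grad(x)
    alpha = (g @ g) / (g @ A @ g)  # exact minimizer of f(x - alpha * g)
    x = x - alpha * g
    grads.append(g)

# Consecutive gradients are orthogonal -- the source of the zig-zag pattern.
dots = [float(gp @ gn) for gp, gn in zip(grads, grads[1:])]
```

Each entry of `dots` is zero up to floating-point error, so every step turns $90$ degrees relative to the previous one.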
When will gradient descent make a $90$-degree turn? On the one hand, we see that
$$\nabla f (\mathbf{x}_{t+1})^T\nabla f(\mathbf{x}_t) = 0$$
On the other hand, we know that the gradient is perpendicular to the level sets of $f$. How do these two facts fit together?
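One way to see where the orthogonality comes from (assuming $\alpha$ is chosen by exact line search): define $\varphi(\alpha)=f(\mathbf{x}_t-\alpha\nabla f(\mathbf{x}_t))$. At the minimizing step size we have $\varphi'(\alpha)=0$, and by the chain rule
$$\varphi'(\alpha) = -\nabla f(\mathbf{x}_t-\alpha\nabla f(\mathbf{x}_t))^T\nabla f(\mathbf{x}_t) = -\nabla f(\mathbf{x}_{t+1})^T\nabla f(\mathbf{x}_t) = 0,$$
so the new gradient, and hence the next search direction, is perpendicular to the previous one. With a fixed learning rate this identity need not hold, so the $90$-degree turns occur only when $\alpha$ is chosen by exact line search.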
I see that in some cases we start with a random starting point and set a learning rate $\alpha$, while in other cases we just pick a starting point and find $\alpha$ using line search methods. Is this the same optimization method?
Technically they belong to the same optimization method, namely steepest descent (gradient descent).
The strategy in line search is to evaluate $f(\mathbf{x}-\alpha\nabla f(\mathbf{x}))$ for several values of $\alpha$ and choose the one that yields the smallest objective value, while the more popular approach is simply to set $\alpha$ to a small constant.
Line search and using a (constant) learning rate are thus two different strategies but belong to the same optimization method.
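A sketch of the grid-based line-search strategy described in the answer (the objective, the candidate grid of $\alpha$ values, and the iteration count are all assumptions):

```python
import numpy as np

# Assumed quadratic objective, for illustration only.
def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

def line_search_step(x, alphas):
    """Evaluate f(x - alpha * grad f(x)) over a grid of alphas, keep the best."""
    g = grad_f(x)
    return min((x - a * g for a in alphas), key=f)

x = np.array([3.0, 2.0])
alphas = np.linspace(0.01, 0.2, 20)  # assumed candidate step sizes
for _ in range(50):
    x = line_search_step(x, alphas)
# x approaches the minimizer (0, 0) without hand-tuning a single fixed alpha
```

The trade-off is the one stated above: the grid search spends extra function evaluations per iteration, but it avoids committing to one constant $\alpha$ in advance.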