I started learning machine learning and got stuck on the following questions:
Why do we need to iterate the gradient descent algorithm?
Why don't we equate the gradient to zero and find all local minima?
My guess is that we can't actually reach the minimum; we can only get as close as possible, and the learning rate controls how close. Am I right, or am I missing something?
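For concreteness, here is a minimal sketch of the iterative update I have in mind (the function and learning rate are made up just for illustration):

```python
# Gradient descent on f(x) = (x - 3)**2, whose gradient is 2*(x - 3).
# Setting the gradient to zero gives the exact minimum x = 3, but the
# iterative update only ever approaches it.

def gradient_descent(x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        grad = 2 * (x - 3)  # derivative of (x - 3)**2
        x -= lr * grad      # step downhill, scaled by the learning rate
    return x

print(gradient_descent(0.0))  # very close to 3, but not exactly 3
```

Here I can just solve the gradient equation by hand, which is what makes me wonder why the iterative version is needed at all.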
Sorry if this is a duplicate question. Thanks in advance.