Gradient descent versus finding where the gradient vanishes via solving systems of equations


I started learning machine learning and got stuck at the following questions:

  1. Why do we need to iterate the gradient descent algorithm?

  2. Why don't we equate the gradient to zero and find all local minima?

My guess: most likely we can't reach the exact minimum; we can only come as close as we like, and the learning rate controls how close. Am I right? Or am I missing something?
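To make my question concrete, here is a minimal sketch of gradient descent on a toy quadratic, assuming a fixed learning rate (the function, step count, and learning rate are just illustrative choices):

```python
# Gradient descent on f(x) = (x - 3)**2, whose gradient is
# f'(x) = 2 * (x - 3). The true minimizer is x = 3.
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # step in the direction opposite the gradient
    return x

x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # converges toward 3.0
```

Here the iterates approach 3 geometrically but (in exact arithmetic) never land on it, which is what makes me think we only get arbitrarily close rather than solving f'(x) = 0 directly.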

Sorry if this is a duplicate. Thanks in advance.