Using gradient descent: cost decreases, then increases


I am minimizing a function using gradient descent with a fixed learning rate. For the first few iterations the cost decreases, but after that it starts increasing. What is the reason for this?
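To make the question concrete, here is a minimal sketch (not my actual problem) that reproduces this behavior on a hypothetical ill-conditioned quadratic, f(x, y) = x² + 50y². With a fixed learning rate of 0.021, the update is stable along x (contraction factor 1 − 2·0.021 = 0.958) but unstable along y (factor 1 − 100·0.021 = −1.1), so the cost first falls while the x-error shrinks, then blows up as the y-error grows:

```python
import numpy as np

def cost(p):
    x, y = p
    return x**2 + 50 * y**2

def grad(p):
    x, y = p
    return np.array([2 * x, 100 * y])

p = np.array([10.0, 0.1])   # start far out in x, slightly off in y
lr = 0.021                  # fixed learning rate, too large for the y-direction
costs = [cost(p)]
for _ in range(40):
    p = p - lr * grad(p)    # plain gradient descent step
    costs.append(cost(p))

# The cost dips for the first ~15 steps, then diverges.
print(costs[0], min(costs), costs[-1])
```

Is this (a learning rate larger than what the steepest direction of curvature allows) the only explanation, or are there other common causes?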