If the gradient descent flow is a monotone decreasing function, why isn't my training loss monotonic?


The gradient descent flow has the property that the loss is a monotone decreasing function. My training loss increases and decreases along the trajectory. Why does this happen?

[Figure: training loss vs time]


Gradient flow is gradient descent with "infinitesimal step sizes." When you actually perform gradient descent, you necessarily take steps of some non-infinitesimal size, and these steps can "overshoot" what gradient flow would theoretically do and end up increasing the loss. You can compensate for this by making your step sizes smaller, but that comes at the cost of longer training.
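A minimal sketch of this effect on the toy loss f(x) = x² (the function, step sizes, and starting point are illustrative choices, not from the question): with a small step size the loss decreases monotonically, while a step size that is too large overshoots the minimum and makes the loss grow.

```python
def gradient_descent_losses(x0, lr, steps):
    """Run gradient descent on f(x) = x^2 (gradient 2x) and record the loss."""
    x = x0
    losses = [x * x]
    for _ in range(steps):
        x = x - lr * 2 * x  # discrete update; gradient flow is the lr -> 0 limit
        losses.append(x * x)
    return losses

# Small step size: loss decreases at every step, as gradient flow predicts.
small = gradient_descent_losses(x0=1.0, lr=0.1, steps=5)

# Large step size: each update jumps past the minimum at x = 0,
# so |x| grows and the loss increases despite following the gradient.
large = gradient_descent_losses(x0=1.0, lr=1.1, steps=5)

print(small)  # monotonically decreasing
print(large)  # monotonically increasing
```

For f(x) = x² the update is x ← (1 − 2η)x, so the loss shrinks only when |1 − 2η| < 1, i.e. η < 1; real training losses wobble for the same reason, just without such a clean threshold.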