If the loss along gradient flow is monotonically decreasing, why isn't my training loss monotonic?

480 Views Asked by Bumbble Comm At 28 Mar 2026 - 4:34

Gradient flow has the property that the loss is a monotonically decreasing function of time. My training loss, however, both increases and decreases along the trajectory. Why does this happen?

There is 1 best solution below

Bumbble Comm On 12 Nov 2020 - 11:40

Gradient flow is the limit of gradient descent as the step size goes to zero. When you actually run gradient descent, you take steps of finite size, and a finite step can "overshoot" the path that gradient flow would follow and end up increasing the loss. You can compensate by making the step size smaller, but this comes at the cost of training taking longer.
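To make the overshoot concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption chosen for illustration, not something from the question: the loss is the one-dimensional quadratic L(x) = 0.5 * c * x^2 with c = 10, and the function name `gradient_descent_losses` is made up. For this particular loss each update multiplies x by (1 - step_size * c), so the loss shrinks only when step_size < 2 / c = 0.2.

```python
def gradient_descent_losses(step_size, x0=1.0, curvature=10.0, num_steps=5):
    """Run plain gradient descent on L(x) = 0.5 * curvature * x**2.

    Returns the loss at the starting point and after every step.
    The quadratic loss and all default values are illustrative assumptions.
    """
    x = x0
    losses = [0.5 * curvature * x ** 2]
    for _ in range(num_steps):
        grad = curvature * x          # dL/dx for the quadratic above
        x = x - step_size * grad      # one finite gradient descent step
        losses.append(0.5 * curvature * x ** 2)
    return losses


# Small step (below 2 / curvature): stays close to gradient flow, loss goes down.
small = gradient_descent_losses(step_size=0.05)

# Large step (above 2 / curvature): every step overshoots the minimum at x = 0
# and lands farther away on the other side, so the loss goes up.
large = gradient_descent_losses(step_size=0.25)

print("step 0.05:", small)
print("step 0.25:", large)
```

On a real training loss the curvature changes from place to place, so a fixed step size can be small enough in some regions and too large in others, which would be one way to see the loss going both up and down along the trajectory.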
