Gradient flow has the property that the loss decreases monotonically along its trajectory. My training loss, however, both increases and decreases over the course of training. Why does this happen?
2026-03-28 04:34:31
If the loss decreases monotonically along gradient flow, why isn't my training loss monotonic?
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail), 480 views

Gradient flow is gradient descent in the limit of infinitesimal step sizes. Along the flow $\dot\theta(t) = -\nabla L(\theta(t))$, the chain rule gives $\frac{d}{dt}L(\theta(t)) = -\|\nabla L(\theta(t))\|^2 \le 0$, so the loss can never increase. When you actually run gradient descent, you take steps of some finite size $\eta$, and these steps can "overshoot" the path that gradient flow would follow and land at a point where the loss is higher. You can compensate by making the step size smaller, but then training takes longer.
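To make the overshoot argument concrete, here is a minimal sketch assuming a one-dimensional quadratic loss $L(x) = \frac{a}{2}x^2$ with curvature $a > 0$ (an illustrative choice, not taken from the question). The helper names `gradient_descent`, `gradient_flow_loss`, `is_non_increasing` and `overshoots_minimum` are hypothetical, and `lr` plays the role of $\eta$. On this loss a gradient step maps $x \mapsto (1 - \eta a)x$, so the loss decreases exactly when $0 < \eta < 2/a$, while the exact gradient flow $x(t) = x_0 e^{-at}$ always decreases it. More generally, if $\nabla L$ is $\beta$-Lipschitz, the descent lemma gives $L(\theta - \eta\nabla L(\theta)) \le L(\theta) - \eta(1 - \beta\eta/2)\|\nabla L(\theta)\|^2$, so any $0 < \eta < 2/\beta$ guarantees the loss does not go up.

```python
import math

# Illustrative assumption: a 1-D quadratic loss, not a real training setup.


def loss(x: float, a: float) -> float:
    """Quadratic loss L(x) = (a/2) * x**2, minimized at x = 0."""
    return 0.5 * a * x * x


def grad(x: float, a: float) -> float:
    """Gradient of the quadratic loss: dL/dx = a * x."""
    return a * x


def gradient_descent(x0: float, a: float, lr: float, steps: int):
    """Plain gradient descent x <- x - lr * grad(x); returns iterates and losses."""
    x = x0
    iterates, losses = [x], [loss(x, a)]
    for _ in range(steps):
        x = x - lr * grad(x, a)
        iterates.append(x)
        losses.append(loss(x, a))
    return iterates, losses


def gradient_flow_loss(x0: float, a: float, t: float) -> float:
    """Loss along the exact gradient flow dx/dt = -a x, whose solution is x(t) = x0 * exp(-a t).

    The resulting loss (a/2) * x0**2 * exp(-2 a t) is decreasing in t for every a > 0.
    """
    return loss(x0 * math.exp(-a * t), a)


def is_non_increasing(values) -> bool:
    """True if every value is <= the one before it."""
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


def overshoots_minimum(iterates) -> bool:
    """True if some step jumps across the minimizer x = 0 (a sign change)."""
    return any(u * v < 0 for u, v in zip(iterates, iterates[1:]))


if __name__ == "__main__":
    a, x0, steps = 1.0, 1.0, 10  # curvature a = 1, so the critical step size is 2/a = 2
    flow = [gradient_flow_loss(x0, a, 0.1 * k) for k in range(50)]
    print(f"gradient flow: monotone={is_non_increasing(flow)}")
    for lr in (0.5, 1.5, 2.5):
        xs, ls = gradient_descent(x0, a, lr, steps)
        print(
            f"lr={lr}: overshoots={overshoots_minimum(xs)}, "
            f"monotone={is_non_increasing(ls)}, "
            f"first losses={[round(v, 3) for v in ls[:4]]}"
        )
```

With a = 1, lr = 0.5 halves x each step and the loss falls monotonically; lr = 1.5 flips the sign of x every step, so it overshoots the minimum, yet the loss still falls because |1 - lr·a| < 1; lr = 2.5 exceeds 2/a, so |x| grows by a factor of 1.5 per step and the loss increases. On this quadratic, overshooting the minimum is not enough on its own; the loss rises only once the step exceeds 2/a.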