I have read two papers (this and this) about the convergence of the Adam optimizer. One of the assumptions is the smoothness of the loss function, meaning that the gradient of the loss function is Lipschitz continuous. Consider a neural network $f(\theta)$ with a loss function $L$. If I want to prove the smoothness of the loss function, does that mean I have to derive the gradient of the loss with respect to every weight (i.e., work through the backpropagation equations) and show that the Lipschitz inequality holds?
Also, does this assumption depend on the network architecture?
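For context on what the assumption asks for: smoothness here means there is a constant $K$ with $\|\nabla L(\theta_1) - \nabla L(\theta_2)\| \le K \|\theta_1 - \theta_2\|$ for all parameter pairs. One cannot prove this numerically, but one can probe it empirically by sampling pairs of nearby parameter vectors and computing the ratio, which gives a lower bound on any valid $K$. Below is a minimal sketch of that probe, assuming a hypothetical one-hidden-layer tanh network with mean-squared loss (the architecture, sizes, and data are all my own illustrative choices, not from the papers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 32 samples, 3 features, 1 target
X = rng.normal(size=(32, 3))
y = rng.normal(size=(32, 1))

def unpack(theta):
    """Split a flat parameter vector into the two weight matrices."""
    W1 = theta[:12].reshape(3, 4)    # input -> hidden
    W2 = theta[12:16].reshape(4, 1)  # hidden -> output
    return W1, W2

def grad(theta):
    """Analytic gradient of the MSE loss via backpropagation."""
    W1, W2 = unpack(theta)
    H = np.tanh(X @ W1)              # hidden activations
    out = H @ W2
    d_out = 2.0 * (out - y) / len(X)
    dW2 = H.T @ d_out
    dH = d_out @ W2.T
    dW1 = X.T @ (dH * (1.0 - H**2))  # tanh'(z) = 1 - tanh(z)^2
    return np.concatenate([dW1.ravel(), dW2.ravel()])

# Empirical lower bound on the gradient's Lipschitz constant:
# max over sampled pairs of ||grad(a) - grad(b)|| / ||a - b||
ratios = []
for _ in range(200):
    a = rng.normal(size=16)
    b = a + 1e-3 * rng.normal(size=16)
    ratios.append(np.linalg.norm(grad(a) - grad(b)) / np.linalg.norm(a - b))

print(f"empirical lower bound on K: {max(ratios):.4f}")
```

Note that finite ratios on sampled pairs do not prove smoothness; with a ReLU activation the gradient is discontinuous at the kinks, and even with tanh the products of weight matrices typically make the loss smooth only on bounded regions of parameter space, not globally.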