Adam Optimisation Algorithm


In the original paper introducing the Adam optimisation algorithm (link: https://arxiv.org/pdf/1412.6980.pdf), the authors' convergence analysis states that the theorems hold only when the learning rate decays at the rate $t^{-1/2}$, but I am not able to figure out exactly why the theorems fail if the learning rate instead decays at, say, the rate $t^{-1/4}$. What exactly will go wrong? Please help me figure out the logic behind the statement.


If you are referring to Theorem 4.1, it assumes both $\alpha_t = \frac{\alpha}{\sqrt{t}}$ and $\beta_{1,t} = \beta_1 \lambda^{t-1}$ for some $0<\lambda<1$ in order for the upper regret bound to hold. Under these conditions, together with the assumptions of bounded gradients and bounded parameter distances, the regret bound is $O(\sqrt{T})$ (note the capital $T$, the horizon). If you slow the decay from $O(t^{-1/2})$ to $O(t^{-1/4})$, the learning rate becomes asymptotically larger than the theorem allows: any schedule satisfying the hypothesis $\alpha_t \leq \frac{\alpha}{\sqrt{t}}$ obeys $$ \frac{\alpha_t}{t^{-\frac{1}{4}}} = \frac{\alpha_t}{\frac{1}{\sqrt{t}}}\cdot\frac{\frac{1}{\sqrt{t}}}{t^{-\frac{1}{4}}} \;\xrightarrow[t\to\infty]{}\; c \cdot 0 = 0, \quad\text{i.e.}\quad \alpha_t = o\bigg(\frac{1}{t^{\frac{1}{4}}}\bigg), $$ so a schedule $\alpha_t = \alpha\,t^{-\frac{1}{4}}$ violates that hypothesis for every $t>1$. Concretely, the proof controls quantities such as $\sum_{t=1}^T \alpha_t$, which is $O(\sqrt{T})$ under the $t^{-1/2}$ schedule but grows like $T^{3/4}$ under the $t^{-1/4}$ schedule, so the stated $O(\sqrt{T})$ regret guarantee no longer follows.
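To see the growth gap numerically, here is a small sketch (not from the paper; the constants $\alpha=1$ and horizon $T=10^5$ are my choices) comparing the partial sums $\sum_{t\le T} t^{-1/2}$ and $\sum_{t\le T} t^{-1/4}$ that stand in for the accumulated step sizes under the two schedules:

```python
import math

def step_size_sum(T, exponent, alpha=1.0):
    """Partial sum of alpha * t**(-exponent) for t = 1..T."""
    return sum(alpha * t ** (-exponent) for t in range(1, T + 1))

T = 100_000

# Theorem's schedule: alpha_t = alpha / sqrt(t); the sum grows like 2*sqrt(T).
s_half = step_size_sum(T, 0.5)

# Slower decay: alpha_t = alpha * t**(-1/4); the sum grows like (4/3)*T**(3/4).
s_quarter = step_size_sum(T, 0.25)

print(f"sum t**(-1/2) up to T={T}: {s_half:.1f}  (compare 2*sqrt(T) = {2 * math.sqrt(T):.1f})")
print(f"sum t**(-1/4) up to T={T}: {s_quarter:.1f}  (compare (4/3)*T**0.75 = {4 / 3 * T ** 0.75:.1f})")
print(f"ratio to sqrt(T): {s_half / math.sqrt(T):.2f} vs {s_quarter / math.sqrt(T):.2f}")
```

The first ratio stays bounded (around 2) while the second keeps growing with $T$, which is exactly why the $O(\sqrt{T})$ bound cannot survive the slower decay.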