Help understanding convergence proof (Momentum algorithm)

35 Views Asked by Bumbble Comm At 09 Apr 2026 - 7:27

I am trying to understand the convergence analysis/derivation of the momentum algorithm, or the stochastic heavy ball algorithm, using the regret bound analysis from different research papers.

https://ieeexplore.ieee.org/document/7330562 - Page3
https://www.mdpi.com/2504-3110/6/12/709 - Page6
http://arxiv.org/abs/1707.01647 - Page4

In the derivation, there is the following simplification, which I do not understand at all

$\frac{2\boldsymbol{\eta}_{k}}{(1-\beta)}\sum_{k=0}^{T}\left|J(\theta_k) - J(\theta^*)\right| + \frac{2\boldsymbol{\eta}_{k}\beta}{(1-\beta)^2} \sum_{k=0}^{T}\left|J(\theta_k) - J(\theta_{k-1})\right| \leq \ \left|\boldsymbol{\theta}_{0} + \boldsymbol{p}_{0} - \boldsymbol{\theta}^* \right|^2 - \left|\boldsymbol{\theta}_{T+1} + \boldsymbol{p}_{T+1} - \boldsymbol{\theta}^*\right|^2 \leq \left| \boldsymbol{\theta}_{0} - \boldsymbol{\theta}^* \right|^2$

The term $\left|\boldsymbol{\theta}_{0} + \boldsymbol{p}_{0} - \boldsymbol{\theta}^* \right|^2 - \left|\boldsymbol{\theta}_{T+1} + \boldsymbol{p}_{T+1} - \boldsymbol{\theta}^*\right|^2$ is simplified to $\leq \left| \boldsymbol{\theta}_{0} - \boldsymbol{\theta}^* \right|^2$ directly. I am sure that there is certainly some assumption or some intermediate steps involved and the reader is supposed to know that. Could you please help me understand.

My understanding: In the context of convex optimization, at any iteration the squared distance between the parameter vector $\theta_k$ and the optimal parameter vector $\theta^*$ never increases as the optimization progresses. This infers that the squared distance from the final parameter vector (plus some perturbation $\boldsymbol{p}$) to the optimal parameter vector is less than or equal to the squared distance from the initial parameter vector (plus some perturbation $\boldsymbol{p}$) to the optimal parameter vector, i.e., $\left| \boldsymbol{\theta}_{T+1} + \boldsymbol{p}_{T+1} - \boldsymbol{\theta}^* \right|^2 \leq \left| \boldsymbol{\theta}_{0} + \boldsymbol{p}_{0} - \boldsymbol{\theta}^*\right|^2$. Also, it is assumed that $\boldsymbol{p_0 = 0}$, so $|\boldsymbol{\theta}_{0} + \boldsymbol{p}_{0} - \boldsymbol{\theta}^*|^2 - |\boldsymbol{\theta}_{T+1} + \boldsymbol{p}_{T+1} - \boldsymbol{\theta}^*|^2 \leq | \boldsymbol{\theta}_{0} - \boldsymbol{\theta}^*|^2$

Original Q&A

Help understanding convergence proof (Momentum algorithm)

Related Questions in OPTIMIZATION

Related Questions in CONVEX-OPTIMIZATION

Related Questions in GRADIENT-DESCENT

Related Questions in DERIVATIONS

Trending Questions

Popular # Hahtags

Popular Questions