Let $f:\mathbb{R}^n\rightarrow\mathbb{R}$ be a continuously differentiable, $m$-strongly convex function with a globally $L$-Lipschitz continuous gradient. The standard gradient method for computing the unconstrained minimizer $x^{\star}$ is $$ x_{k+1} = x_k-\alpha \nabla f(x_k). $$ Successive steps satisfy the bound $$ \begin{aligned} \|x_{k+1} - x_k\|_2 &= \alpha \|\nabla f(x_k)\|_2 = \alpha \|\nabla f(x_k)-\nabla f(x^{\star})\|_2\\ &\leq \alpha L\|x_{k}-x^{\star}\|_2, \end{aligned} $$ where I used $\nabla f(x^{\star}) = 0$. This bound has two features: (i) it is linear in $\|x_k-x^{\star}\|_2$, and (ii) it is linear in $\alpha$. Very intuitive, of course: if the step size is small, you don't move much.
I am looking for a similar bound for the projected gradient descent method \begin{align}\tag{1}\label{Eq:PGD} x_{k+1} = \mathrm{Proj}_{X}(x_{k}-\alpha \nabla f(x_k)) \end{align} where $X$ is a closed non-empty convex set. I will let $\bar{x} \in X$ denote the constrained optimizer. For me, linearity of the bound in $\|x_{k}-\bar{x}\|_2$ is crucial, and it should have the property that it tends to zero as $\alpha \to 0$ (though I can tolerate lack of linearity there). I am recording my best attempts at this below.
Preliminary Inequalities. Let $\mathcal{N}_{X}(x)$ denote the normal cone of $X$ at the point $x \in X$. Recall that $$ \mathcal{N}_{X}(x) := \{y \in \mathbb{R}^n\,\,:\,\,y^{\sf T}(z-x) \leq 0\quad \forall z \in X\}. $$ It is well known that $\mathrm{Proj}_{X}$ is the resolvent operator associated with the normal cone, expressed as $\mathrm{Proj}_{X}(x) = (\mathrm{Id}+\mathcal{N}_{X})^{-1}(x)$. With this, the PGD update law (\ref{Eq:PGD}) and the optimality condition can be equivalently written as $$ \begin{aligned} x_{k} - x_{k+1} - \alpha \nabla f(x_k) &\in \mathcal{N}_{X}(x_{k+1})\\ -\alpha \nabla f(\bar{x}) &\in \mathcal{N}_{X}(\bar{x}). \end{aligned} $$ Expressing these in terms of the definition of the normal cone, the previous inclusions are precisely that $$ \begin{aligned} (x_{k}-x_{k+1}-\alpha \nabla f(x_k))^{\sf T}(z-x_{k+1}) &\leq 0, \quad \forall z \in X\\ -\alpha \nabla f(\bar{x})^{\sf T}(z-\bar{x}) &\leq 0, \quad \forall z \in X. \end{aligned} $$ In particular, the first holds at $z = x_k$ and $z = \bar{x}$, which quickly yields the two inequalities \begin{align}\label{Eq:FNE1-a}\tag{2} \|x_{k+1}-x_{k}\|_2^2 &\leq -\alpha (x_{k+1}-x_{k})^{\sf T}\nabla f(x_k)\\ \label{Eq:FNE1-b}\tag{3} (x_{k+1}-x_{k})^{\sf T}(x_{k+1}-\bar{x}) &\leq -\alpha (x_{k+1}-\bar{x})^{\sf T}\nabla f(x_k). \end{align} A simple identity to re-express the LHS of (\ref{Eq:FNE1-b}) is \begin{align}\label{Eq:FNE1-c}\tag{4} (x_{k+1}-x_{k})^{\sf T}(x_{k+1}-\bar{x}) = \tfrac{1}{2}\|x_{k+1}-x_{k}\|_2^2 - \tfrac{1}{2}\left(\|x_{k}-\bar{x}\|_2^2 - \|x_{k+1}-\bar{x}\|_2^2\right)
\end{align}
These two inequalities are in fact equivalent to what one can obtain by applying firm non-expansiveness of the projection operator, which yields the alternative (but again, equivalent) expressions \begin{equation}\label{Eq:FNE2} \begin{aligned} \|x_{k+1}-x_{k}\|_2^2 & \leq \alpha^2 \|\nabla f(x_k)\|_2^2 - \|x_{k} - x_{k+1} - \alpha \nabla f(x_k)\|_2^2\\ \|x_{k+1}-\bar{x}\|_2^2 &\leq \|x_{k}-\alpha \nabla f(x_k) - \bar{x}\|_2^2 - \|x_{k} - x_{k+1} - \alpha \nabla f(x_k)\|_2^2. \end{aligned} \end{equation} Similarly, evaluating the optimality conditions at $z = x_k$ and $z = x_{k+1}$ we obtain \begin{align}\label{Eq:Optimality-a}\tag{5} \alpha (x_{k}-\bar{x})^{\sf T}\nabla f(\bar{x}) &\geq 0\\ \label{Eq:Optimality-b}\tag{6} \alpha (x_{k+1}-\bar{x})^{\sf T}\nabla f(\bar{x}) &\geq 0. \end{align}
Attempt #1 -- Simple Upper Bound. Using non-expansiveness of the projection and $x_k = \mathrm{Proj}_{X}(x_k)$, one can compute $$ \begin{aligned} \|x_{k+1}-x_{k}\|_2 &= \|\mathrm{Proj}_{X}(x_{k}-\alpha \nabla f(x_k)) - \mathrm{Proj}_{X}(x_k)\|_2\\ &\leq \alpha \|\nabla f(x_k)\|_2\\ &= \alpha \|\nabla f(x_k)-\nabla f(\bar{x}) + \nabla f(\bar{x})\|_2\\ &\leq \alpha L \|x_k-\bar{x}\|_2 + \alpha \|\nabla f(\bar{x})\|_2. \end{aligned} $$
This resembles the unconstrained bound above, but with an additional constant term quantifying how active the gradient of $f$ is at the constrained optimizer. However, the bound does not force $\|x_{k+1}-x_{k}\|_2$ to go to zero as $x_k \to \bar{x}$, which is what should happen.
Attempt #2(a). Beginning from the preliminary inequality (\ref{Eq:FNE1-a}), we have $$ \begin{aligned} \|x_{k+1}-x_{k}\|_2^2 &\leq -\alpha (x_{k+1}-x_{k})^{\sf T}\nabla f(x_k)\\ &= -\alpha (x_{k+1}-\bar{x}+\bar{x}-x_{k})^{\sf T}\nabla f(x_k)\\ &\leq \alpha (\|x_{k+1}-\bar{x}\|_2 + \|x_{k}-\bar{x}\|_2) \|\nabla f(x_k)-\nabla f(\bar{x})+\nabla f(\bar{x})\|_2. \end{aligned} $$ For small enough step sizes the forward-backward step is a contraction and the iterates satisfy $$ \|x_{k+1}-\bar{x}\|_2 \leq \sqrt{1-\alpha m}\, \|x_k-\bar{x}\|_2 =: c_{\alpha}\|x_k-\bar{x}\|_2. $$ It therefore follows that $$ \begin{aligned} \|x_{k+1}-x_{k}\|_2^2 &\leq \alpha (1+c_{\alpha})\|x_{k}-\bar{x}\|_2 (L\|x_{k}-\bar{x}\|_2 + \|\nabla f(\bar{x})\|_2)\\ &= \alpha L(1+c_{\alpha})\|x_k-\bar{x}\|_2^2 + \alpha (1+c_{\alpha})\|\nabla f(\bar{x})\|_2\|x_{k}-\bar{x}\|_2. \end{aligned} $$ Since $c_{\alpha} \leq 1$, we conclude that $$ \|x_{k+1}-x_{k}\|_2 \leq \sqrt{2\alpha L\|x_k-\bar{x}\|_2^2 + 2\alpha \|\nabla f(\bar{x})\|_2\|x_{k}-\bar{x}\|_2}. $$ On any ball of radius $r_0$ around the point $\bar{x}$, we can therefore obtain a bound of the form $$ \|x_{k+1}-x_{k}\|_2 \leq \sqrt{\alpha\|x_{k}-\bar{x}\|_2}\sqrt{2Lr_0 + 2\|\nabla f(\bar{x})\|_2}. $$ This has the right qualitative behaviour, but it has square-root dependence on $\alpha$ and $\|x_k-\bar{x}\|_2$; for my application of interest, the latter is problematic.
Attempt #2(b). If we instead begin from the other preliminary inequality (\ref{Eq:FNE1-b}) we can compute that $$ \begin{aligned} 2(x_{k+1}-x_{k})^{\sf T}(x_{k+1}-\bar{x}) &\leq - 2\alpha (x_{k+1}-\bar{x})^{\sf T}[\nabla f(x_k)]\\ &= - 2\alpha (x_{k+1} - \bar{x})^{\sf T}[\nabla f(\bar{x}) + \nabla f(x_k) - \nabla f(\bar{x})]\\ &= - 2\alpha(x_{k+1} - \bar{x})^{\sf T}\nabla f(\bar{x}) - 2\alpha (x_{k+1}-x_{k} + x_{k}-\bar{x})^{\sf T}(\nabla f(x_k) - \nabla f(\bar{x}))\\ &= - 2\alpha(x_{k+1} - \bar{x})^{\sf T}\nabla f(\bar{x})\\ &\qquad - 2\alpha (x_{k+1}-x_{k})^{\sf T}(\nabla f(x_k) - \nabla f(\bar{x}))\\ &\qquad - 2\alpha(x_{k}-\bar{x})^{\sf T}(\nabla f(x_k) - \nabla f(\bar{x})). \end{aligned} $$ The first term is non-positive by the optimality condition (\ref{Eq:Optimality-b}). The second term we upper bound using the Lipschitz constant of $\nabla f$, and the third term we upper bound using $m$-strong convexity of $f$. We quickly obtain $$ 2(x_{k+1}-x_{k})^{\sf T}(x_{k+1}-\bar{x}) \leq -2\alpha m\|x_{k}-\bar{x}\|_2^2 + 2\alpha L \|x_{k+1}-x_{k}\|_2\|x_{k}-\bar{x}\|_2. $$ Using (\ref{Eq:FNE1-c}) to replace the left-hand side, we have $$ \begin{aligned} \|x_{k+1}-x_{k}\|_2^2 &\leq \left(\|x_{k}-\bar{x}\|_2^2 - \|x_{k+1}-\bar{x}\|_2^2\right)\\ &\qquad -2\alpha m\|x_{k}-\bar{x}\|_2^2\\ &\qquad + 2\alpha L \|x_{k+1}-x_{k}\|_2\|x_{k}-\bar{x}\|_2. \end{aligned} $$ Dropping the non-positive term $-\|x_{k+1}-\bar{x}\|_2^2$ on the RHS and resolving the resulting quadratic in $\|x_{k+1}-x_{k}\|_2$, we obtain the bound $$ \|x_{k+1}-x_{k}\|_2 \leq \left(\alpha L + \sqrt{1-2\alpha m + \alpha^2L^2}\right)\|x_{k}-\bar{x}\|_2, $$ which has nice linear behaviour in $\|x_{k}-\bar{x}\|_2$ but does not go to zero as $\alpha \to 0$.
Attempt #3 -- Using Averaged Operators. The projection operator is firmly non-expansive, and is therefore $1/2$-averaged. If $\alpha \in (0,2/L)$, the forward step operator $F(x) = (I-\alpha \nabla f)(x)$ is averaged with parameter $\alpha L/2$, since it can be written as $$ F(x) = (1-\tfrac{\alpha L}{2})x + \tfrac{\alpha L}{2}\left(x-\frac{2}{L}\nabla f(x)\right). $$ It follows [Bauschke, Prop 4.32] that the forward-backward operator $T(x) = \mathrm{Proj}_{X}(x-\alpha\nabla f(x))$ is averaged with parameter $$ \gamma = \frac{2}{1+\frac{1}{\max\{\tfrac{1}{2},\tfrac{\alpha L}{2}\}}} = \begin{cases} 2/3 &\text{if} \quad \alpha \in (0,1/L]\\ \frac{2\alpha L}{2+\alpha L} &\text{if} \quad \alpha \in (1/L,2/L). \end{cases} $$ Since $T$ is $\gamma$-averaged, it satisfies the inequality [Bauschke, Prop 4.25(iii)] $$ \|T(x)-T(y)\|_2^2 \leq \|x-y\|_2^2 - \frac{1-\gamma}{\gamma}\|x - T(x) - y + T(y)\|_2^2. $$ Set $x = x_{k}$, so that $T(x) = x_{k+1}$, and set $y = \bar{x}$, so that $T(y) = \bar{x}$. Then $$ \|x_{k+1}-\bar{x}\|_2^2 \leq \|x_{k}-\bar{x}\|_2^2 - \epsilon\|x_{k+1}-x_{k}\|_2^2 $$ where $$ \epsilon = \frac{1-\gamma}{\gamma} = \begin{cases} 1/2 &\text{if} \quad \alpha \in (0,1/L]\\ \frac{2-\alpha L}{2\alpha L} &\text{if} \quad \alpha \in (1/L,2/L). \end{cases} $$ For $\alpha \in (0,1/L]$ we therefore obtain $$ \tfrac{1}{2}\|x_{k+1}-x_{k}\|_2^2 \leq \|x_{k}-\bar{x}\|_2^2 - \|x_{k+1}-\bar{x}\|_2^2. $$ Bounding away the second term, we get $$ \|x_{k+1}-x_{k}\|_2 \leq \sqrt{2}\|x_{k}-\bar{x}\|_2, $$ which slightly improves on Zim's bound, but does not have the right qualitative behaviour as $\alpha \to 0$. Intuitively, the quantity $\|x_{k}-\bar{x}\|_2^2 - \|x_{k+1}-\bar{x}\|_2^2$ should be proportional to $\alpha^2\|x_k-\bar{x}\|_2^2$, but convergence gives me a lower bound on this quantity, not an upper bound.
Simple Lower Bound. Not directly what I'm looking for, but interesting and easy. Applying the reverse triangle inequality, we have $$ \begin{aligned} \|x_{k+1}-x_{k}\|_2 &= \|x_{k+1}-\bar{x} - (x_{k}-\bar{x})\|_2\\ &\geq \|x_{k}-\bar{x}\|_2 - \|x_{k+1}-\bar{x}\|_2\\ &\geq (1-\sqrt{1-\alpha m})\|x_{k}-\bar{x}\|_2, \end{aligned} $$ which, since $1-\sqrt{1-t} \geq t/2$ for $t \in [0,1]$, further implies that $$ \|x_{k+1}-x_{k}\|_2 \geq \alpha\frac{m}{2}\|x_{k}-\bar{x}\|_2. $$
Unfortunately, it is not possible. What you are looking for is a function $g(\alpha,\mu, L)$ that depends only on the Lipschitz constant $L$, the strong convexity constant $\mu$, and the step size $\alpha$, such that $\lim_{\alpha \to 0}g(\alpha, \mu, L)=0$ and \begin{align} \|x_{k+1}-x_{k}\| \leq g(\alpha,\mu,L) \|x_{k}-x^{*}\| \end{align} for any convex set $X$.
The issue is that the function $g$ has no ``information'' about the set $X$ you are projecting onto. Therefore, by choosing the set $X(\alpha)$ as a function of $\alpha$, we can force $g$ to violate the property $\lim_{\alpha \to 0}g(\alpha, \mu, L)=0$.
Consider the following simple example: $f(x) = \frac{1}{2}x^{2}$, so $\nabla f(x) = x$. Then $x_{k+1} = P_{X(\alpha)}((1 - \alpha) x_{k})$, where $X(\alpha) = \{x\in\mathbb{R}\ |\ x\geq 1-\alpha\}$. Notice that the minimizer over the set $X(\alpha)$ is $x^{*} = 1-\alpha$.
Now set $x_0 = 1$. The condition $\|x_{k+1}-x_{k}\| \leq g(\alpha,\mu,L) \|x_{k}-x^{*}\|$ at the first iteration reads
\begin{align} \|x_{1}-x_{0}\| &\leq g(\alpha,\mu,L) \|x_{0}-x^{*}\| \\ \|P_{X(\alpha)}((1 - \alpha)\cdot 1) - 1\| &\leq g(\alpha,\mu,L) \|1-(1-\alpha)\| \\ \|(1 - \alpha) - 1\| &\leq g(\alpha,\mu,L) \|1-(1-\alpha)\| \\ \alpha &\leq g(\alpha,\mu,L) \alpha \\ 1 &\leq g(\alpha,\mu,L). \end{align} This example shows that for any $\alpha \in (0,1)$ you can construct a set $X$ and an initial condition $x_0$ such that $g(\alpha,\mu,L) \geq 1$. Therefore, there is no function $g$ such that $\lim_{\alpha \to 0}g(\alpha, \mu, L)=0$ and \begin{align} \|x_{k+1}-x_{k}\| \leq g(\alpha,\mu,L) \|x_{k}-x^{*}\| \end{align} for any convex set $X$.