Suppose one is given two optimization problems, $P_1$ and $P_2$:
$$ (P_1)\qquad \min_{z\in \mathbb R^n} \left\{\frac{1}{2} \|b - Az\|_2^2 + \iota_{\mathcal{C}}(z)\right\} $$
$$ (P_2)\qquad \min_{z\in \mathbb R^n} \left\{\frac{1}{2} \|b - Az\|_2^2 + f(z)\right\} $$
Above, $A \in \mathbb R^{m \times n}$ is a matrix and $b \in \mathbb R^m$ is a vector of measurements, with $2 \leq m \leq n < \infty$. Assume that $\iota_{\mathcal{C}}$ denotes the indicator function of the bounded, closed, convex set $\mathcal{C}\subseteq\mathbb R^n$ (i.e., $\iota_{\mathcal{C}}(z) = 0$ if $z \in \mathcal{C}$ and $\infty$ otherwise), and that $f$ is convex (if it's helpful, assume that $f$ is a gauge, or strongly convex, or whatever you like). Moreover, assume that $f \neq \iota_\mathcal{C}$, so that the question isn't trivial.
My question is: under what conditions is it preferable to run projected gradient descent (to solve $P_1$) rather than proximal gradient descent (to solve $P_2$)?
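For concreteness, the two methods differ only in the operator applied after each gradient step: Euclidean projection onto $\mathcal C$ for $P_1$, and the proximal operator of $f$ for $P_2$ (recall that projection is exactly the prox of $\iota_{\mathcal C}$, so projected GD is the special case of prox GD with $f = \iota_{\mathcal C}$). A minimal sketch in Python; the function names and the toy instance are mine, purely illustrative:

```python
import numpy as np

def projected_gd(grad, project, x0, step, iters):
    """Projected gradient descent: gradient step, then project onto C."""
    x = x0.copy()
    for _ in range(iters):
        x = project(x - step * grad(x))
    return x

def proximal_gd(grad, prox, x0, step, iters):
    """Proximal gradient descent: gradient step, then prox of step * f."""
    x = x0.copy()
    for _ in range(iters):
        x = prox(x - step * grad(x), step)
    return x

# Toy instance: minimize 0.5 * ||x - c||^2 over the box C = [0, 1]^3 ...
c = np.array([2.0, -1.0, 0.5])
x_box = projected_gd(grad=lambda x: x - c,
                     project=lambda v: np.clip(v, 0.0, 1.0),
                     x0=np.zeros(3), step=0.5, iters=100)
print(x_box)  # close to [1.0, 0.0, 0.5]

# ... versus the same smooth term with f = 0.25 * ||x||_1 instead.
x_l1 = proximal_gd(grad=lambda x: x - c,
                   prox=lambda v, s: np.sign(v) * np.maximum(np.abs(v) - 0.25 * s, 0.0),
                   x0=np.zeros(3), step=0.5, iters=100)
print(x_l1)  # close to [1.75, -0.75, 0.25], i.e. soft-thresholding of c
```

Per-iteration cost is identical up to the cost of `project` vs. `prox`, which is why the question above is about solution quality rather than speed.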
Caveats
- Note, (for me, at least), it is most interesting/relevant to consider instances where $f$ and $\iota_\mathcal{C}$ are related. For instance, if $\mathcal{C} = \tau\mathbb B_1^n$ is the (scaled) $\ell_1$ ball for some $\tau > 0$, then maybe $f = \lambda \|\cdot\|_1$ is the (scaled) $\ell_1$-norm on $\mathbb R^n$ for some $\lambda > 0$.
- By preferable, I mean that a solution $x_1$ of $P_1$ has smaller recovery error than a solution $x_2$ of $P_2$: if $b = Ax + \text{noise}$ for some fixed $x \in \mathbb R^n$, then $\|x_1 - x\|_2 \leq \|x_2 - x\|_2$.
- In the example above where $\mathcal{C} = \tau\mathbb B_1^n$ and $f = \lambda \|\cdot\|_1$, the two optimization programs (setting aside specific implementations of them) are equivalent in the sense that there exists a choice of $\lambda$ for which $P_1$ and $P_2$ have the same solution (at least when that solution is unique). Thus, I am most interested in understanding projected GD vs. prox GD either when this equivalence breaks down due to the (best) numerical implementation chosen, or when no such equivalence holds between the optimization programs to begin with.
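To make the $\ell_1$ example and the equivalence caveat concrete, here is a hedged numerical sketch (the data, $\lambda$, and iteration counts are my own choices, not canonical): soft-thresholding is the prox of $\lambda\|\cdot\|_1$, projection onto $\tau\mathbb B_1^n$ uses the standard sort-based algorithm, and choosing $\tau = \|x_2\|_1$ after solving $P_2$ should make the two computed solutions (numerically) coincide when the solution is unique:

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t * ||.||_1 (coordinate-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def project_l1_ball(v, tau):
    """Euclidean projection onto {z : ||z||_1 <= tau}, sort-based method."""
    if np.abs(v).sum() <= tau:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]          # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - tau)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

rng = np.random.default_rng(0)
m, n = 20, 50
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:5] = rng.standard_normal(5)       # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(m)

step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1/L, L = Lipschitz const. of grad
grad = lambda z: A.T @ (A @ z - b)
lam = 0.1

# Prox GD (ISTA) for P2 with f = lam * ||.||_1.
x2 = np.zeros(n)
for _ in range(20000):
    x2 = soft_threshold(x2 - step * grad(x2), step * lam)

# Projected GD for P1 with C = tau * B_1, tau chosen post hoc from x2.
tau = np.abs(x2).sum()
x1 = np.zeros(n)
for _ in range(20000):
    x1 = project_l1_ball(x1 - step * grad(x1), tau)

print(np.linalg.norm(x1 - x2))  # expected to be small
```

Note the asymmetry built into this sketch: $\tau$ is chosen from $x_2$ to realize the equivalence, which one cannot do a priori; when $\tau$ (or $\lambda$) is fixed in advance, or the implementations are stopped early, the two computed points differ, which is exactly the regime the question is about.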