If $f$ is a differentiable convex function (with $L$-Lipschitz gradient) and $g$ is a convex, lower semi-continuous function, then for a suitable step size (e.g. $0 < \gamma \le 1/L$) the algorithm defined by:
$$ x^{k+1} = \text{prox}_{\gamma g}\left[x^{k} - \gamma \nabla f(x^{k})\right]$$
converges to a minimizer $x^{*} \in \text{argmin}\,[f+g]$.
This is justified by the fact that if $x^{*}$ is a minimizer of $f+g$, then: $$ x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$$
But I do not understand this relation. Why is it true?
That is, why does $x^{*} \in \text{argmin}\,[f+g] \Leftrightarrow x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$ hold?
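To make the iteration concrete, here is a minimal numerical sketch (my own example, not part of the claim above), assuming the standard lasso instance $f(x) = \frac{1}{2}\|Ax-b\|^2$ and $g(x) = \lambda\|x\|_1$, for which $\text{prox}_{\gamma g}$ is coordinate-wise soft-thresholding:

```python
# Proximal gradient (ISTA) for f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
# For this g, prox_{gamma*g} is the soft-thresholding operator.
import numpy as np

def soft_threshold(v, t):
    # prox of t*||.||_1: shrink each coordinate toward 0 by t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1
gamma = 1.0 / np.linalg.norm(A, 2) ** 2    # step 1/L with L = ||A||_2^2

x = np.zeros(A.shape[1])
for _ in range(2000):
    grad = A.T @ (A @ x - b)                           # nabla f(x)
    x = soft_threshold(x - gamma * grad, gamma * lam)  # prox step
```

After enough iterations the iterate stops moving, i.e. it (numerically) satisfies the fixed-point relation $x = \text{prox}_{\gamma g}[x - \gamma \nabla f(x)]$ that the question is about.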
I will show below that if $x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$, then $x^{*} \in \text{argmin}_{x}\, f(x)+g(x)$.
Plugging in the definition of the proximal operator, we have
$$\text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right] = \text{argmin}_{x} \left\{\gamma g(x) + \frac{1}{2}\left\|x - \left(x^{*} - \gamma \nabla f(x^{*})\right)\right\|^2\right\}$$

Since $x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$, the point $x^{*}$ minimizes the objective on the right-hand side, so Fermat's rule applied at $x = x^{*}$ gives
$$0 \in \partial\left(\gamma g\right)(x^{*}) + \left(x^{*} - \left(x^{*} - \gamma \nabla f(x^{*})\right)\right)$$
which simplifies to
$$0 \in \gamma \partial g(x^{*}) + \gamma \nabla f(x^{*})$$

Dividing by $\gamma > 0$, this is equivalent to saying that $0 \in \partial F(x^{*})$, where $F(x) = g(x) + f(x)$, so $x^{*}$ is a minimizer of $F$. The other direction of the proof is very similar: each step above is in fact an equivalence.
Note: The last step implicitly uses the sum rule $\partial(f+g) = \nabla f + \partial g$, which need not hold for arbitrary functions; it is valid here because $f$ is differentiable everywhere and $g$ is convex (check the subdifferential calculus rules).