Why in proximal gradient descent, "proximal" is referred to as "Backward" and the "gradient" is "Forward"?

352 Views Asked by Bumbble Comm At 31 Mar 2026 - 5:19

I am not into monotone operators theory (but hope someday I can get a hang of it).

Sorry for asking probably stupid question.

Why in proximal gradient descent, "proximal" is referred to as "Backward" and the "gradient" is referred to as "Forward"?

I can imagine why "gradient" is referred to as "Forward" perhaps because it looks for the "downhill" and move forward. But I am not sure. However, I have no clue why "proximal" is referred to as "Backward". Can someone enlighten me? Thank you.

Original Q&A

There are 2 best solutions below

Bumbble Comm On 30 Oct 2020 - 8:18 BEST ANSWER

Let $t > 0$. To minimize $f+ g$ where $f$ is smooth and $g$ is closed and convex, we need to find a point $x$ that satisfies \begin{align} & 0 \in \nabla f(x) + \partial g(x) \\ \iff & x - t \nabla f(x) \in x + t \partial g(x) \\ \iff & (I + t \partial g)^{-1}(x - t \nabla f(x)) = x. \end{align} The forward-backward method uses the fixed point iteration $$ x^+ = (I + t \partial g)^{-1}(x - t \nabla f(x)). $$ Computing $x - t \nabla f(x)$ is the "forward" step. Applying the operator $(I + t \partial g)^{-1}$ is called the "backward" step, I guess because we are inverting the operator $I + t \partial g$.

If you are able to recognize that $(I + t \partial g)^{-1}$ is the proximal operator of $g$, then you see that the forward-backward method is the same thing as the proximal gradient method.

user13838 On 30 Oct 2020 - 3:10

The gradient descent (with fixed step size $t > 0$), i.e., $$ x_{k+1} = x_k - t \, \nabla f(x_k)$$ can be rewritten as $$ \frac{x_{k+1} - x_k}{t} = -\nabla f(x_k),$$ i.e., it is a explicit Euler discretization of the gradient flow ODE $$ \dot x(t) = -\nabla f(x(t)). $$ On the other hand, the proximal point method reads $$ x_{k+1} = \operatorname{prox}_{t \, f}(x_k).$$ This implies $$ \frac{x_{k+1} - x_k}{t} \in -\partial f(x_{k+1}).$$ This amounts to an implicit Euler discretization of the gradient flow.

Finally, the explicit Euler method is also called 'forward Euler', whereas implicit Euler is called 'backward Euler'.

Why in proximal gradient descent, "proximal" is referred to as "Backward" and the "gradient" is "Forward"?

There are 2 best solutions below

Related Questions in REAL-ANALYSIS

Related Questions in OPERATOR-THEORY

Related Questions in CONVEX-ANALYSIS

Related Questions in NUMERICAL-OPTIMIZATION

Related Questions in GRADIENT-DESCENT

Trending Questions

Popular # Hahtags

Popular Questions