Why in proximal gradient descent, "proximal" is referred to as "Backward" and the "gradient" is "Forward"?

352 Views Asked by At

I am not into monotone operators theory (but hope someday I can get a hang of it).

Sorry for asking probably stupid question.

Why in proximal gradient descent, "proximal" is referred to as "Backward" and the "gradient" is referred to as "Forward"?

I can imagine why "gradient" is referred to as "Forward" perhaps because it looks for the "downhill" and move forward. But I am not sure. However, I have no clue why "proximal" is referred to as "Backward". Can someone enlighten me? Thank you.

2

There are 2 best solutions below

1
On BEST ANSWER

Let $t > 0$. To minimize $f+ g$ where $f$ is smooth and $g$ is closed and convex, we need to find a point $x$ that satisfies \begin{align} & 0 \in \nabla f(x) + \partial g(x) \\ \iff & x - t \nabla f(x) \in x + t \partial g(x) \\ \iff & (I + t \partial g)^{-1}(x - t \nabla f(x)) = x. \end{align} The forward-backward method uses the fixed point iteration $$ x^+ = (I + t \partial g)^{-1}(x - t \nabla f(x)). $$ Computing $x - t \nabla f(x)$ is the "forward" step. Applying the operator $(I + t \partial g)^{-1}$ is called the "backward" step, I guess because we are inverting the operator $I + t \partial g$.

If you are able to recognize that $(I + t \partial g)^{-1}$ is the proximal operator of $g$, then you see that the forward-backward method is the same thing as the proximal gradient method.

1
On

The gradient descent (with fixed step size $t > 0$), i.e., $$ x_{k+1} = x_k - t \, \nabla f(x_k)$$ can be rewritten as $$ \frac{x_{k+1} - x_k}{t} = -\nabla f(x_k),$$ i.e., it is a explicit Euler discretization of the gradient flow ODE $$ \dot x(t) = -\nabla f(x(t)). $$ On the other hand, the proximal point method reads $$ x_{k+1} = \operatorname{prox}_{t \, f}(x_k).$$ This implies $$ \frac{x_{k+1} - x_k}{t} \in -\partial f(x_{k+1}).$$ This amounts to an implicit Euler discretization of the gradient flow.

Finally, the explicit Euler method is also called 'forward Euler', whereas implicit Euler is called 'backward Euler'.