I am not familiar with monotone operator theory (though I hope to get the hang of it someday).
Sorry if this is a naive question.
Why, in proximal gradient descent, is the "proximal" step referred to as "backward" and the "gradient" step as "forward"?
I can imagine that the "gradient" step is called "forward" because it looks for the downhill direction and moves forward along it, but I am not sure. I have no clue, however, why the "proximal" step is called "backward". Can someone enlighten me? Thank you.
Let $t > 0$. To minimize $f + g$, where $f$ is smooth and $g$ is closed and convex, we need to find a point $x$ that satisfies
\begin{align} & 0 \in \nabla f(x) + \partial g(x) \\ \iff & x - t \nabla f(x) \in x + t \partial g(x) \\ \iff & (I + t \partial g)^{-1}(x - t \nabla f(x)) = x. \end{align}
The forward-backward method uses the fixed point iteration
$$ x^+ = (I + t \partial g)^{-1}(x - t \nabla f(x)). $$
Computing $x - t \nabla f(x)$ is the "forward" step. Applying the operator $(I + t \partial g)^{-1}$ is called the "backward" step, I guess because we are inverting the operator $I + t \partial g$.
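One way to make the "inverting" remark concrete is to unpack what the backward step asks for. This is only a sketch; the names $y$ and $v$ are introduced here for illustration. Write $y = x - t \nabla f(x)$ for the output of the forward step. Then
\begin{align} x^+ = (I + t \partial g)^{-1}(y) & \iff y \in x^+ + t \partial g(x^+) \\ & \iff x^+ = y - t v \quad \text{for some } v \in \partial g(x^+). \end{align}
So both steps have the form "point minus $t$ times a (sub)gradient". The forward step evaluates $\nabla f$ at the current point $x$, which makes it an explicit update. The backward step uses a subgradient of $g$ at the new point $x^+$, which makes it an implicit update: $x^+$ appears on both sides and has to be solved for. This mirrors the forward (explicit) and backward (implicit) Euler discretizations of a differential equation.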
Once you recognize that $(I + t \partial g)^{-1}$ is the proximal operator of $tg$ (that is, of $g$ with step size $t$), you see that the forward-backward method is the same thing as the proximal gradient method.
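To see the identification, recall that the proximal operator is $\operatorname{prox}_{tg}(y) = \arg\min_u \, g(u) + \frac{1}{2t}\|u - y\|^2$; the optimality condition $0 \in t \partial g(u) + u - y$ says exactly that $u = (I + t \partial g)^{-1}(y)$. Below is a minimal sketch of the resulting iteration in Python, under assumptions that are not in the answer above: $f(x) = \frac12\|Ax - b\|^2$ and $g(x) = \lambda \|x\|_1$, whose proximal operator is soft-thresholding. The function names, the random test data, and the step size $t = 1/\|A\|_2^2$ are illustrative choices.

```python
import numpy as np

def soft_threshold(y, tau):
    """Proximal operator of tau * ||.||_1, i.e. (I + tau * d||.||_1)^{-1}(y)."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

def forward_backward(A, b, lam, t, num_iters=2000):
    """Forward-backward iteration for 0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        # Forward (explicit) step: gradient of f evaluated at the current x.
        y = x - t * (A.T @ (A @ x - b))
        # Backward (implicit) step: apply the resolvent (I + t * dg)^{-1}.
        x = soft_threshold(y, t * lam)
    return x

# Hypothetical problem data, chosen only for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.5
t = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / (Lipschitz constant of grad f)

x = forward_backward(A, b, lam, t)

# At a solution, x is a fixed point of the forward-backward map.
fixed_point_residual = np.linalg.norm(
    x - soft_threshold(x - t * (A.T @ (A @ x - b)), t * lam)
)
```

The final residual checks the fixed point equation $(I + t \partial g)^{-1}(x - t \nabla f(x)) = x$ from the derivation, which is the optimality condition in disguise.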