The Douglas-Rachford optimization algorithm solves problems of the form $$\text{minimize} \hspace{8pt} f(x) + g(x)$$ where $f$ and $g$ are closed, convex, and proper (CCP). It is useful when both $f$ and $g$ have proximal operators that are easy to evaluate.
I naively expected the optimization algorithm to be as follows: select $y^{(0)}$ and a step size $t$, and then iterate \begin{align} x^{(k)} &= \text{prox}_{tf}\left( y^{(k-1)} \right) \\ y^{(k)} &= \text{prox}_{tg}\left( x^{(k)} \right) \end{align} for $N$ iterations or until some reasonable stopping criterion is met.
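For concreteness, here is how I would implement that naive alternating scheme, on a toy problem I made up purely for illustration (a quadratic $f$ plus an $\ell_1$ penalty $g$, chosen because both proximal operators are closed-form):

```python
import numpy as np

# Hypothetical toy problem, chosen only for illustration:
#   f(x) = 0.5 * ||x - a||^2   ->  prox_{tf}(y) = (y + t*a) / (1 + t)
#   g(x) = lam * ||x||_1       ->  prox_{tg}(y) = soft-threshold y at t*lam

def prox_f(y, t, a):
    # closed-form prox of t*f: argmin_x t*0.5*||x - a||^2 + 0.5*||x - y||^2
    return (y + t * a) / (1.0 + t)

def prox_g(y, t, lam):
    # soft-thresholding, the prox of t*lam*||.||_1
    return np.sign(y) * np.maximum(np.abs(y) - t * lam, 0.0)

a = np.array([3.0, -2.0, 0.5])
t, lam = 1.0, 1.0

y = np.zeros_like(a)
for _ in range(100):          # the "naive" alternation
    x = prox_f(y, t, a)
    y = prox_g(x, t, lam)
# y is now (numerically) a fixed point of prox_{tg} composed with prox_{tf}
```

When I run this, the iterates settle to a fixed point of the composition, which does not obviously coincide with a minimizer of $f + g$; that is part of why I am asking.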
However, the D-R algorithm is somewhat more involved (and, at the moment, less intuitive to me). It is: \begin{align} x^{(k)} &= \text{prox}_{tf}\left( y^{(k-1)} \right) \\ y^{(k)} &= y^{(k-1)} + \text{prox}_{tg}\left( 2x^{(k)} - y^{(k-1)}\right) - x^{(k)}. \end{align}
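Here is the D-R iteration sketched on a hypothetical toy problem (a quadratic $f$ plus an $\ell_1$ penalty $g$; the problem and the step size $t = 1$ are just illustrative choices of mine):

```python
import numpy as np

# Hypothetical toy problem: f(x) = 0.5*||x - a||^2, g(x) = lam*||x||_1.
# The true minimizer of f + g is the soft-thresholding of a at lam.

def prox_f(y, t, a):
    # closed-form prox of t*f
    return (y + t * a) / (1.0 + t)

def prox_g(y, t, lam):
    # soft-thresholding, the prox of t*lam*||.||_1
    return np.sign(y) * np.maximum(np.abs(y) - t * lam, 0.0)

a = np.array([3.0, -2.0, 0.5])
t, lam = 1.0, 1.0

y = np.zeros_like(a)
for _ in range(200):          # Douglas-Rachford iteration
    x = prox_f(y, t, a)
    y = y + prox_g(2 * x - y, t, lam) - x
# x approximates the minimizer of f + g; for this problem that is the
# soft-thresholding of a at lam, i.e. [2, -1, 0]
```

Note that here the auxiliary variable $y$ converges to something different from $x$; it is only $x^{(k)} = \text{prox}_{tf}(y^{(k-1)})$ that approaches the minimizer.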
The first step of the iteration is what I expected, but the second step surprised me.
Is the first "simple" algorithm bad for some reason? Does it converge, but at a slower rate than D-R? Or does it fail to converge to a minimizer in many cases? And is there a simple way to relate the naive algorithm to the D-R algorithm (perhaps by interpreting the extra terms as a form of momentum)?
Thank you for your help!