If $f$ is a differentiable convex function (with $L$-Lipschitz gradient) and $g$ is a convex, lower semi-continuous function, then for a suitable step size (e.g. $0 < \gamma \le 1/L$) the algorithm defined by:
$$ x^{k+1} = \text{prox}_{\gamma g}\left[x^{k} - \gamma \nabla f(x^{k})\right]$$
converges to a minimizer $x^{*} \in \text{argmin}\,[f+g]$.
This is justified by the fact that if $x^{*}$ is a minimizer of $f+g$, then: $$ x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$$
But I do not understand this relation. Why is it true?
That is, why does $x^{*} \in \text{argmin}\,[f+g] \Leftrightarrow x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$ hold?
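To make the iteration concrete, here is a minimal numerical sketch (my own example, not part of the claim above), assuming the standard lasso instance $f(x) = \frac{1}{2}\|Ax-b\|^2$ and $g(x) = \lambda\|x\|_1$, for which $\text{prox}_{\gamma g}$ is coordinate-wise soft-thresholding:

```python
# Proximal gradient (ISTA) for f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
# For this g, prox_{gamma*g} is the soft-thresholding operator.
import numpy as np

def soft_threshold(v, t):
    # prox of t*||.||_1: shrink each coordinate toward 0 by t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1
gamma = 1.0 / np.linalg.norm(A, 2) ** 2    # step 1/L with L = ||A||_2^2

x = np.zeros(A.shape[1])
for _ in range(2000):
    grad = A.T @ (A @ x - b)                           # nabla f(x)
    x = soft_threshold(x - gamma * grad, gamma * lam)  # prox step
```

After enough iterations the iterate stops moving, i.e. it (numerically) satisfies the fixed-point relation $x = \text{prox}_{\gamma g}[x - \gamma \nabla f(x)]$ that the question is about.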
I will show below that if $x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$, then $x^{*} \in \text{argmin}_{x}\, f(x)+g(x)$.
Plugging in the definition of the proximal operator, we have
$$\text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right] = \text{argmin}_{x} \left\{\gamma g(x) + \frac{1}{2}\left\|x - \left(x^{*} - \gamma \nabla f(x^{*})\right)\right\|^2\right\}$$

Since $x^{*} = \text{prox}_{\gamma g}\left[x^{*} - \gamma \nabla f(x^{*})\right]$, the point $x^{*}$ minimizes the objective on the right-hand side, so Fermat's rule applied at $x = x^{*}$ gives
$$0 \in \partial\left(\gamma g\right)(x^{*}) + \left(x^{*} - \left(x^{*} - \gamma \nabla f(x^{*})\right)\right)$$
which simplifies to
$$0 \in \gamma \partial g(x^{*}) + \gamma \nabla f(x^{*})$$

Dividing by $\gamma > 0$, this is equivalent to saying that $0 \in \partial F(x^{*})$, where $F(x) = g(x) + f(x)$, so $x^{*}$ is a minimizer of $F$. The other direction of the proof is very similar: each step above is in fact an equivalence.
Note: The last step implicitly uses the sum rule $\partial(f+g) = \nabla f + \partial g$, which need not hold for arbitrary functions; it is valid here because $f$ is differentiable everywhere and $g$ is convex (check the subdifferential calculus rules).