$\newcommand{\prox}{\operatorname{prox}}$ Probably the most remarkable property of the proximal operator is the fixed point property:
The point $x^*$ minimizes $f$ if and only if $x^* = \prox_f(x^*) $
So, indeed, $f$ can be minimized by finding a fixed point of its proximal operator. See Proximal Algorithms by Neal Parikh and Stephen Boyd.
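As a concrete toy illustration of the fixed point property (my own sketch, not from the paper): take $f(x) = |x|$ in one dimension, whose proximal operator is soft-thresholding; iterating it reaches the minimizer $x^* = 0$, which is exactly its fixed point.

```python
# Toy illustration (assumed example, not from the paper): for f(x) = |x|,
# prox_f is soft-thresholding with threshold 1, and the minimizer x* = 0
# is precisely the fixed point of prox_f.

def prox_abs(x):
    """prox of f(x) = |x|: soft-thresholding with threshold 1."""
    if x > 1:
        return x - 1
    if x < -1:
        return x + 1
    return 0.0

x = 3.5
for _ in range(10):
    x = prox_abs(x)  # 3.5 -> 2.5 -> 1.5 -> 0.5 -> 0 -> 0 -> ...

print(x)             # 0.0: the iterates reach the minimizer
print(prox_abs(0.0)) # 0.0: the minimizer is a fixed point of prox_f
```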
Question 1. In the paper cited above, the author writes:
If $\prox_f$ were a contraction, i.e., Lipschitz continuous with constant less than $1$, repeatedly applying $\prox_f$ would find a (here, unique) fixed point
Why does a Lipschitz constant less than $1$ guarantee that repeatedly applying the proximal operator finds a fixed point?
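For intuition (my own toy example, not from the paper): for the strongly convex $f(x) = x^2/2$ the prox has the closed form $\prox_f(z) = z/2$, a contraction with constant $1/2$, so every application halves the distance to the unique fixed point $x^* = 0$; this geometric shrinkage is the Banach fixed-point argument.

```python
# Toy example (mine, not from the paper): for the strongly convex
# f(x) = x^2/2, minimising x^2/2 + (x - z)^2/2 gives prox_f(z) = z/2,
# which is Lipschitz with constant 1/2 < 1, i.e. a contraction.

def prox_half_square(z):
    """prox of f(x) = x^2/2: the minimiser of x^2/2 + (x - z)^2/2 is z/2."""
    return z / 2.0

x = 8.0
trace = []
for _ in range(6):
    x = prox_half_square(x)
    trace.append(x)

# each step halves the distance to the unique fixed point x* = 0
print(trace)  # [4.0, 2.0, 1.0, 0.5, 0.25, 0.125]
```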
Question 2. Let me quote a paragraph from the same paper:
It turns out that while $\prox_f$ need not be a contraction (unless $f$ is strongly convex), it does have a different property, firm nonexpansiveness, sufficient for fixed point iteration:
$\|\prox_f(x) - \prox_f(y)\|^2_2 \le (x-y)^T (\prox_f(x) - \prox_f(y)) \quad \forall x,y \in \mathbb{R}^n$
Firmly nonexpansive operators are special cases of nonexpansive operators (those that are Lipschitz continuous with constant 1). Iteration of a general nonexpansive operator need not converge to a fixed point: consider operators like $-I$ or rotations. However, it turns out that if $N$ is nonexpansive, then the operator $T = (1-\alpha)I + \alpha N$, where $\alpha \in (0,1)$, has the same fixed points as $N$, and simple iteration of $T$ will converge to a fixed point of $T$ (and thus of $N$), i.e., the sequence:
$x^{k+1} := (1-\alpha)x^k +\alpha N(x^k)$
will converge to a fixed point of $N$. Put differently, damped iteration of a nonexpansive operator will converge to one of its fixed points.
Operators in the form $(1-\alpha)I + \alpha N$, where $N$ is non-expansive and $\alpha \in (0,1)$, are called $\alpha$-averaged operators.
Firmly nonexpansive operators are averaged: indeed, they are precisely the (1/2)-averaged operators.
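The quoted claims are easy to check numerically (my own sketch with an assumed example: $N$ a rotation by $90°$ in $\mathbb{R}^2$, which is nonexpansive with unique fixed point $0$; plain iteration cycles, damped iteration converges).

```python
import math

# Sketch (my own example): N = rotation by 90 degrees is nonexpansive
# (it preserves the Euclidean norm) with unique fixed point 0, yet plain
# iteration of N just cycles.  The averaged operator
# T = (1 - a) I + a N with a in (0, 1) does converge to the fixed point.

def rotate90(v):
    x, y = v
    return (-y, x)

def averaged(v, a=0.5):
    nx, ny = rotate90(v)
    return ((1 - a) * v[0] + a * nx, (1 - a) * v[1] + a * ny)

v = (1.0, 0.0)
for _ in range(4):
    v = rotate90(v)
print(v)  # (1.0, 0.0): plain iteration cycles with period 4, no convergence

v = (1.0, 0.0)
for _ in range(200):
    v = averaged(v)
print(math.hypot(*v) < 1e-6)  # True: damped iteration converges to 0
```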
Why "unless $f$ is strongly convex"?
What is the intuition behind the given expression for firm nonexpansiveness?
How can you show that firm nonexpansive operators are $\alpha$-averaged with $\alpha = \frac{1}{2}$?
Is anyone aware of a proof that the proximal map is firmly nonexpansive?
You can prove the (firm) non-expansiveness of the proximal operator as follows: $\DeclareMathOperator{\prox}{prox}$
Note that the proof is similar to the proof of the (firm) non-expansiveness of the projection operator [1].
You can think of the proximal operator as a kind of generalised projection operator.
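This analogy can be made concrete numerically (my own illustration, with assumed helper names): if $f$ is the indicator function of $\mathcal C = [-1, 1]$, then $\prox_f$ reduces exactly to the Euclidean projection onto $\mathcal C$, i.e. clipping.

```python
# Sketch of the "generalised projection" view (my own illustration):
# if f is the indicator of a convex set C, then
# prox_f(z) = argmin_{u in C} (1/2)(u - z)^2 is the Euclidean projection
# onto C.  For C = [-1, 1] that projection is just clipping.

def proj_interval(z, lo=-1.0, hi=1.0):
    """Projection onto [lo, hi], i.e. prox of the indicator of [lo, hi]."""
    return min(max(z, lo), hi)

def prox_brute(z, lo=-1.0, hi=1.0, steps=200_001):
    """Brute-force prox of the indicator: minimise (u - z)^2 on a grid over C."""
    best, best_val = lo, (lo - z) ** 2
    for i in range(1, steps):
        u = lo + (hi - lo) * i / (steps - 1)
        val = (u - z) ** 2
        if val < best_val:
            best, best_val = u, val
    return best

for z in (-3.0, 0.25, 2.0):
    assert abs(proj_interval(z) - prox_brute(z)) < 1e-4
print("prox of an indicator matches the projection")
```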
Proof: We pick two arbitrary points $z_1$, $z_2$.
This is easiest to see in the projection case, i.e. when $f$ is the indicator function of a convex set $\mathcal C$: there the optimality condition reads \begin{align} (z_1 - \prox_f(z_1))^\top (x- \prox_f(z_1)) &\le 0 \quad \forall x \in \mathcal C \end{align} and, by definition of the proximal operator, $\prox_f(z_2)$ is in $\mathcal C$, so it is a valid test point $x$. For general $f$ the same argument runs through subgradients; below, $f_u$ denotes a subgradient of $f$ at the point $u$, i.e. $f_u \in \partial f(u)$.
The optimality condition for $\prox_f(z_1)$ is: \begin{align} (f_{\prox_f(z_1)} + \prox_f(z_1) - z_1)^\top (\prox_f (z_2) - \prox_f (z_1)) &\ge 0 \label{eq:opt_cond_1} \end{align}
and for $\prox_f(z_2)$: \begin{align} (f_{\prox_f(z_2)} + \prox_f(z_2) - z_2)^\top (\prox_f (z_1) - \prox_f (z_2)) &\ge 0 \\ (z_2 - f_{\prox_f(z_2)} - \prox_f(z_2))^\top (\prox_f (z_2) - \prox_f (z_1)) &\ge 0 \label{eq:opt_cond_2} \end{align}
The subgradient inequalities for $f_{\prox_f(z_1)}$ and $f_{\prox_f (z_2)}$ are:
\begin{align} f(\prox_f(z_2)) - f(\prox_f(z_1)) &\ge f_{\prox_f(z_1)}^\top (\prox_f(z_2) - \prox_f(z_1)) \label{eq:subgrad_1}\\ f(\prox_f(z_1)) - f(\prox_f(z_2)) &\ge f_{\prox_f(z_2)}^\top (\prox_f(z_1) - \prox_f(z_2)) \label{eq:subgrad_2} \end{align} Adding the two subgradient inequalities yields the monotonicity of the subdifferential: \begin{align} (f_{\prox_f(z_2)} - f_{\prox_f(z_1)})^\top (\prox_f(z_2) - \prox_f(z_1)) \ge 0 \label{eq:subgrad_zero} \end{align}
Adding the two optimality conditions leads us to: \begin{align} ((z_2 - z_1) + (f_{\prox_f(z_1)} - f_{\prox_f(z_2)}) + (\prox_f(z_1) - \prox_f(z_2)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\ge 0 \\ (- (z_2 - z_1) + (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}) + (\prox_f(z_2) - \prox_f(z_1)))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le 0 \end{align}
We use the monotonicity of the subdifferential to drop the subgradient term: \begin{align} (\prox_f(z_2) - \prox_f(z_1))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le ((z_2 - z_1) - (f_{\prox_f(z_2)} - f_{\prox_f(z_1)}))^\top (\prox_f(z_2) - \prox_f(z_1)) \\ (\prox_f(z_2) - \prox_f(z_1))^\top (\prox_f(z_2) - \prox_f(z_1)) &\le (z_2 - z_1)^\top (\prox_f(z_2) - \prox_f(z_1)) \end{align} The last inequality is exactly the firm nonexpansiveness property. Applying Cauchy-Schwarz to its right-hand side gives \begin{align} \|\prox_f(z_2) - \prox_f(z_1)\|^2 &\le \|z_2 - z_1\| \, \|\prox_f(z_2) - \prox_f(z_1)\| \\ \|\prox_f(z_2) - \prox_f(z_1)\| &\le \|z_2 - z_1\| \end{align} where the second line follows by dividing by $\|\prox_f(z_2) - \prox_f(z_1)\|$ (the claim is trivial when this norm is zero). $\square$
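As a numerical sanity check of firm nonexpansiveness (my own sketch, not part of the original answer): for $f(x) = |x|$ in one dimension, $\prox_f$ is soft-thresholding, and the inequality $\|\prox_f(z_2) - \prox_f(z_1)\|^2 \le (z_2 - z_1)^\top(\prox_f(z_2) - \prox_f(z_1))$ should hold for every pair of points.

```python
import random

# Numerical sanity check (my own sketch): for f(x) = |x| in one dimension,
# prox_f is soft-thresholding, and the firm nonexpansiveness inequality
#   (prox(z2) - prox(z1))^2 <= (z2 - z1) * (prox(z2) - prox(z1))
# should hold for every pair z1, z2.

def prox_abs(z):
    """prox of f(x) = |x|: soft-thresholding with threshold 1."""
    if z > 1:
        return z - 1
    if z < -1:
        return z + 1
    return 0.0

random.seed(0)
ok = all(
    (prox_abs(z2) - prox_abs(z1)) ** 2
    <= (z2 - z1) * (prox_abs(z2) - prox_abs(z1)) + 1e-12
    for z1, z2 in ((random.uniform(-5, 5), random.uniform(-5, 5))
                   for _ in range(10_000))
)
print(ok)  # True: the inequality holds on all sampled pairs
```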