Derivative of the Prox / Proximal Operator

1.5k Views Asked by At

Consider a proximal operator,

$$ \operatorname{Prox}_{ \lambda f } \left( \mu x \right) := \arg \min_{u} \lambda f \left( u \right) + \frac{1}{2} {\left\| u - \mu x \right\|}_{2}^{2}.$$

What is the partial derivative of the proximal operator w.r.t. $\lambda$ and $\mu$, i.e.

$$\frac{\partial\operatorname{Prox}_{ \lambda f } \left( \mu x \right)}{\partial\lambda}, \quad \frac{\partial\operatorname{Prox}_{ \lambda f} \left( \mu x \right)}{\partial\mu}?$$

If the general case is not solvable, is it possible to compute the derivative if we restrict $f$ to be an $L_p$ norm?

2

There are 2 best solutions below

2
On

The prox operator takes a point (vector) and maps it into a subset of your vector space, this mapping might be empty, a singleton or a set. Therefore the prox operator is not differentiable.

The following example is from the book by Beck. Consider the following functions: \begin{align} g_1(x) &=0, \\ g_2(x)&=\begin{cases} 0 & x \neq 0\\ - c & x=0, \end{cases}\\ g_3(x)&=\begin{cases} 0 & x \neq 0\\ c & x=0, \end{cases} \end{align} then the prox of the previous functions is:

\begin{align} \text{prox}_{g_1}(x)&=\{x\}.\\ \text{prox}_{g_2}(x)&=\begin{cases} \{0\}, & |x| < \sqrt{2c},\\ \{x\}, & |x| > \sqrt{2c}, \\ \{0,x\}, & |x| = \sqrt{2c}. \end{cases}\\ \text{prox}_{g_3}(x)&=\begin{cases} \{0\} & x \neq 0,\\ \emptyset & x=0. \end{cases} \end{align}

On the other hand, the Moreau envelope, defined as $$M^{\mu}_f(x) = \inf_{y}\bigg\{f(y)+\frac{1}{2\mu} ||x-y||^2 \bigg\},$$ is a smooth map (in fact $\mu$ is called the smoothing parameter), therefore it makes sense to talk about differentiability. The derrivate of the Moreau envelope is $$\nabla M^{\mu}_f(x) = \frac{1}{\mu}(x - \text{prox}_{\mu f}(x)).$$

You can read more on the excellent books by Beck (Ch. 6) and Bauschke & Combettes (Ch. 12).

1
On

For the restricted case where $f$ is differentiable one can derive a solution. First, the derivative w.r.t. to $\lambda$ is

$$\frac{\partial\operatorname{Prox}_{ \lambda f( u ) } \left( x \right)}{\partial\lambda} = \lim_{\epsilon\to 0}\frac{1}{\epsilon}\left[\operatorname{Prox}_{ (\lambda + \epsilon) f( u ) } \left( x \right) - \operatorname{Prox}_{ \lambda f( u ) } \left( x \right)\right]$$

The solution to $\operatorname{Prox}_{ (\lambda + \epsilon) f( u ) } \left( x \right)$ can be computed from a simple Taylor expansion. In particular, any solution has to fulfill

$$(\lambda + \epsilon) \nabla f(u) + (u - \mu x) = 0$$ $$\Leftrightarrow (\lambda + \epsilon) \nabla f(u^{*} + du) + u^{*} + du - \mu x = 0$$

where $u^{*} = \operatorname{Prox}_{ \lambda f( u ) } \left( x \right)$. Then, with $H_f(u^{*})$ being the Hessian of $f$,

$$\Leftrightarrow (\lambda + \epsilon) (\nabla f(u^{*}) + H_f(u^{*}) du) + u^{*} + du - \mu x = 0$$

$$\Leftrightarrow \epsilon \nabla f(u^{*}) + (\lambda + \epsilon) H_f(u^{*}) du + du = 0$$

Hence,

$$du = -\epsilon\left[(\lambda + \epsilon)H_f(u^{*}) + I\right]^{-1}\nabla f(u^{*})$$

$$\Rightarrow \frac{\partial\operatorname{Prox}_{ \lambda f( u ) } \left( x \right)}{\partial\lambda} = -\left[\lambda H_f(u^{*}) + I\right]^{-1}\nabla f(u^{*})$$

In a very similar way we can find

$$\frac{\partial\operatorname{Prox}_{ \lambda f( u ) } \left( x \right)}{\partial\mu} = \left[\lambda H_f(u^{*}) + I\right]^{-1} x$$