Proximal Operator / Mapping Intuition and Practical Example


From what I understand, proximal mappings are used in conjunction with iterative schemes for minimisation problems. They are used to speed up convergence to the 'optimal' solution. This is the intuitive explanation I have for it, but I need something with more depth and perhaps an example of this in action.

As for practicality, I am trying to understand how the computation of proximal mappings is performed. For now, I am considering the example $f:\mathbb{R} \rightarrow (-\infty,\infty]$ $$f(x) = \begin{cases} -\log(x), & x>0\\ +\infty, & x\leq 0\end{cases}$$ In the notation of my course notes, I wish to calculate $\text{prox}_{\lambda f}(v)$, where $v \in \mathbb{R}$, $\lambda > 0$.


Best answer:

The motivation for proximal operators comes from the proximal point algorithm. To minimize a convex and lower semicontinuous $f$ you could update $x^k$ to the next iterate by solving $$ \min_x f(x) + \frac{\|x-x^k\|^2}{2t_k} $$ for some positive sequence $t_k$. The resulting sequence converges to a minimizer of $f$ provided the sum of the stepsizes $t_k$ diverges.
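As a toy illustration (my own example, not part of the question): for $f(x) = x^2$ the proximal subproblem has a closed-form solution, and iterating it drives the iterates to the minimizer $0$.

```python
# Proximal point algorithm on the toy function f(x) = x^2.
# prox_{t f}(v) minimizes x^2 + (x - v)^2 / (2t); setting the derivative
# 2x + (x - v)/t to zero gives the closed form x = v / (1 + 2t).
def prox_square(v, t):
    return v / (1.0 + 2.0 * t)

x = 5.0
t = 0.5  # constant stepsize, so the sum of stepsizes diverges
for _ in range(60):
    x = prox_square(x, t)  # each step halves x; iterates approach 0
```

With a constant stepsize each iteration shrinks the iterate by the fixed factor $1/(1+2t)$, so convergence here is linear.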

One can also motivate the proximal point method as a majorization-minimization method: at each iteration you choose a majorizing function that is tight at the current point, i.e. at iterate $x^k$ choose some function $f_k$ such that

  1. $f_k(x^k) = f(x^k)$ and
  2. for all $x$ it holds that $f_k(x)\geq f(x)$.

The next iterate is then any minimizer of $f_k$. In the case of the proximal point method you choose $f_k(x) = f(x) + \frac{\|x-x^k\|^2}{2t_k}$.

The proximal point method is just an abstract method, since the subproblems in each iteration are more or less as difficult as the original problem; they are only slightly easier, in that the objective of each step is strongly convex instead of merely convex.

The proximal mapping gets really handy for composite problems where you minimize $f+g$ with $g$ convex and differentiable and $f$ convex with a "simple" proximal mapping. Then you can use the proximal gradient method, i.e. alternate gradient descent steps for $g$ with proximal steps for $f$ (using the same stepsize), which converges to a minimizer (under some further conditions).
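A concrete sketch of this (my own illustration, with the assumed choices $g(x) = \frac12\|Ax-b\|^2$ as the smooth part and $f(x) = \lambda\|x\|_1$, whose prox is soft thresholding):

```python
import numpy as np

# Sketch of the proximal gradient method for min_x g(x) + f(x),
# assuming g(x) = 0.5*||Ax - b||^2 (smooth) and f(x) = lam*||x||_1
# (a "simple" f: its prox is the componentwise soft-thresholding map).
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # illustrative random problem data
b = rng.standard_normal(20)
lam = 0.1
t = 1.0 / np.linalg.norm(A, 2) ** 2  # stepsize 1/L, L = Lipschitz const of grad g

def objective(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x = np.zeros(5)
for _ in range(500):
    grad = A.T @ (A @ x - b)                   # gradient step on g
    x = soft_threshold(x - t * grad, t * lam)  # proximal step on f
```

Each iteration is a gradient step on the smooth part followed by a prox step on the nonsmooth part, exactly the alternation described above.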

To calculate the proximal map in your example is a simple exercise: just calculate the minimizer of the problem above (in your case using basic calculus).

Answer:

Proceeding as Dirk explained, you should find $$\operatorname{prox}_{\lambda f}(y) = \frac{y+\sqrt{y^2+4\lambda}}{2}. $$ Good references for formulas are the books by Bauschke-Combettes and by Beck.
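To fill in the calculus: for $x > 0$ the objective $-\lambda\log x + \frac12(x-y)^2$ has derivative $-\lambda/x + x - y$, and setting it to zero gives $x^2 - yx - \lambda = 0$, whose positive root is the formula above. A quick numerical sanity check (my own, via brute-force grid search):

```python
import math

# Closed-form prox of f(x) = -log(x) with parameter lam, as stated above
def prox_neglog(y, lam):
    return (y + math.sqrt(y * y + 4.0 * lam)) / 2.0

# Compare against a brute-force grid minimization of lam*f(x) + 0.5*(x - y)^2
y, lam = -1.0, 0.5

def obj(x):
    return -lam * math.log(x) + 0.5 * (x - y) ** 2

x_grid = min((1e-4 * i for i in range(1, 50001)), key=obj)  # grid on (0, 5]
# x_grid agrees with prox_neglog(y, lam) to grid accuracy; note the prox
# is strictly positive even though y < 0, as the domain of f requires.
```

The strict positivity of the output is how the prox enforces the $+\infty$ branch of $f$: the minimizer can never leave the domain.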

Answer:

The proximal operator (with parameter $t > 0$) of a closed convex function $f$ is defined by $$ \text{prox}_{tf}(\hat x) = \arg \min_x \, f(x) + \frac{1}{2t} \| x - \hat x \|_2^2. $$ When we evaluate the prox-operator of $f$, we are attempting to reduce the value of $f$ without straying too far from $\hat x$. You can imagine that this would be a useful sub-step in an optimization algorithm. The parameter $t$ can be viewed as a "step size" that controls how far from $\hat x$ we are allowed to move. When $t$ is very small, the penalty for moving away from $\hat x$ is severe.

One reason that proximal operators are so useful is that for many simple penalty functions $f$ which occur frequently in applications, such as $f(x) = \| x \|_1$, an explicit formula is available for the prox-operator of $f$. So, we often find that relevant prox-operators can be evaluated very efficiently. The prox-operator does not require $f$ to be differentiable, so prox-operators tend to be useful for nonsmooth optimization problems.
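For instance, the prox of $f(x) = \|x\|_1$ acts componentwise and is given by the well-known soft-thresholding formula $\operatorname{prox}_{tf}(v)_i = \operatorname{sign}(v_i)\max(|v_i| - t,\, 0)$. A scalar sketch:

```python
# Soft thresholding: the closed-form prox of t*|.| for a scalar argument.
# It maps small inputs to exactly zero -- the reason l1 penalties produce
# sparse solutions -- and never needs a derivative of |.| at 0.
def soft_threshold(v, t):
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

print(soft_threshold(3.0, 1.0))   # 2.0
print(soft_threshold(-0.5, 1.0))  # 0.0
```

This is the sense in which the prox handles nonsmoothness: the formula is exact and cheap even though $|\cdot|$ is not differentiable at the origin.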