Correct cases for proximal operator of $L_2$ / Euclidean norm when solving without Moreau's decomposition

In preparation for my lecture, I'm trying to derive the proximal operator of the $L_2$ (Euclidean) norm, i.e., the solution of: $$ \hat{\mathbf{x}}(\mathbf{y}) = \underset{\mathbf{x}}{\text{arg min}}\quad \underbrace{\frac{1}{2} \Vert \mathbf{x} - \mathbf{y} \Vert^2_2 + \lambda \Vert \mathbf{x} \Vert_2}_{:=f(\mathbf{x})} \,. $$ I'm trying to arrive at the well-known answer $$ \hat{\mathbf{x}}(\mathbf{y}) = \max \left\lbrace 0,\, 1 - \frac{\lambda}{\Vert \mathbf{y} \Vert_2} \right\rbrace \cdot \mathbf{y} $$ without using Moreau's decomposition.
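As a numerical sanity check (my own addition, not part of the derivation), the claimed closed form can be verified against the first-order optimality condition $\mathbf{x} - \mathbf{y} + \lambda \mathbf{x} / \Vert \mathbf{x} \Vert_2 = \mathbf{0}$ for the nonzero case; `prox_l2` is just an illustrative helper name:

```python
import numpy as np

def prox_l2(y, lam):
    # Block soft-thresholding: the claimed closed-form prox of lam * ||.||_2
    ny = np.linalg.norm(y)
    return max(0.0, 1.0 - lam / ny) * y if ny > 0 else np.zeros_like(y)

# Check the stationarity condition x - y + lam * x / ||x||_2 = 0
# at a point y with ||y||_2 > lam (so the minimizer is nonzero).
y = np.array([3.0, -4.0])           # ||y||_2 = 5
lam = 1.0
x = prox_l2(y, lam)                 # (1 - 1/5) * y = [2.4, -3.2]
grad = x - y + lam * x / np.linalg.norm(x)
print(np.allclose(grad, 0.0))       # True
```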

For the case $\mathbf{x} \neq \mathbf{0}$, the gradient $\nabla f(\mathbf{x})$ can be calculated as: $$ \nabla f(\mathbf{x}) = \mathbf{x} - \mathbf{y} + \lambda \frac{\mathbf{x}}{\Vert\mathbf{x}\Vert_2} . $$ (I will continue on this later).
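A quick finite-difference check of this gradient expression (an illustrative sketch I added; the test point and parameters are arbitrary):

```python
import numpy as np

y = np.array([3.0, -4.0])
lam = 0.7
x = np.array([1.0, 2.0])            # an arbitrary nonzero test point

# f(x) = 0.5 * ||x - y||_2^2 + lam * ||x||_2
f = lambda v: 0.5 * np.sum((v - y) ** 2) + lam * np.linalg.norm(v)

# Analytic gradient (valid for x != 0) vs. central finite differences.
grad_analytic = x - y + lam * x / np.linalg.norm(x)
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-5))  # True
```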

For $\mathbf{x} = \mathbf{0}$, I will use subdifferentials, denoted by $\partial$. It holds: \begin{align} \partial f(\mathbf{x}) = \partial \tfrac{1}{2} \Vert \mathbf{x} - \mathbf{y} \Vert^2_2 + \partial \lambda \Vert \mathbf{x} \Vert_2 = \mathbf{x} - \mathbf{y} + \partial \lambda \Vert \mathbf{x} \Vert_2 , \end{align} where at $\mathbf{x} = \mathbf{0}$ $$ \partial \lambda \Vert \mathbf{x} \Vert_2 = \lbrace \mathbf{g}: \Vert \mathbf{g} \Vert_2 \leq \lambda \rbrace , $$ so that $$ \partial f(\mathbf{0}) = -\mathbf{y} + \lbrace \mathbf{g}: \Vert \mathbf{g} \Vert_2 \leq \lambda \rbrace . $$ The optimality condition $\mathbf{0} \in \partial f(\mathbf{0})$ is therefore equivalent to $$ \mathbf{y} \in \lbrace \mathbf{g}: \Vert \mathbf{g} \Vert_2 \leq \lambda \rbrace , $$ i.e., $\Vert \mathbf{y} \Vert_2 \leq \lambda$, and in this case we can return $\mathbf{0}$.
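As a numerical sanity check of this case (my own addition, illustrative only): when $\Vert \mathbf{y} \Vert_2 \leq \lambda$, the point $\mathbf{x} = \mathbf{0}$ should attain an objective value no larger than any other candidate:

```python
import numpy as np

# When ||y||_2 <= lambda, x = 0 should minimize
# f(x) = 0.5 * ||x - y||_2^2 + lambda * ||x||_2.
rng = np.random.default_rng(0)
y = np.array([0.3, -0.4])           # ||y||_2 = 0.5
lam = 1.0                           # lambda >= ||y||_2

f = lambda x: 0.5 * np.sum((x - y) ** 2) + lam * np.linalg.norm(x)

# Compare f(0) against many random candidates.
candidates = rng.normal(size=(1000, 2))
print(all(f(x) >= f(np.zeros(2)) for x in candidates))  # True
```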

Now back to the case of $\mathbf{x} \neq \mathbf{0}$. We will set the gradient to $\mathbf{0}$ and solve for $\mathbf{x}$:

$$ \mathbf{x} - \mathbf{y} + \lambda \frac{\mathbf{x}}{\Vert \mathbf{x} \Vert_2} \overset{!}{=} \mathbf{0} $$

$$ \mathbf{x} \left( 1 + \frac{\lambda}{\Vert \mathbf{x} \Vert_2} \right) = \mathbf{y}$$ now apply $\Vert \cdot \Vert_2$: $$ \Vert \mathbf{x} \Vert_2 \left( 1 + \frac{\lambda}{\Vert \mathbf{x} \Vert_2} \right) = \Vert \mathbf{y} \Vert_2 $$ and solve for $\Vert \mathbf{x} \Vert_2$: $$ \Vert \mathbf{x} \Vert_2 = \Vert \mathbf{y} \Vert_2 - \lambda . $$ Now use this to replace $\Vert \mathbf{x} \Vert_2$ in the second step of this derivation to obtain: $$ \mathbf{x} \left( 1 + \frac{\lambda}{\Vert \mathbf{y} \Vert_2 - \lambda} \right) = \mathbf{y} . $$ Since $1 + \frac{\lambda}{\Vert \mathbf{y} \Vert_2 - \lambda} = \frac{\Vert \mathbf{y} \Vert_2}{\Vert \mathbf{y} \Vert_2 - \lambda}$, solving for $\mathbf{x}$ yields $$ \mathbf{x} = \frac{\Vert \mathbf{y} \Vert_2 - \lambda}{\Vert \mathbf{y} \Vert_2} \, \mathbf{y} = \left( 1 - \frac{\lambda}{\Vert \mathbf{y} \Vert_2} \right) \cdot \mathbf{y} . $$

Now, in order to make sure that $\mathbf{x} \neq \mathbf{0}$, it has to hold $$ \left( 1 - \frac{\lambda}{\Vert \mathbf{y} \Vert_2} \right) \cdot \mathbf{y} \neq \mathbf{0}, $$ i.e., $\mathbf{y} \neq \mathbf{0}$ and $$ \Vert \mathbf{y} \Vert_2 \neq \lambda . $$

In total, this would result in the solution $$ \hat{\mathbf{x}} (\mathbf{y}) = \left( 1 - \frac{\lambda}{\Vert \mathbf{y} \Vert_2} \right) \cdot \mathbf{y} \quad \text{for } \Vert \mathbf{y} \Vert_2 \neq \lambda $$ and $$ \hat{\mathbf{x}} (\mathbf{y}) = \mathbf{0} \quad \text{for } \Vert \mathbf{y} \Vert_2 = \lambda.$$ Apparently, this differs from the common solution stated at the beginning of my question. Many thanks in advance for any helping input!
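To make the discrepancy concrete, here is a small numerical example (my own addition) where the two formulas disagree, namely for $0 < \Vert \mathbf{y} \Vert_2 < \lambda$:

```python
import numpy as np

# A point with 0 < ||y||_2 < lambda, where my piecewise result and the
# common max-formula give different answers.
y = np.array([0.3, -0.4])           # ||y||_2 = 0.5
lam = 1.0

mine = (1.0 - lam / np.linalg.norm(y)) * y            # my case ||y||_2 != lambda
common = max(0.0, 1.0 - lam / np.linalg.norm(y)) * y  # common solution

print(mine)    # a nonzero vector, since 1 - lam/||y||_2 = -1 here
print(common)  # the zero vector
```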

You may be interested in the website http://proximity-operator.net, which lists many different proximity operators (+ code).

For instance, the one for $\Vert \cdot \Vert_2$ is the first entry on this page: http://proximity-operator.net/multivariatefunctions.html

Moreover, as mentioned there, the proof of the prox formula is given in Examples 2.15 and 2.16 (pp. 8–9) of the paper

Combettes, P. L., & Wajs, V. R. (2005). Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4), 1168–1200. https://www.ljll.math.upmc.fr/~plc/mms1.pdf

To prove it, they use the fact that $\|x\|$ equals the distance $d(x,C)$ from $x$ to a convex set $C$ when $C$ is the singleton $\{0\}$; the prox of $d(\cdot, C)$ is derived in Example 2.15.

I guess it could be possible to simplify this proof for students.

However, Moreau's identity is useful to teach. It generalizes the decomposition of a vector into a component along a subspace plus a component along its orthogonal complement, a fact students have typically seen and proved in matrix algebra courses. Just pick the convex function in the prox to be the indicator of the subspace; its conjugate is then the indicator of the orthogonal complement, and the rest follows.
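This special case can be illustrated in a few lines (a sketch; the example subspace $S$, spanned by the columns of `A`, and the projector `P` are my own arbitrary choices):

```python
import numpy as np

# Moreau's identity x = prox_f(x) + prox_{f*}(x), specialized to
# f = indicator of a subspace S: prox_f is the orthogonal projection onto S,
# prox_{f*} is the projection onto the orthogonal complement of S, and the
# identity becomes the familiar decomposition x = P_S x + (I - P_S) x.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 2))                 # basis of a 2-dim subspace S
P = A @ np.linalg.inv(A.T @ A) @ A.T        # orthogonal projector onto S
x = rng.normal(size=5)

print(np.allclose(x, P @ x + (np.eye(5) - P) @ x))  # True
```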

Hope this helps.