Proximal operator of squared $\ell_1$-norm


For any $a \in \mathbb R^d$ and $t \ge 0$, let $p_t(a)$ be the unique minimizer of $f_t(x;a) := \|x-a\|_2^2 + t\|x\|_1^2$ over $x \in \mathbb R^d$.

Question. Is there an analytic formula for $p_t(a)$?


Observation 1. Writing the first-order optimality condition $0 \in 2(p_t(a)-a) + t\,\partial\big(\|\cdot\|_1^2\big)(p_t(a))$ and using $\partial\big(\|x\|_1^2\big) = 2\|x\|_1\,\partial\|x\|_1$, we see that if $R := \|p_t(a)\|_1$, then $p_t(a)$ satisfies $p_t(a) = (I+Rt\,\partial \|\cdot\|_1)^{-1}(a) = \operatorname{prox}_{Rt\|\cdot\|_1}(a)$, i.e. the components of $p_t(a)$ are given by

$$ (p_t(a))_j = \mathrm{ST}(a_j;Rt), \tag{1} \label{1} $$

Here, $\mathrm{ST}(u;\lambda)$ is the well-known soft-thresholding operator defined by

$$ \mathrm{ST}(u;\lambda) := \operatorname{sign}(u)\,(|u| - \lambda)_+. $$
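In NumPy, for instance, this operator is a one-liner (a minimal sketch; the function name is mine):

```python
import numpy as np

def soft_threshold(u, lam):
    """Elementwise soft-thresholding: sign(u) * (|u| - lam)_+."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
```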

However, the issue is that $R$ is unknown, since it depends on the sought-for $p_t(a)$.

Observation 2. From \eqref{1}, we deduce that if $Rt \ge \|a\|_\infty := \max_j |a_j|$, then $p_t(a) = 0$. Thus, we may assume $Rt \le \|a\|_\infty$, and search for $p_t(a)$ in the form \eqref{1} with $Rt$ restricted to the compact interval $[0,\|a\|_\infty]$. This is a one-dimensional problem! Unfortunately, it's still not a closed-form solution...
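Even without a closed form, this one-dimensional reduction is easy to exploit numerically: the map $\tau \mapsto t\,\|\mathrm{ST}(a;\tau)\|_1 - \tau$ is continuous and strictly decreasing on $[0,\|a\|_\infty]$, nonnegative at $\tau = 0$ and nonpositive at $\tau = \|a\|_\infty$, so its unique root $\tau^\star = Rt$ can be found by bisection. A sketch (not from the post; the function name and tolerances are mine):

```python
import numpy as np

def prox_sq_l1(a, t, tol=1e-12, max_iter=200):
    """Minimize ||x - a||_2^2 + t*||x||_1^2 by bisection on tau = R*t,
    where the fixed point satisfies tau = t * ||ST(a; tau)||_1."""
    a = np.asarray(a, dtype=float)
    st = lambda u, lam: np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
    lo, hi = 0.0, float(np.max(np.abs(a)))  # tau lies in [0, ||a||_inf]
    for _ in range(max_iter):
        tau = 0.5 * (lo + hi)
        # g(tau) = t*||ST(a; tau)||_1 - tau is strictly decreasing in tau
        g = t * np.abs(st(a, tau)).sum() - tau
        if g > 0:
            lo = tau
        else:
            hi = tau
        if hi - lo < tol:
            break
    return st(a, 0.5 * (lo + hi))
```

For example, with $a = (3, -1, 0.5)$ and $t = 1$ the fixed point is $Rt = 1.5$ (only the first coordinate survives the thresholding), giving $p_t(a) = (1.5, 0, 0)$.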

Answer.

The lemma mentioned in the comments is actually taken from this article by Evgeniou et al. In Appendix B, after introducing a variational representation of $\lVert x\rVert_1^2$, they show that $$ [p_t(a)]_i = \frac{\lambda_i a_i}{t+\lambda_i} $$ where $$ \lambda_i=(\rho \lvert a_i \rvert - t)_+ $$ and where $\rho$ must be set to satisfy $\sum_i \lambda_i = 1$.
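As a sanity check, this formula is straightforward to implement: $\sum_i (\rho\,\lvert a_i\rvert - t)_+$ is continuous and nondecreasing in $\rho$, so the normalization $\sum_i \lambda_i = 1$ can be enforced by bisection on $\rho$. A sketch assuming $t > 0$ (the function name and the bracket are mine; the upper bound $\rho = (1 + td)/\sum_i \lvert a_i\rvert$ works because $(x)_+ \ge x$ gives $\sum_i (\rho\lvert a_i\rvert - t)_+ \ge \rho \sum_i \lvert a_i\rvert - td = 1$):

```python
import numpy as np

def prox_sq_l1_evgeniou(a, t, tol=1e-12):
    """Prox via the Evgeniou et al. formula:
    [p]_i = lambda_i * a_i / (t + lambda_i), lambda_i = (rho*|a_i| - t)_+,
    with rho chosen by bisection so that sum_i lambda_i = 1. Assumes t > 0."""
    a = np.asarray(a, dtype=float)
    abs_a = np.abs(a)
    if abs_a.max() == 0.0:
        return np.zeros_like(a)
    lo, hi = 0.0, (1.0 + t * a.size) / abs_a.sum()  # bracket for rho
    while hi - lo > tol:
        rho = 0.5 * (lo + hi)
        if np.maximum(rho * abs_a - t, 0.0).sum() < 1.0:
            lo = rho
        else:
            hi = rho
    lam = np.maximum(0.5 * (lo + hi) * abs_a - t, 0.0)
    return lam * a / (t + lam)
```

Note that on the support ($\lambda_i > 0$) the formula reduces to $x_i = \operatorname{sign}(a_i)(\lvert a_i\rvert - t/\rho)$, i.e. soft-thresholding with $\mu = t/\rho$, which matches the KKT derivation below.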

A different, but equivalent, solution can be worked out as follows. We notice that evaluating the proximal operator of the squared $\ell_1$ norm is equivalent to solving the problem $$ \operatorname*{minimize}_{x}\biggl\{\frac{1}{2}\lVert x-a \rVert_2^2 + \frac{t}{2}\lVert x \rVert_1^2 \biggr\} $$ which is in turn equivalent to $$ \operatorname*{minimize}_{x,r}\biggl\{\frac{1}{2}\lVert x-a \rVert_2^2 + \frac{t}{2}r^2 : r \ge \lVert x \rVert_1 \biggr\} $$ whose Lagrangian reads $$ L(x,r;\mu) = \frac{1}{2}\lVert x-a \rVert_2^2 + \frac{t}{2}r^2 + \mu \bigl(\lVert x \rVert_1 -r \bigr). $$

Setting the (sub)gradient of the Lagrangian to zero, together with complementary slackness, gives the following system (KKT conditions) that must be satisfied at the minimizer $$ \left\{\begin{aligned} &x-a+\mu\partial\lVert x \rVert_1 \ni 0 \\ &rt-\mu = 0 \\ &\mu \bigl(\lVert x \rVert_1 -r \bigr)=0,\quad \mu\ge 0. \end{aligned}\right. $$

The first line tells us that $x$ can be computed with the soft-thresholding operator $$ x_i = [p_t(a)]_i = \operatorname{sign}(a_i)\bigl(\lvert a_i \rvert - \mu\bigr)_+ $$ while, from the remaining equations, we get $$ \mu = rt = t \lVert x \rVert_1 = t \sum_i \bigl(\lvert a_i \rvert - \mu\bigr)_+ $$ that allows us to compute $\mu$. For instance, assuming without loss of generality that $\lvert a_1 \rvert \ge \lvert a_2 \rvert \ge \lvert a_3 \rvert \ge \dots$, this boils down to computing $$ \mu(n) = \frac{t}{1+nt} \sum_{i=1}^n \lvert a_i \rvert $$ for $n=1,2,\dots$ until \begin{align} \lvert a_i \rvert &> \mu(n) &&\text{for all } i \le n \\ \lvert a_i \rvert &\le \mu(n) &&\text{for all } i > n. \end{align}
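The procedure above translates directly into a short $O(d \log d)$ routine: sort once, then scan for the correct support size $n$ (a sketch; the function name is mine):

```python
import numpy as np

def prox_sq_l1_sort(a, t):
    """Prox of the squared l1 norm via the sorting procedure:
    find the support size n and mu(n) = t/(1+n*t) * sum of the n largest |a_i|,
    then soft-threshold a at level mu."""
    a = np.asarray(a, dtype=float)
    abs_sorted = np.sort(np.abs(a))[::-1]        # |a_(1)| >= |a_(2)| >= ...
    csum = np.cumsum(abs_sorted)
    d = a.size
    mu = t / (1.0 + np.arange(1, d + 1) * t) * csum  # mu(n) for n = 1..d
    for n in range(1, d + 1):
        m = mu[n - 1]
        # sortedness makes these two checks equivalent to the conditions above
        if abs_sorted[n - 1] > m and (n == d or abs_sorted[n] <= m):
            return np.sign(a) * np.maximum(np.abs(a) - m, 0.0)
    return np.zeros_like(a)                      # only reached when a == 0
```

For $a = (3, -1, 0.5)$ and $t = 1$: $\mu(1) = \tfrac{1}{2}\cdot 3 = 1.5$, and since $3 > 1.5$ while $1 \le 1.5$, the support is $n = 1$ and the prox is $(1.5, 0, 0)$.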