Is there an efficient way to evaluate the proximal operator of the function $f:\mathbb R^n \to \mathbb R \cup \{ \infty \}$ defined by \begin{equation} f(x) = \| x \|_2 + I_{\geq 0}(x), \end{equation} where $I_{\geq 0}$ is the indicator function of the nonnegative orthant: \begin{equation} I_{\geq 0}(x) = \begin{cases} 0 & \quad \text{if } x \geq 0,\\ \infty & \quad \text{otherwise.} \end{cases} \end{equation} (The inequality $x \geq 0$ is interpreted componentwise.)
In other words, given $\hat x \in \mathbb R^n$, is there an easy way to solve the optimization problem \begin{align} \text{minimize} & \quad \|x\|_2 + \frac{1}{2} \| x - \hat x \|_2^2 \\ \text{subject to} & \quad x \geq 0. \end{align} The variable in this optimization problem is $x \in \mathbb R^n$. Ideally I'd like a closed-form solution to this optimization problem, or else a way to compute the solution extremely quickly (without having to rely on an iterative algorithm).
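For reference, any closed-form candidate can be checked against a generic conic solver. Below is a minimal sketch using CVXPY (the helper name `prox_reference` is mine, just for illustration); this is exactly the kind of iterative solve I'd like to avoid, but it serves as a ground truth:

```python
import cvxpy as cp
import numpy as np

def prox_reference(x_hat):
    """Solve: minimize ||x||_2 + 0.5*||x - x_hat||_2^2  subject to x >= 0."""
    x = cp.Variable(x_hat.shape[0])
    objective = cp.Minimize(cp.norm(x, 2) + 0.5 * cp.sum_squares(x - x_hat))
    cp.Problem(objective, [x >= 0]).solve()
    return x.value
```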
Thoughts: the term $\| x \|_2$ depends only on the magnitude of $x$, not its direction. If not for the nonnegativity constraints, we would pick $x$ to point in the same direction as $\hat x$, which reduces the problem to a one-dimensional problem in $\|x\|_2$ that is easy to solve. Perhaps there's a similar way to think about the problem when the nonnegativity constraints are present.
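For comparison, without the constraint the prox of $\|x\|_2$ has the well-known closed form $x = (1 - 1/\|\hat x\|_2)\,\hat x$ if $\|\hat x\|_2 > 1$ and $x = 0$ otherwise (block soft-thresholding). A minimal NumPy sketch of that unconstrained case:

```python
import numpy as np

def prox_l2(x_hat):
    """Prox of ||.||_2 without constraints: block soft-thresholding.
    Shrinks x_hat toward the origin along its own direction."""
    nrm = np.linalg.norm(x_hat)
    if nrm <= 1.0:
        return np.zeros_like(x_hat)
    return (1.0 - 1.0 / nrm) * x_hat
```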
What I have found is that when working with a custom prox function, deriving the dual problem is almost always helpful. To do so, I first rewrite the problem as follows: \begin{array}{ll} \text{minimize} & \tfrac{1}{2} \| x - \hat{x} \|_2^2 + t \\ \text{subject to} & \| x \|_2 \leq t \\ & x \succeq 0 \end{array} where $t$ is a new variable. To construct the dual I use a conic Lagrange multiplier $(w,s)$ for the second-order-cone constraint and a multiplier $z$ for the nonnegativity constraint. The dual problem is \begin{array}{ll} \text{maximize} & \inf_{x,t} ~ \tfrac{1}{2} \| x - \hat{x} \|_2^2 + t - st - w^Tx - z^T x \\ \text{subject to} & \|w\|_2 \leq s \\ & z \succeq 0 \end{array} Setting the derivatives of the inner infimum with respect to $x$ and $t$ to zero yields $$x=\hat{x}+w+z, \qquad s=1.$$ This is what the dual gives me: a template for the structure of the solution. In this case, $x$ is the sum of the target vector $\hat{x}$, a vector $w$ with bounded norm, and a nonnegative vector $z$. Sure, we're not there yet, but we're closer.
Eliminating the primal variables from the dual objective yields $$\tfrac{1}{2}\|w+z\|_2^2 - (w+z)^T(\hat{x}+w+z) = -\tfrac{1}{2}\|\hat{x}+w+z\|_2^2 + \tfrac{1}{2}\|\hat{x}\|_2^2$$ So a "clean" dual of the prox function is \begin{array}{ll} \text{maximize} & -\tfrac{1}{2}\|\hat{x}+w+z\|_2^2 + \tfrac{1}{2}\|\hat{x}\|_2^2 \\ \text{subject to} & \|w\|_2 \leq 1 \\ & z \succeq 0 \end{array} At this point I can almost read off the solution by inspection.
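(Before reading it off, here is a quick numerical sanity check of this dual and of the template $x = \hat{x} + w + z$; a sketch assuming CVXPY and an arbitrary test vector:)

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
x_hat = rng.standard_normal(6)

# Solve the "clean" dual directly.
w = cp.Variable(6)
z = cp.Variable(6)
objective = cp.Maximize(-0.5 * cp.sum_squares(x_hat + w + z)
                        + 0.5 * np.sum(x_hat ** 2))
cp.Problem(objective, [cp.norm(w, 2) <= 1, z >= 0]).solve()

# Recover the primal solution from the template x = x_hat + w + z ...
x_dual = x_hat + w.value + z.value

# ... and compare with a direct solve of the primal prox problem.
x = cp.Variable(6)
primal = cp.Minimize(cp.norm(x, 2) + 0.5 * cp.sum_squares(x - x_hat))
cp.Problem(primal, [x >= 0]).solve()
print(np.allclose(x_dual, x.value, atol=1e-5))  # expect True, up to solver tolerance
```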
First, suppose $\hat{x}\preceq 0$. Then the optimal values are $x=0$, $w=0$, $z=-\hat{x}$: this choice makes the first term of the objective vanish, so the objective attains its largest possible value $\tfrac{1}{2}\|\hat{x}\|_2^2$. From now on, we can assume that at least one component of $\hat{x}$ is positive.
But in fact, we can go further. Partition $\hat{x}$, rearranging if necessary, into its positive and nonpositive components. That is, $$\hat{x} = \begin{bmatrix} \hat{x}_+ \\ \hat{x}_- \end{bmatrix}, \quad \hat{x}_+\in\mathbb{R}^k_{++}, ~ \hat{x}_-\in-\mathbb{R}^{n-k}_+$$ Partition $w=(w_+,w_-)$, $z=(z_+,z_-)$, and $x=(x_+,x_-)$ in identical fashion. This partitions the full problem as follows: \begin{array}{ll} \text{maximize} & -\tfrac{1}{2}\|\hat{x}_++w_++z_+\|_2^2 -\tfrac{1}{2}\|\hat{x}_-+w_-+z_-\|_2^2 + \tfrac{1}{2}\|\hat{x}\|_2^2 \\ \text{subject to} & \|w_+\|_2^2 + \|w_-\|_2^2 \leq 1 \\ & z_+,z_- \succeq 0 \end{array} The optimal values of the $-$ portion of the objective are clear: choosing $$z_- = -\hat{x}_-, \quad w_- = 0 \quad\Longrightarrow\quad x_- = 0$$ drives the second term of the objective to zero (its maximum) while spending none of the norm budget $\|w\|_2 \leq 1$ on $w_-$. So in other words, every nonpositive element of $\hat{x}$ corresponds to a zero in the optimal solution.
Furthermore, we claim that $z_+=0$ as well. After all, $\hat{x}_+ \succ 0$ and $z_+ \succeq 0$ imply $\|\hat{x}_+ + z_+\|_2 \geq \|\hat{x}_+\|_2$, so any nonzero value of $z_+$ would only reduce the objective (to be maximized), with no compensating benefit. So with $z_+$ clamped to zero, we're left with a standard $\ell_2$ prox problem for the positive portion of the vector! That is, $$w_+ = \begin{cases} -\hat{x}_+ & \|\hat{x}_+\|_2 \leq 1 \\ -\|\hat{x}_+\|_2^{-1} \hat{x}_+ & \|\hat{x}_+\|_2 > 1 \end{cases}, \quad x_+ = \begin{cases} 0 & \|\hat{x}_+\|_2 \leq 1 \\ (1-\|\hat{x}_+\|_2^{-1}) \hat{x}_+ & \|\hat{x}_+\|_2 > 1 \end{cases}$$
In sum: to compute this prox, clamp the nonpositive values of $\hat{x}$ to zero, then scale the result according to the standard $\ell_2$ prox formula. This is very similar to the prox of $\ell_1 + \ell_2$: there, the $\ell_1$ shrinkage comes first, followed by the $\ell_2$ scaling; here, the projection onto the nonnegative orthant plays the role of the first step.
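In code, the whole recipe is just a few lines (a minimal NumPy sketch; the name `prox_l2_nonneg` is illustrative, and the output can be checked against a generic solver such as the CVXPY baseline in the question):

```python
import numpy as np

def prox_l2_nonneg(x_hat):
    """Prox of f(x) = ||x||_2 + I_{>=0}(x), evaluated at x_hat."""
    p = np.maximum(x_hat, 0.0)    # clamp nonpositive entries to zero: x_- = 0
    nrm = np.linalg.norm(p)       # this is ||x_hat_+||_2
    if nrm <= 1.0:                # positive part shrinks all the way to zero
        return np.zeros_like(x_hat)
    return (1.0 - 1.0 / nrm) * p  # standard l2 scaling of the positive part
```

Note that the cost is $O(n)$: one pass to clamp and one norm computation, with no iterative solver involved.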