Is there an efficient way to evaluate the proximal operator of the function $f:\mathbb R^n \to \mathbb R \cup \{ \infty \}$ defined by \begin{equation} f(x) = \| x \|_2 + I_{\geq 0}(x), \end{equation} where $I_{\geq 0}$ is the indicator function of the nonnegative orthant: \begin{equation} I_{\geq 0}(x) = \begin{cases} 0 & \quad \text{if } x \geq 0,\\ \infty & \quad \text{otherwise.} \end{cases} \end{equation} (The inequality $x \geq 0$ is interpreted componentwise.)
In other words, given $\hat x \in \mathbb R^n$, is there an easy way to solve the optimization problem \begin{align} \text{minimize} & \quad \|x\|_2 + \frac{1}{2} \| x - \hat x \|_2^2 \\ \text{subject to} & \quad x \geq 0. \end{align} The variable in this optimization problem is $x \in \mathbb R^n$. Ideally I'd like a closed-form solution to this optimization problem, or else a way to compute the solution extremely quickly (without having to rely on an iterative algorithm).
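For reference, any closed-form candidate can be checked against a generic conic solver. Below is a minimal sketch using CVXPY (the helper name `prox_reference` is mine, just for illustration); this is exactly the kind of iterative solve I'd like to avoid, but it serves as a ground truth:

```python
import cvxpy as cp
import numpy as np

def prox_reference(x_hat):
    """Solve: minimize ||x||_2 + 0.5*||x - x_hat||_2^2  subject to x >= 0."""
    x = cp.Variable(x_hat.shape[0])
    objective = cp.Minimize(cp.norm(x, 2) + 0.5 * cp.sum_squares(x - x_hat))
    cp.Problem(objective, [x >= 0]).solve()
    return x.value
```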
Thoughts: the term $\| x \|_2$ depends only on the magnitude of $x$, not its direction. If not for the nonnegativity constraints, we would pick $x$ to point in the same direction as $\hat x$, which reduces the problem to a one-dimensional problem in $\|x\|_2$ that is easy to solve. Perhaps there's a similar way to think about the problem when the nonnegativity constraints are present.
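For comparison, without the constraint the prox of $\|x\|_2$ has the well-known closed form $x = (1 - 1/\|\hat x\|_2)\,\hat x$ if $\|\hat x\|_2 > 1$ and $x = 0$ otherwise (block soft-thresholding). A minimal NumPy sketch of that unconstrained case:

```python
import numpy as np

def prox_l2(x_hat):
    """Prox of ||.||_2 without constraints: block soft-thresholding.
    Shrinks x_hat toward the origin along its own direction."""
    nrm = np.linalg.norm(x_hat)
    if nrm <= 1.0:
        return np.zeros_like(x_hat)
    return (1.0 - 1.0 / nrm) * x_hat
```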
What I have found is that when working with a custom prox function, deriving the dual problem is almost always helpful. To do so, I first rewrite the problem as follows: \begin{array}{ll} \text{minimize} & \tfrac{1}{2} \| x - \hat{x} \|_2^2 + t \\ \text{subject to} & \| x \|_2 \leq t \\ & x \succeq 0 \end{array} where $t$ is a new variable. To construct the dual I use a conic Lagrange multiplier $(w,s)$ for the second-order-cone constraint and a multiplier $z$ for the nonnegativity constraint. The dual problem is \begin{array}{ll} \text{maximize} & \inf_{x,t} ~ \tfrac{1}{2} \| x - \hat{x} \|_2^2 + t - st - w^Tx - z^T x \\ \text{subject to} & \|w\|_2 \leq s \\ & z \succeq 0 \end{array} Setting the derivatives of the inner infimum with respect to $x$ and $t$ to zero yields $$x=\hat{x}+w+z, \qquad s=1.$$ This is what the dual gives me: a template for the structure of the solution. In this case, $x$ is the sum of the target vector $\hat{x}$, a vector $w$ with bounded norm, and a nonnegative vector $z$. Sure, we're not there yet, but we're closer.
Eliminating the primal variables from the dual objective yields $$\tfrac{1}{2}\|w+z\|_2^2 - (w+z)^T(\hat{x}+w+z) = -\tfrac{1}{2}\|\hat{x}+w+z\|_2^2 + \tfrac{1}{2}\|\hat{x}\|_2^2$$ So a "clean" dual of the prox function is \begin{array}{ll} \text{maximize} & -\tfrac{1}{2}\|\hat{x}+w+z\|_2^2 + \tfrac{1}{2}\|\hat{x}\|_2^2 \\ \text{subject to} & \|w\|_2 \leq 1 \\ & z \succeq 0 \end{array} At this point I can almost read off the solution by inspection.
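(Before reading it off, here is a quick numerical sanity check of this dual and of the template $x = \hat{x} + w + z$; a sketch assuming CVXPY and an arbitrary test vector:)

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
x_hat = rng.standard_normal(6)

# Solve the "clean" dual directly.
w = cp.Variable(6)
z = cp.Variable(6)
objective = cp.Maximize(-0.5 * cp.sum_squares(x_hat + w + z)
                        + 0.5 * np.sum(x_hat ** 2))
cp.Problem(objective, [cp.norm(w, 2) <= 1, z >= 0]).solve()

# Recover the primal solution from the template x = x_hat + w + z ...
x_dual = x_hat + w.value + z.value

# ... and compare with a direct solve of the primal prox problem.
x = cp.Variable(6)
primal = cp.Minimize(cp.norm(x, 2) + 0.5 * cp.sum_squares(x - x_hat))
cp.Problem(primal, [x >= 0]).solve()
print(np.allclose(x_dual, x.value, atol=1e-5))  # expect True, up to solver tolerance
```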
First, suppose $\hat{x}\preceq 0$. Then the optimal values are $x=0$, $w=0$, $z=-\hat{x}$: this choice makes the first term of the objective vanish, so the objective attains its largest possible value $\tfrac{1}{2}\|\hat{x}\|_2^2$. From now on, we can assume that at least one component of $\hat{x}$ is positive.
But in fact, we can go further. Partition $\hat{x}$, rearranging if necessary, into its positive and nonpositive components. That is, $$\hat{x} = \begin{bmatrix} \hat{x}_+ \\ \hat{x}_- \end{bmatrix}, \quad \hat{x}_+\in\mathbb{R}^k_{++}, ~ \hat{x}_-\in-\mathbb{R}^{n-k}_+$$ Partition $w=(w_+,w_-)$, $z=(z_+,z_-)$, and $x=(x_+,x_-)$ in identical fashion. This partitions the full problem as follows: \begin{array}{ll} \text{maximize} & -\tfrac{1}{2}\|\hat{x}_++w_++z_+\|_2^2 -\tfrac{1}{2}\|\hat{x}_-+w_-+z_-\|_2^2 + \tfrac{1}{2}\|\hat{x}\|_2^2 \\ \text{subject to} & \|w_+\|_2^2 + \|w_-\|_2^2 \leq 1 \\ & z_+,z_- \succeq 0 \end{array} The optimal values of the $-$ portion of the objective are clear: choosing $$z_- = -\hat{x}_-, \quad w_- = 0 \quad\Longrightarrow\quad x_- = 0$$ drives the second term of the objective to zero (its maximum) while spending none of the norm budget $\|w\|_2 \leq 1$ on $w_-$. So in other words, every nonpositive element of $\hat{x}$ corresponds to a zero in the optimal solution.
Furthermore, we claim that $z_+=0$ as well. After all, $\hat{x}_+ \succ 0$ and $z_+ \succeq 0$ imply $\|\hat{x}_+ + z_+\|_2 \geq \|\hat{x}_+\|_2$, so any nonzero value of $z_+$ would only reduce the objective (to be maximized), with no compensating benefit. So with $z_+$ clamped to zero, we're left with a standard $\ell_2$ prox problem for the positive portion of the vector! That is, $$w_+ = \begin{cases} -\hat{x}_+ & \|\hat{x}_+\|_2 \leq 1 \\ -\|\hat{x}_+\|_2^{-1} \hat{x}_+ & \|\hat{x}_+\|_2 > 1 \end{cases}, \quad x_+ = \begin{cases} 0 & \|\hat{x}_+\|_2 \leq 1 \\ (1-\|\hat{x}_+\|_2^{-1}) \hat{x}_+ & \|\hat{x}_+\|_2 > 1 \end{cases}$$
In sum: to compute this prox, clamp the nonpositive values of $\hat{x}$ to zero, then scale the result according to the standard $\ell_2$ prox formula. This is very similar to the prox of $\ell_1 + \ell_2$: there, the $\ell_1$ shrinkage comes first, followed by the $\ell_2$ scaling; here, the projection onto the nonnegative orthant plays the role of the first step.
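In code, the whole recipe is just a few lines (a minimal NumPy sketch; the name `prox_l2_nonneg` is illustrative, and the output can be checked against a generic solver such as the CVXPY baseline in the question):

```python
import numpy as np

def prox_l2_nonneg(x_hat):
    """Prox of f(x) = ||x||_2 + I_{>=0}(x), evaluated at x_hat."""
    p = np.maximum(x_hat, 0.0)    # clamp nonpositive entries to zero: x_- = 0
    nrm = np.linalg.norm(p)       # this is ||x_hat_+||_2
    if nrm <= 1.0:                # positive part shrinks all the way to zero
        return np.zeros_like(x_hat)
    return (1.0 - 1.0 / nrm) * p  # standard l2 scaling of the positive part
```

Note that the cost is $O(n)$: one pass to clamp and one norm computation, with no iterative solver involved.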