I am trying to solve this particular problem which can be found in this paper: $$\underset{y\in\Delta_n}{\text{argmin}}\{\langle\nabla f(x),y-x\rangle+\dfrac{1}{2}L\|y-x\|_1^2\}$$ I tried formulating the above problem as a projection problem: $$\underset{y\in\Delta_n}{\text{argmin}}\{\|y-x+\dfrac{1}{L}\nabla f(x)\|_1\},$$ but I don't know of any methods to compute projection for L1 norm. The method provided in the aforementioned paper (Section 5) started with $$\psi^*=\underset{x\in\Delta_n}{\text{min}}\{\langle\bar{g},x-\bar{x}\rangle+\dfrac{1}{2}L\|x-\bar{x}\|_1^2\}$$ and ended with $$-\psi^*=\underset{\tau\ge0}{\text{min}}\{\sum_{i=1}^{n}\bar{x}^{(i)}(\bar{g}^{(i)}-2\tau)_++\dfrac{\tau^2}{2L}\},$$ where $\left(a\right)_+=max\{a,0\}$.
I am not sure of how to construct the optimal solution $x$ from there.
Remark: Let's first observe that the assumption $\sum_{1 \le i \le n}\bar{g}^{(i)}$ made by the author (Nesterov) is really no loss of generality; otherwise simply replace each $\bar{g}^{(i)}$ with $\bar{g}^{(i)} - \frac{1}{n}\sum_{1 \le k \le n}\bar{g}^{(k)}$ in the computations. BTW, though the author's, assumption is sufficient to obtain the results, it's much 'tighter' to assume $\underset{1 \le k \le 1}{\text{min }}\bar{g}^{(k)} = 0$, since $\lambda \le \bar{g}^{(i)} + \tau\text{ } \forall i$ iff $\lambda \le \underset{1 \le k \le 1}{\text{min }}\bar{g}^{(k)} + \tau$, and the author would want to work with the constraint $\lambda \le \tau$.
Now, sort the components of $\bar{g}$ so that $\bar{g}^n \le ... \le \bar{g}^2 \le \bar{g}^1$. In the optimization problem for $\tau$, the objective function $h(\tau) := \sum_{1 \le i \le n}\bar{x}^i(\bar{g}^i - 2\tau)_+ +\frac{\tau^2}{2L}$ is piecewise quadratic with kinks (singularities) at the points $\tau_k = \frac{1}{2}\bar{g}^k$, $k=1, 2, ..., n$, which partition the open set $\mathbb{R}\setminus\{\tau_k | k=1, 2, ..., n\}$ into $n + 1$ open intervals $I_0 := (\tau_1, +\infty)$; $I_k := (\tau_{k + 1}, \tau_{k})$ if $1 \le k < n$; $I_n := (-\infty, \tau_n)$. It is obvious that $\forall k=0,1, 2, ..., n$ and $\forall \tau \in I_k$, it holds that$h(\tau) = \sum_{i=1}^k\bar{x}^i(\bar{g}^i - 2 \tau) + \frac{\tau^2}{2L}$, and so the nonsingular critical points of $h$ are $\zeta_k := 2L\sum_{i=1}^k\bar{x}^i$ (of course, provided $\zeta_k \in I_k$). To compute $\tau^* := \underset{\tau \ge 0}{\text{argmin }}h(\tau)$, it then suffices to check the values value of $h$ on the finite set of points \begin{eqnarray}P := \{\tau_k | 1 \le k \le n, \tau_k \ge 0\} \cup \{\zeta_k | 1 \le k \le n, \zeta_k \in I_k, \zeta_k \ge 0\},\end{eqnarray} to get the global minimum point $\tau^*$. Note that $1 \le card(P) \le 2n + 1$. Now, as explained in the paper, we use this $\tau^*$ and (the original unsorted version of) $\bar{g}$ to calculate $s^*$ via the point-wise max operation $s^* = \text{max}(\tau^*, \bar{g} - \tau^*) \in \mathbb{R}^n_+$.
Finally, we minimize a linear function on a simplex to obtain $x^*$, as explained in the paper.
A note on the proof of Lemma 6. It's noteworthy that one can issue a one-line proof for Lemma 6 by simply observing that the RHS is precisely the value of the convex conjugate of the function $E^* \rightarrow \mathbb{R}, s \mapsto \frac{1}{2}L(\|s-g\|^*)^2$ at the point $h \in E$. Using elementary properties of conjugation, this value equals $L \frac{1}{2}\|\frac{1}{L}h\|^{\frac{2}{2-1}} + \langle g, h\rangle = \langle g, h\rangle + \frac{1}{2L}\|h\|^2$, which is the LHS.