Gradient mapping minimization problem


I am trying to solve this particular problem, which can be found in this paper: $$\underset{y\in\Delta_n}{\text{argmin}}\{\langle\nabla f(x),y-x\rangle+\dfrac{1}{2}L\|y-x\|_1^2\}$$ I tried formulating the above problem as a projection problem: $$\underset{y\in\Delta_n}{\text{argmin}}\{\|y-x+\dfrac{1}{L}\nabla f(x)\|_1\},$$ but I don't know of any method to compute projections in the $\ell_1$ norm. The method provided in the aforementioned paper (Section 5) starts with $$\psi^*=\underset{x\in\Delta_n}{\text{min}}\{\langle\bar{g},x-\bar{x}\rangle+\dfrac{1}{2}L\|x-\bar{x}\|_1^2\}$$ and ends with $$-\psi^*=\underset{\tau\ge0}{\text{min}}\{\sum_{i=1}^{n}\bar{x}^{(i)}(\bar{g}^{(i)}-2\tau)_++\dfrac{\tau^2}{2L}\},$$ where $\left(a\right)_+=\max\{a,0\}$.

I am not sure how to construct the optimal solution $x$ from there.


BEST ANSWER

Remark: Let's first observe that the assumption $\sum_{1 \le i \le n}\bar{g}^{(i)} = 0$ made by the author (Nesterov) is really no loss of generality; otherwise simply replace each $\bar{g}^{(i)}$ with $\bar{g}^{(i)} - \frac{1}{n}\sum_{1 \le k \le n}\bar{g}^{(k)}$ in the computations. By the way, though the author's assumption is sufficient to obtain the results, it's much 'tighter' to assume $\underset{1 \le k \le n}{\text{min }}\bar{g}^{(k)} = 0$, since $\lambda \le \bar{g}^{(i)} + \tau\ \forall i$ iff $\lambda \le \underset{1 \le k \le n}{\text{min }}\bar{g}^{(k)} + \tau$, and the author would want to work with the constraint $\lambda \le \tau$.
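A quick numerical illustration of why the shift is no loss of generality (pure-Python sketch; the sample vectors are my own): subtracting the same constant from every component of $\bar{g}$ leaves $\langle\bar{g}, x - \bar{x}\rangle$ unchanged whenever $x$ and $\bar{x}$ both lie on the simplex, because $\sum_i x^{(i)} = \sum_i \bar{x}^{(i)} = 1$.

```python
# Shifting every component of g_bar by a constant c does not change
# <g_bar, x - x_bar> when x and x_bar both lie on the simplex,
# since sum(x) == sum(x_bar) == 1.
def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

g_bar = [3.0, -1.0, 2.0]
x_bar = [0.2, 0.5, 0.3]          # a point on the simplex
x     = [0.6, 0.1, 0.3]          # another point on the simplex
c     = sum(g_bar) / len(g_bar)  # e.g. the mean, as in the remark

diff     = [xi - xbi for xi, xbi in zip(x, x_bar)]
original = inner(g_bar, diff)
shifted  = inner([gi - c for gi in g_bar], diff)
```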

Now, sort the components of $\bar{g}$ so that $\bar{g}^n \le \dots \le \bar{g}^2 \le \bar{g}^1$. In the optimization problem for $\tau$, the objective function $h(\tau) := \sum_{1 \le i \le n}\bar{x}^i(\bar{g}^i - 2\tau)_+ +\frac{\tau^2}{2L}$ is piecewise quadratic with kinks (singularities) at the points $\tau_k = \frac{1}{2}\bar{g}^k$, $k=1, 2, \dots, n$, which partition the open set $\mathbb{R}\setminus\{\tau_k \mid k=1, 2, \dots, n\}$ into $n + 1$ open intervals: $I_0 := (\tau_1, +\infty)$; $I_k := (\tau_{k + 1}, \tau_{k})$ if $1 \le k < n$; $I_n := (-\infty, \tau_n)$. It is clear that for every $k=0,1, 2, \dots, n$ and every $\tau \in I_k$, it holds that $h(\tau) = \sum_{i=1}^k\bar{x}^i(\bar{g}^i - 2 \tau) + \frac{\tau^2}{2L}$, and so the nonsingular critical points of $h$ are $\zeta_k := 2L\sum_{i=1}^k\bar{x}^i$ (provided, of course, that $\zeta_k \in I_k$). To compute $\tau^* := \underset{\tau \ge 0}{\text{argmin }}h(\tau)$, it then suffices to check the values of $h$ on the finite set of points \begin{eqnarray}P := \{0\} \cup \{\tau_k \mid 1 \le k \le n,\ \tau_k \ge 0\} \cup \{\zeta_k \mid 1 \le k \le n,\ \zeta_k \in I_k,\ \zeta_k \ge 0\},\end{eqnarray} where the point $0$ is included because the constraint $\tau \ge 0$ may be active at the minimum, and pick the global minimum point $\tau^*$. Note that $1 \le \text{card}(P) \le 2n + 1$. Now, as explained in the paper, we use this $\tau^*$ and (the original, unsorted version of) $\bar{g}$ to calculate $s^*$ via the point-wise max operation $s^* = \max(\tau^*, \bar{g} - \tau^*) \in \mathbb{R}^n_+$.
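As a concrete illustration (not code from the paper), here is a minimal Python sketch of the candidate-point search described above; the helper name `minimize_h` and the sample data are hypothetical.

```python
# Illustrative sketch: minimize the piecewise quadratic
#   h(tau) = sum_i x_bar[i] * max(g_bar[i] - 2*tau, 0) + tau^2 / (2*L)
# over tau >= 0 by checking the finite candidate set P.
def minimize_h(g_bar, x_bar, L):
    n = len(g_bar)
    order = sorted(range(n), key=lambda i: -g_bar[i])  # g[0] >= ... >= g[n-1]
    g = [g_bar[i] for i in order]
    x = [x_bar[i] for i in order]

    def h(tau):
        return sum(xi * max(gi - 2 * tau, 0.0) for gi, xi in zip(g, x)) \
            + tau * tau / (2 * L)

    candidates = {0.0}        # boundary of the constraint tau >= 0
    prefix = 0.0
    for k in range(n):        # 0-indexed; corresponds to interval I_{k+1}
        tau_k = g[k] / 2.0    # kink point
        if tau_k >= 0.0:
            candidates.add(tau_k)
        prefix += x[k]
        zeta = 2 * L * prefix                        # interior critical point
        lo = g[k + 1] / 2.0 if k + 1 < n else float("-inf")
        if lo < zeta < tau_k:                        # zeta lies in its interval
            candidates.add(zeta)
    tau_star = min(candidates, key=h)
    return tau_star, h

tau_star, h = minimize_h([3.0, -1.0, 2.0], [0.2, 0.5, 0.3], L=1.0)
# Point-wise max with the (unsorted) g_bar, as in the answer:
s_star = [max(tau_star, gi - tau_star) for gi in [3.0, -1.0, 2.0]]
```

A brute-force scan of $h$ over a fine grid agrees with the candidate-point minimum, which is a cheap way to sanity-check an implementation.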

Finally, we minimize a linear function on a simplex to obtain $x^*$, as explained in the paper.
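That last step has a closed-form answer: a linear function over the simplex attains its minimum at a vertex. A tiny sketch (the function name and coefficients are illustrative, not from the paper):

```python
# A linear function <c, x> over the simplex Delta_n attains its minimum at
# a vertex: put all the mass on a coordinate with the smallest coefficient.
def argmin_linear_on_simplex(c):
    j = min(range(len(c)), key=lambda i: c[i])
    x = [0.0] * len(c)
    x[j] = 1.0
    return x

x_star = argmin_linear_on_simplex([0.7, -0.2, 0.4])  # mass on index 1
```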

A note on the proof of Lemma 6. It's noteworthy that one can give a one-line proof of Lemma 6 by simply observing that the RHS is precisely the value of the convex conjugate of the function $E^* \rightarrow \mathbb{R},\ s \mapsto \frac{1}{2}L(\|s-g\|^*)^2$ at the point $h \in E$. Using elementary properties of conjugation, this value equals $L \cdot \frac{1}{2}\|\tfrac{1}{L}h\|^2 + \langle g, h\rangle = \langle g, h\rangle + \frac{1}{2L}\|h\|^2$, which is the LHS.
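A quick numerical sanity check of this conjugation identity in the Euclidean (self-dual) case, where the supremum defining the conjugate is attained at $s = g + \frac{1}{L}h$; the vectors and constants below are arbitrary sample data:

```python
# Check in the Euclidean (self-dual) case:
#   sup_s { <s, h> - (L/2) * ||s - g||^2 } = <g, h> + ||h||^2 / (2L),
# with the supremum attained at s = g + h / L.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

L = 2.0
g = [1.0, -0.5]
h = [0.3, 0.8]

s_opt = [gi + hi / L for gi, hi in zip(g, h)]
value = dot(s_opt, h) - (L / 2) * sum((si - gi) ** 2
                                      for si, gi in zip(s_opt, g))
closed_form = dot(g, h) + dot(h, h) / (2 * L)
```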


I give a convex quadratic program formulation, as suggested in the comments. Note that you need an iterative approximation scheme, such as projected gradient or an interior-point method, to solve this QP. The referenced paper indicates that it is possible to solve the original problem with less effort.

Assume the setting is $\mathbb R^n$. Then we can restate the problem as a constrained quadratic program with a convex objective.

Assume that $x\in\mathbb R^n$ is fixed and let $g=\nabla f(x)\in\mathbb R^n$. Let $e\in\mathbb R^n$ denote the all-ones vector. Then, we have \begin{align} &\min \{ g^T(y-x) + \tfrac{1}{2} L \| y - x \|_1^2 \mid y\in\Delta_n \}\\ &= \min \{ g^Ts + \tfrac{1}{2} L \| s \|_1^2 \mid x+s\in \Delta_n \} \\ &= \min \{ g^T(u-v) + \tfrac{1}{2} L (\begin{smallmatrix}u \\ v\end{smallmatrix})^T Q (\begin{smallmatrix}u \\ v\end{smallmatrix}) \mid x+u-v\in \Delta_n,\, u,v\ge 0 \}, \\ \end{align} where $$ Q = \begin{pmatrix}e \\ e\end{pmatrix} \begin{pmatrix}e \\ e\end{pmatrix}^T. $$
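As a quick sanity check of this reformulation (with illustrative data, not from the paper): with the split $u = \max(s, 0)$, $v = \max(-s, 0)$, the quadratic form satisfies $(\begin{smallmatrix}u \\ v\end{smallmatrix})^T Q (\begin{smallmatrix}u \\ v\end{smallmatrix}) = (e^Tu + e^Tv)^2 = \|s\|_1^2$, since $u$ and $v$ have disjoint supports, so the two objectives coincide:

```python
# With u = max(s, 0) and v = max(-s, 0), the quadratic form with
# Q = [e; e][e; e]^T equals (e^T u + e^T v)^2 = ||s||_1^2,
# so the original and the split objectives agree.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

L = 1.5
g = [0.4, -1.2, 0.7]
s = [0.5, -0.3, 0.0]

u = [max(si, 0.0) for si in s]
v = [max(-si, 0.0) for si in s]

obj_s  = dot(g, s) + (L / 2) * sum(abs(si) for si in s) ** 2
obj_uv = dot(g, [ui - vi for ui, vi in zip(u, v)]) \
    + (L / 2) * (sum(u) + sum(v)) ** 2
```

At a minimizer of the QP, $u$ and $v$ will automatically have disjoint supports, since overlapping supports only increase the penalty term.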