Optimization Problem Involving $ {L}_{2} $, $ {L}_{1} $ Norm and Constraints


Can somebody suggest how to solve the following optimization problem? \begin{equation*} \begin{aligned} & \underset{\mathbf{w},\xi}{\text{minimize}} & & \frac{1}{2}\|\mathbf{w}-\mathbf{w}_{t-1}\|^2_2 + \beta \|\mathbf{w}\|_1 +\alpha C\xi\\ & \text{subject to} & & 1-y\mathbf{w}^T\mathbf{x} \leq \xi, \quad \xi \geq 0 \end{aligned} \end{equation*}

An equivalent formulation is this: \begin{equation*} \begin{aligned} & \underset{\mathbf{w},\xi}{\text{minimize}} & & \frac{1}{2}\|\mathbf{w}-\mathbf{w}_{t-1}\|^2_2 +\alpha C\xi\\ & \text{subject to} & & 1-y\mathbf{w}^T\mathbf{x} \leq \xi, \quad \xi \geq 0\\ & & &\|\mathbf{w}\|_1 \leq t \end{aligned} \end{equation*} Edit: I need solution techniques that work in an online setting and are suitable for large $\mathbf{w}, \mathbf{x}$, say with millions of entries.

Accepted answer:

You can (and should) replace $\alpha C$ with a single parameter $\alpha$, and absorb the label $y$ into $\mathbf{x}$ so that $y\mathbf{w}^T\mathbf{x}$ becomes $\mathbf{w}^T\mathbf{x}$. Many details in your problem are irrelevant and can be "normalized" away.

Now, a bit of algebra reveals that it is optimal to set \begin{eqnarray} \xi = (1 - w^Tx)_+ := \max(1-w^Tx, 0), \end{eqnarray} (indeed, for fixed $w$, the problem in $\xi$ is the minimization of a linear function with slope $\alpha > 0$ over the ray $\xi \geq (1 - w^Tx)_+$), and then minimize $E(w) := \frac{1}{2}\|w-w_{t-1}\|^2 + \beta\|w\|_1 + \alpha(1 - w^Tx)_+$ over $w$.
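As a quick numerical sanity check of this elimination step (a sketch; the values of `alpha`, `w`, `x`, and the grid bounds are made up), one can minimize $\alpha\xi$ over the feasible ray by brute force and compare with the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.7
w = rng.standard_normal(4)
x = rng.standard_normal(4)

# Feasible set: xi >= 1 - w^T x and xi >= 0, i.e. xi >= max(1 - w^T x, 0).
margin = 1.0 - w.dot(x)
xi_lo = max(margin, 0.0)

# The objective alpha*xi has positive slope, so its minimum over the
# feasible ray is attained at the left endpoint xi_lo.
xi_grid = np.linspace(xi_lo, xi_lo + 10.0, 100_001)
brute = (alpha * xi_grid).min()
closed_form = alpha * max(margin, 0.0)
assert abs(brute - closed_form) < 1e-9
```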

Now, $E(w)$ can be rewritten as \begin{eqnarray} E(w) = g(w) + f(Kw), \end{eqnarray} where $K := x^T \in \mathbb{R}^{1 \times n}$, $g(w) := \frac{1}{2}\|w-w_{t-1}\|^2 + \beta\|w\|_1$, and $f(z) := \alpha(1 - z)_+$.

Thus minimizing $E(w)$ is equivalent to solving the following minimax game \begin{eqnarray} \underset{z \in \mathbb{R}}{\text{maximize }}\underset{w \in \mathbb{R}^n}{\text{minimize }}\langle z, Kw\rangle + g(w) - f^*(z), \end{eqnarray} where $f^*(z) := \underset{s \in \mathbb{R}}{\text{max }}zs - f(s)$ is the convex conjugate of $f$ (compute it as an exercise). It's straightforward to compute the proximal operators of $g$ (this will be a shrinkage) and $f^*$ (this will be a projection onto a compact interval followed by a translation), and so you have all the ingredients (except minor details ...) needed to invoke the proximal primal-dual algorithms of Chambolle-Pock, for example.
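A minimal sketch of such a primal-dual scheme in NumPy, assuming the basic Chambolle-Pock iteration with $\theta = 1$ and step sizes $\tau\sigma\|x\|^2 < 1$. The function names are mine; the shrinkage formula for $\textrm{prox}_{\tau g}$ is my own completing-the-square computation, and the formula for $\textrm{prox}_{\sigma f^*}$ is the projection-plus-translation stated above:

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise soft-thresholding: the prox of t*||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_g(v, tau, w_prev, beta):
    # prox_{tau*g}(v) with g(w) = 0.5*||w - w_prev||^2 + beta*||w||_1.
    # Completing the square reduces this to a shrinkage.
    return soft_threshold((v + tau * w_prev) / (1.0 + tau),
                          tau * beta / (1.0 + tau))

def prox_fstar(z, sigma, alpha):
    # prox_{sigma*f*}(z) with f*(y) = y + indicator_{[-alpha, 0]}(y):
    # a translation followed by projection onto [-alpha, 0].
    return np.clip(z - sigma, -alpha, 0.0)

def chambolle_pock(x, w_prev, alpha, beta, n_iter=2000):
    # Basic primal-dual iteration with theta = 1 and tau*sigma*||K||^2 < 1,
    # where K = x^T is a 1 x n matrix, so ||K|| = ||x||_2.
    L = np.linalg.norm(x)
    tau = sigma = 0.99 / L
    w = w_prev.copy()
    w_bar = w.copy()
    z = 0.0  # dual variable (scalar, since K has a single row)
    for _ in range(n_iter):
        z = prox_fstar(z + sigma * x.dot(w_bar), sigma, alpha)
        w_new = prox_g(w - tau * z * x, tau, w_prev, beta)
        w_bar = 2.0 * w_new - w  # extrapolation with theta = 1
        w = w_new
    return w
```

Each iteration costs $\mathcal{O}(n)$ (two inner products with $x$ and elementwise prox steps), which is what makes the scheme plausible for $n$ in the millions.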

Finally, because $g$ is strongly convex, you will converge in $\mathcal{O}(\|x\|/\sqrt{\epsilon})$ iterations for a tolerance $\epsilon > 0$ on the duality gap.

Below are some details (useful for actually implementing the algorithms).

Computing $f^*$ and $\textrm{prox}_{\lambda f^*}$. Using basic properties of convex conjugates, we have \begin{eqnarray} (z)_+^* = i_{[0, 1]}(z) \implies (-z)_+^* = i_{[0,1]}(-z) = i_{[-1,0]}(z) \implies (1 - z)_+^* = z + i_{[-1,0]}(z), \end{eqnarray} and so $f^*(z) = \alpha\left(\frac{z}{\alpha}\right) + \alpha\, i_{[-1, 0]}\!\left(\frac{z}{\alpha}\right) = \begin{cases}z, &\mbox{if } -\alpha \le z \le 0,\\+\infty, &\mbox{otherwise.}\end{cases}$
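This closed form is easy to verify numerically by taking the sup over a grid (a sketch; the value of `alpha` and the grid bounds are arbitrary):

```python
import numpy as np

alpha = 0.7
s = np.linspace(-50.0, 50.0, 200_001)  # grid over the argument of the sup

def fstar_numeric(z):
    # f*(z) = sup_s { z*s - alpha*(1 - s)_+ }, approximated on the grid.
    return np.max(z * s - alpha * np.maximum(1.0 - s, 0.0))

# On [-alpha, 0] the sup is attained at s = 1 and equals z.
for z in (-0.6, -0.3, -0.05, 0.0):
    assert abs(fstar_numeric(z) - z) < 1e-3

# Outside [-alpha, 0] the sup is +infinity (on the grid: grows with the bounds).
assert fstar_numeric(0.2) > 1.0
assert fstar_numeric(-1.0) > 1.0
```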

A direct computation then yields that, for any prox step $\lambda > 0$, \begin{eqnarray}\textrm{prox}_{\lambda f^*}(z) = \textrm{proj}_{[\lambda - \alpha, \lambda]}(z) - \lambda, \end{eqnarray} i.e., a projection onto a compact interval followed by a translation.
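The prox formula can likewise be checked against the defining argmin on a grid (a sketch; `alpha` and the step `lam` are made-up values):

```python
import numpy as np

alpha, lam = 0.7, 0.4  # arbitrary values for alpha and the prox step lambda

def prox_formula(z):
    # proj onto [lam - alpha, lam], then translate by -lam.
    return np.clip(z, lam - alpha, lam) - lam

def prox_bruteforce(z):
    # Direct argmin of 0.5*(y - z)^2 + lam*f*(y); the indicator restricts y
    # to the domain [-alpha, 0], on which f*(y) = y.
    y = np.linspace(-alpha, 0.0, 100_001)
    return y[np.argmin(0.5 * (y - z) ** 2 + lam * y)]

for z in (-2.0, -0.5, 0.0, 0.3, 1.5):
    assert abs(prox_formula(z) - prox_bruteforce(z)) < 1e-3
```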