Convex Analysis: Dual Problem


In machine learning and deep learning, particularly for binary classification, the setup is often the following: $X \subset \mathbb R^n$ is the input domain and $Y = \left\{ -1, 1\right\}$ is the set of binary labels. I'd like to consider the $\ell_2$-regularized loss-minimisation problem

$$\min_{w, b} \left\{ J\left(A^T w - yb\right) + \frac{\alpha}{2} \| w \|^2 \right\}, \tag{$\mathcal P$}$$

where $J : \mathbb R^m \to \mathbb R_{\infty}$, $y \in \mathbb R^m$, $\alpha > 0$, and the matrix $A \in \mathbb R^{n \times m}$ has columns $A := \left[ y_1 x_1, \dots, y_m x_m \right]$. We're now supposed to rewrite $\left(\mathcal P \right)$ by introducing an additional variable $z$ with the constraint $z = A^T w - y b$, so that the dual problem is given by

$$\sup_{p}\inf_{z, w, b}\left\{ J(z) + \frac{\alpha}{2}\| w\|^2 - \langle p, A^T w - yb - z\rangle \right\} . \tag{$ \mathcal D$}$$

I first thought I should introduce the variable $z := A^T w - yb$, but then what would $(\mathcal P)$ look like? It is clear from $(\mathcal P)$ that we minimize with respect to $w$ and $b$, but once I introduce $z = A^Tw - yb$, I don't know how to reformulate $(\mathcal P)$: would I then minimize or maximize with respect to $z$?


1 Answer

If you introduce the variable $z=A^Tw-yb$ then your problem becomes

$$\min\limits_{z,w,b}\left\{J(z) + \frac{\alpha}{2}\|w\|^2: z=A^Tw-yb\right\}$$
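To answer the question of whether $z$ is minimized or maximized: it is minimized. For any fixed $(w,b)$ the constraint leaves only one feasible $z$, so the constrained problem takes exactly the same values as $(\mathcal P)$ (a routine check):

$$\min\limits_{z,w,b}\left\{J(z) + \frac{\alpha}{2}\|w\|^2: z=A^Tw-yb\right\} = \min\limits_{w,b}\left\{J(A^Tw-yb) + \frac{\alpha}{2}\|w\|^2\right\}.$$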

You can now use a Lagrange multiplier to relax the constraint $z=A^Tw-yb$:

$$\min\limits_{z,w,b}\max_{\lambda}\left\{J(z) + \frac{\alpha}{2}\|w\|^2 + \langle \lambda, A^Tw-yb-z\rangle\right\}$$
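To see why maximizing over $\lambda$ recovers the constrained problem, write the Lagrangian $L(z,w,b;\lambda) = J(z) + \frac{\alpha}{2}\|w\|^2 + \langle \lambda, A^Tw-yb-z\rangle$ and note that

$$\max_{\lambda} L(z,w,b;\lambda) = \begin{cases} J(z) + \dfrac{\alpha}{2}\|w\|^2, & z = A^Tw - yb,\\ +\infty, & \text{otherwise}, \end{cases}$$

since a nonzero residual $A^Tw - yb - z$ lets the inner product grow without bound. Infeasible points are thus penalized by $+\infty$, so the min-max equals the constrained minimum above.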

Under suitable constraint qualification conditions (for example, Slater's condition), the max and the min can be exchanged without changing the optimal value. This leads to the dual problem

$$\max_{\lambda}\min\limits_{z,w,b}\left\{J(z) + \frac{\alpha}{2}\|w\|^2 + \langle \lambda, A^Tw-yb-z\rangle\right\}$$

which, up to the sign convention $\lambda = -p$, is exactly the dual problem $(\mathcal D)$ stated in the question.
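The inner infimum separates across $z$, $w$ and $b$ and can be evaluated in closed form (a standard computation, writing $J^*$ for the convex conjugate of $J$):

$$\inf_z \left\{J(z) - \langle\lambda, z\rangle\right\} = -J^*(\lambda), \qquad \inf_w \left\{\frac{\alpha}{2}\|w\|^2 + \langle A\lambda, w\rangle\right\} = -\frac{1}{2\alpha}\|A\lambda\|^2,$$

while $\inf_b \left\{-b\,\langle \lambda, y\rangle\right\}$ equals $0$ if $\langle y, \lambda\rangle = 0$ and $-\infty$ otherwise. The dual therefore reads

$$\max_{\lambda}\left\{-J^*(\lambda) - \frac{1}{2\alpha}\|A\lambda\|^2 : \langle y, \lambda\rangle = 0\right\}.$$

As a numerical sanity check, here is a sketch (assuming the smooth choice $J(z) = \frac{1}{2}\|z - c\|^2$, for which $J^*(\lambda) = \langle\lambda, c\rangle + \frac{1}{2}\|\lambda\|^2$, so that both the primal and the dual reduce to solving linear systems) that verifies strong duality on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, alpha = 8, 5, 0.7
A = rng.standard_normal((n, m))          # columns y_i x_i (random stand-in here)
y = rng.choice([-1.0, 1.0], size=m)      # binary labels
c = rng.standard_normal(m)               # offset in J(z) = 0.5*||z - c||^2

# Primal: min over v = (w, b) of 0.5*||M v - c||^2 + (alpha/2)*||w||^2,
# where M v = A^T w - y b.
M = np.hstack([A.T, -y[:, None]])        # m x (n+1)
D = np.diag(np.r_[np.ones(n), 0.0])      # regularize w only, not b
v = np.linalg.solve(M.T @ M + alpha * D, M.T @ c)
w, b = v[:n], v[n]
residual = M @ v - c
primal = 0.5 * residual @ residual + 0.5 * alpha * w @ w

# Dual: max over lam with y^T lam = 0 of -J*(lam) - ||A lam||^2 / (2 alpha).
# KKT system for the equality-constrained quadratic:
#   (I + A^T A / alpha) lam + mu * y = -c,   y^T lam = 0.
Q = np.eye(m) + (A.T @ A) / alpha
K = np.block([[Q, y[:, None]], [y[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.r_[-c, 0.0])
lam = sol[:m]
dual = -(c @ lam + 0.5 * lam @ lam) - (A @ lam) @ (A @ lam) / (2 * alpha)

print(primal, dual)                      # strong duality: the two values agree
```

At the optimum one can also read off $w = -\frac{1}{\alpha}A\lambda$ from the stationarity of the Lagrangian in $w$, which the script's solutions satisfy as well.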