I am reading about SVMs and want to confirm that I understand the optimality conditions. Details below:
Consider $n$ points $x_1, x_2, \dots, x_n \in \mathbb R^d$, together with their labels $y_1, y_2, \dots, y_n \in \{-1, 1\}$.
Consider $w\in\mathbb R^d$, $b\in\mathbb R$, $C\in \mathbb R$, $C\geq 0$, and $\epsilon_1, \epsilon_2, \dots, \epsilon_n \in \mathbb R$. Assume we want to minimize the following function with respect to $w, b$ and $\epsilon_1, \dots, \epsilon_n$: $$\frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \epsilon_i,$$
subject to the following conditions:
- $\epsilon_i \geq 0$ for all $i$
- $y_i\left(w^T x_i + b\right) + \epsilon_i \geq 1$ for all $i$.
I wrote the problem in terms of Lagrange multipliers:
I wrote the problem in terms of Lagrange multipliers (minimizing over the primal variables and maximizing over the multipliers):
$$\min_{w,b, \epsilon}\ \max_{\alpha \geq 0,\ \beta \geq 0}\ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \epsilon_i + \sum_{i=1}^n \alpha_i \left(1 - y_i(w^Tx_i + b) - \epsilon_i\right) - \sum_{i=1}^n \beta_i \epsilon_i,$$ and, setting the derivatives with respect to $w$, $b$ and each $\epsilon_i$ to zero, obtained the following conditions:
- $w = \sum_{i=1}^n \alpha_i y_i x_i$
- $\sum_{i=1}^n \alpha_i y_i = 0$
- $C = \alpha_i + \beta_i$ for all $i$.
- $\alpha_i \geq 0$ for all $i$.
- $\beta_i \geq 0$ for all $i$.
- $\alpha_i \left(1 - y_i(w^Tx_i + b) - \epsilon_i\right) = 0$ for all $i$.
- $\beta_i \epsilon_i = 0$ for all $i$.
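For concreteness, here is how I would sanity-check these conditions numerically on a made-up two-point problem (one point per class), where the equality constraint $\sum_i \alpha_i y_i = 0$ forces $\alpha_1 = \alpha_2 = a$ and the dual becomes a one-dimensional quadratic with a closed-form maximizer; the data and $C$ below are arbitrary:

```python
import numpy as np

# Toy problem: two points, one per class, so the dual is one-dimensional.
# Dual: max_a  2a - (1/2) a^2 ||x1 - x2||^2,  subject to 0 <= a <= C
# (the constraint sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a).
x1, y1 = np.array([2.0, 0.0]), 1.0
x2, y2 = np.array([0.0, 0.0]), -1.0
C = 1.0

a = min(C, 2.0 / np.dot(x1 - x2, x1 - x2))   # unconstrained maximizer, clipped at C
alpha = np.array([a, a])
w = alpha[0] * y1 * x1 + alpha[1] * y2 * x2  # condition 1
b = 1.0 - np.dot(w, x1)                      # from y1 (w.x1 + b) = 1 with y1 = 1

margins = np.array([y1 * (np.dot(w, x1) + b), y2 * (np.dot(w, x2) + b)])
eps = np.maximum(0.0, 1.0 - margins)         # slacks
beta = C - alpha                             # condition 3

assert np.isclose(alpha[0] * y1 + alpha[1] * y2, 0)   # condition 2
assert np.all(alpha >= 0) and np.all(beta >= 0)       # conditions 4, 5
assert np.allclose(alpha * (1 - margins - eps), 0)    # condition 6
assert np.allclose(beta * eps, 0)                     # condition 7
print(w, b, alpha)  # [1. 0.] -1.0 [0.5 0.5]
```

Here $C$ is large enough that the clipping is inactive ($a = 0.5 < C$), so both slacks are zero and both points sit exactly on the margin.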
Could someone confirm that all of these conditions are necessary? I'm particularly unsure if I need conditions 6 and 7 from above. Thanks a lot!
Yes, the complementarity conditions (6 and 7) are necessary; they are the KKT complementary slackness conditions. They also give the method its name: if $\alpha_i > 0$, then $x_i$ "supports" $w$ (it appears in the expansion of condition 1) and, by condition 6, $y_i(w^Tx_i + b) = 1 - \epsilon_i$, i.e. $x_i$ lies on the slack-shifted margin. In the hard-margin formulation (without the $\epsilon_i$), this would already imply $x_i$ lies exactly on the margin.
In the soft-margin formulation, however, one of two things happens: either $\epsilon_i = 0$, so $x_i$ is on the margin, or $\epsilon_i > 0$, in which case condition 7 forces $\beta_i = 0$ and condition 3 then gives $\alpha_i = C$.
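To see that dichotomy numerically, here is a sketch on a made-up two-point problem (the same kind of toy instance as in your derivation: one point per class, so the dual is one-dimensional with maximizer $\min(C,\, 2/\|x_1-x_2\|^2)$), with $C$ chosen small enough that the box constraint binds:

```python
import numpy as np

# Two-point toy problem with C small enough that the box constraint
# binds: then alpha_i = C, beta_i = 0, and the slacks epsilon_i are > 0.
x1, y1 = np.array([2.0, 0.0]), 1.0
x2, y2 = np.array([0.0, 0.0]), -1.0
C = 0.3  # smaller than the unconstrained dual maximizer, 0.5

a = min(C, 2.0 / np.dot(x1 - x2, x1 - x2))  # clipped at C here
w = a * y1 * x1 + a * y2 * x2
b = -np.dot(w, (x1 + x2) / 2)               # hyperplane through the midpoint, by symmetry

margins = np.array([y1 * (np.dot(w, x1) + b), y2 * (np.dot(w, x2) + b)])
eps = np.maximum(0.0, 1.0 - margins)
alpha = np.array([a, a])
beta = C - alpha

assert np.all(alpha == C) and np.all(beta == 0)   # box constraint binds
assert np.all(eps > 0)                            # so the slacks are strictly positive
assert np.allclose(alpha * (1 - margins - eps), 0)  # condition 6 still holds
assert np.allclose(beta * eps, 0)                   # condition 7 still holds
```

With $C = 0.3$ both points get $\alpha_i = C$ and $\epsilon_i = 0.4 > 0$, exactly the second branch of the dichotomy above.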