About the slack variable for hinge-loss SVM


The hinge-loss SVM is defined as $$ \min_{w,b}\ \frac{1}{2}w^T w+\sum_{i=1}^{N}\max\{0,1-y_i(w^Tx_i +b)\} $$ By introducing slack variables $\xi_i$, the optimization problem becomes $$ \min_{w,b,\xi}\ \frac{1}{2}w^Tw+\sum_{i=1}^{N}\xi_i \\ \text{subject to } \xi_i\geq 0,\quad \xi_i\geq 1-y_i(w^Tx_i +b),\quad i=1,\dots,N $$
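(For concreteness, here is a tiny Python evaluation of the first objective at a fixed $(w,b)$ — the data and parameter values are made up, just a sanity sketch of the notation:)

```python
# Evaluating the unconstrained hinge-loss objective at a fixed (w, b).
# The 2-D data points and parameters below are made up for illustration.
X = [(1.0, 2.0), (-1.5, 0.5), (0.2, -1.0)]
y = [1, -1, 1]
w = (0.5, -0.3)
b = 0.1

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

reg = 0.5 * dot(w, w)
loss = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
objective = reg + loss
print(objective)  # approximately 1.87 for these made-up values
```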

But why is this correct? My understanding is that $$ \xi_i=\max\{0,1-y_i(w^Tx_i+b)\} $$ which gives the three cases $$ 1-y_i(w^Tx_i+b)>0 \Rightarrow \xi_i=1-y_i(w^Tx_i+b) \\ 1-y_i(w^Tx_i+b)=0 \Rightarrow \xi_i=0 \\ 1-y_i(w^Tx_i+b)<0 \Rightarrow \xi_i=0 $$ So how do we arrive at the inequality $~\xi_i\geq 1-y_i(w^Tx_i +b)~$? Is there a theorem on this? Is this something a slack variable must satisfy?

What's wrong with my understanding? Please help me check it.

UPDATE:

I just found the answer: the two cases $$ 1-y_i(w^Tx_i+b)=0 \Rightarrow \xi_i=0 \\ 1-y_i(w^Tx_i+b)<0 \Rightarrow \xi_i=0 $$ give, respectively, $$ 1-y_i(w^Tx_i+b)=\xi_i \\ 1-y_i(w^Tx_i+b)<\xi_i $$ while in the remaining case $\xi_i=1-y_i(w^Tx_i+b)$ holds with equality. Combining all three cases, we have $$ 1-y_i(w^Tx_i+b)\leq\xi_i $$
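(A small Python sanity check of the three cases, with made-up values of $z_i = 1-y_i(w^Tx_i+b)$:)

```python
# z = 1 - y*(w^T x + b); one made-up value per case:
# margin violated (z > 0), on the margin (z == 0), margin satisfied (z < 0)
cases = [0.7, 0.0, -1.3]
slacks = [max(0.0, z) for z in cases]

for z, xi in zip(cases, slacks):
    assert xi >= 0   # first constraint:  xi_i >= 0
    assert xi >= z   # second constraint: xi_i >= 1 - y_i (w^T x_i + b)

print(slacks)  # → [0.7, 0.0, 0.0]
```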

BEST ANSWER

I'm glad you found your answer. For archival purposes, though, I think it is worth deriving the equivalence in a couple of steps.

At first, you might be tempted to introduce $\xi_i$ by transforming your model from this $$\begin{array}{ll} \text{minimize}_{w,b} & \frac{1}{2}w^T w+\sum_{i=1}^{N}\max\{0,1-y_i(w^Tx_i +b)\} \end{array}$$ to this: $$\begin{array}{lll} \text{minimize}_{w,b,\xi} & \frac{1}{2}w^T w+\sum_{i=1}^{N}\xi_i \\ \text{subject to} & \max\{0,1-y_i(w^Tx_i +b)\} = \xi_i, & i=1,2,\dots,N \end{array}$$ This is indeed equivalent, but you've also broken convexity. It turns out that what you really want to do is transform your model this way instead: $$\begin{array}{lll} \text{minimize}_{w,b,\xi} & \frac{1}{2}w^T w+\sum_{i=1}^{N}\xi_i \\ \text{subject to} & \max\{0,1-y_i(w^Tx_i +b)\} \leq \xi_i, & i=1,2,\dots,N \end{array}$$ Now we still have convexity; but do we have equivalence?
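To see the equivalence concretely, here is a minimal brute-force sketch in Python (the 1-D data and the coarse grid are made up): for each fixed $(w,b)$, the best feasible slacks in the inequality-constrained form are the tight ones, so the two objectives agree at every grid point, and in particular at their minima.

```python
# Brute-force sketch on made-up 1-D data: for fixed (w, b), minimizing over
# xi subject to xi_i >= max(0, 1 - y_i (w x_i + b)) forces each xi_i to its
# tight value, so both objectives coincide everywhere on the grid.
X = [-2.0, -0.5, 1.5]
y = [-1, -1, 1]

def hinge_objective(w, b):
    loss = sum(max(0.0, 1 - yi * (w * xi + b)) for xi, yi in zip(X, y))
    return 0.5 * w * w + loss

def slack_objective(w, b):
    # optimal slacks for fixed (w, b) are the tight ones
    xis = [max(0.0, 1 - yi * (w * xi + b)) for xi, yi in zip(X, y)]
    return 0.5 * w * w + sum(xis)

grid = [i / 10 for i in range(-30, 31)]
assert all(hinge_objective(w, b) == slack_objective(w, b)
           for w in grid for b in grid)
best_h = min(hinge_objective(w, b) for w in grid for b in grid)
best_s = min(slack_objective(w, b) for w in grid for b in grid)
assert best_h == best_s
```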

Yes! And the reason is that those inequalities must be active at the optimum. That is, $$(w^*,b^*,\xi^*)\text{ optimal} \quad\Longrightarrow\quad \max\{0,1-y_i((w^*)^Tx_i+b^*)\}=\xi^*_i, ~i=1,2,\dots,N.$$ Suppose this isn't the case: suppose that, for a particular $j\in\{1,2,\dots,N\}$, we have $$\max\{0,1-y_j((w^*)^Tx_j+b^*)\}<\xi^*_j.$$ Then we can reduce $\xi^*_j$ from its current value to $\bar{\xi}_j = \max\{0,1-y_j((w^*)^Tx_j+b^*)\}$ without violating the inequality; and this reduces the objective as well! This contradicts the claim that the original $\xi^*$ was optimal.
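The tightening argument can be sketched numerically (the margin values $z_i = 1-y_i(w^Tx_i+b)$ below are made up):

```python
# Sketch of the activeness argument with made-up margins
# z_i = 1 - y_i (w^T x_i + b) for some fixed (w, b).
z = [0.4, -0.8, 1.1]
xi_tight = [max(0.0, zi) for zi in z]

# Pad one slack above its hinge value: still feasible, but not tight.
xi_padded = list(xi_tight)
xi_padded[0] += 0.5

def feasible(xi):
    return all(x >= 0 and x >= zi for x, zi in zip(xi, z))

assert feasible(xi_padded)
assert feasible(xi_tight)              # shrinking back preserves feasibility...
assert sum(xi_tight) < sum(xi_padded)  # ...and strictly lowers the objective
```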

Now that we've taken this step, we can simply take advantage of the fact that $$\max\{a,b\}\leq c \quad\Longleftrightarrow\quad a \leq c,~b\leq c$$ for any real $a,b,c$. This allows us to replace those individual inequalities with $$\begin{array}{lll} \text{minimize}_{w,b,\xi} & \frac{1}{2}w^T w+\sum_{i=1}^{N}\xi_i \\ \text{subject to} & 0 \leq \xi_i, ~ i=1,2,\dots,N \\ & 1-y_i(w^Tx_i +b) \leq \xi_i, ~ i=1,2,\dots, N \end{array}$$ And we've arrived at the result.
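As a final sanity check, the splitting fact $\max\{a,b\}\leq c \Leftrightarrow (a\leq c \text{ and } b\leq c)$ can be verified empirically over random real triples:

```python
import random

# Empirical check of:  max{a, b} <= c  <=>  (a <= c and b <= c)
random.seed(0)
trials = [(random.uniform(-5, 5), random.uniform(-5, 5), random.uniform(-5, 5))
          for _ in range(1000)]
ok = all((max(a, b) <= c) == (a <= c and b <= c) for a, b, c in trials)
print(ok)  # → True
```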