Why do derivations for SVM not consider slack variables for inequality constraints?


(This is related to a question I asked a few days ago)

I've been through a few SVM derivations and the ones I follow are this Caltech lecture and this MIT lecture.

However, with both of them the Lagrangian they try to minimize is:

$$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\boldsymbol{w}\|^2 - \sum_i \alpha_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1\right].$$

As opposed to: $$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\boldsymbol{w}\|^2 - \sum_i \alpha_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1-s_i\right].$$

where $s_i \ge 0$ are the slack variables obtained from the inequalities $$y_i(\boldsymbol{w}^T\boldsymbol{x}_i + b) \ge 1,$$ hence $$y_i(\boldsymbol{w}^T\boldsymbol{x}_i + b) = 1 + s_i.$$

Is it because, when an inequality constraint is inactive, its Lagrange multiplier is going to be $0$, so we don't even consider it? If so, then this bit from the Caltech lecture contradicts that statement:

This implies that the $\alpha_i$'s will come out as zero, but we shouldn't even be talking about them if we're not including the inactive inequalities in the Lagrangian in the first place, right?

1 Answer

The inequality constraint has been considered; the two formulations are equivalent.

If you introduce the slack variables, then we do not impose a sign constraint on the $\alpha_i$, but we do impose the sign constraint $s_i \ge 0$.
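Concretely, attaching a second multiplier $\beta_i \ge 0$ to the constraint $s_i \ge 0$, a quick sketch of the stationarity condition in $s_i$ shows where the sign constraint reappears:

$$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{s}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = \frac{1}{2}\|\boldsymbol{w}\|^2 - \sum_i \alpha_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1-s_i\right] - \sum_i \beta_i s_i,$$

$$\frac{\partial \mathcal{L}}{\partial s_i} = \alpha_i - \beta_i = 0 \;\Longrightarrow\; \alpha_i = \beta_i \ge 0.$$

So the slack formulation recovers exactly the same condition on the $\alpha_i$.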

If we do not introduce the slack variables (which is the common practice), we end up with the sign constraint $\alpha_i \ge 0$ instead.
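As a numerical sanity check, one can solve the hard-margin dual directly under exactly this sign constraint and observe that the multiplier of an inactive constraint comes out as zero. This is a sketch using `scipy.optimize.minimize` on a made-up toy dataset (the data and variable names are illustrative assumptions, not from either lecture):

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data (made up for illustration).  The separating
# hyperplane is x1 = 1, so the point (4, 0) sits far inside its class:
# its margin constraint is inactive and its multiplier should be ~0.
X = np.array([[0., 0.], [0., 1.], [2., 0.], [2., 1.], [4., 0.]])
y = np.array([-1., -1., 1., 1., 1.])

# Hard-margin dual: maximize sum(alpha) - 1/2 alpha^T Q alpha,
# with Q_ij = y_i y_j (x_i . x_j), subject to alpha >= 0, sum_i alpha_i y_i = 0.
Yx = y[:, None] * X
Q = Yx @ Yx.T

def neg_dual(alpha):
    # Negated dual objective, so a minimizer can be used.
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(
    neg_dual,
    np.zeros(len(y)),
    method="SLSQP",
    bounds=[(0.0, None)] * len(y),                        # alpha_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum_i alpha_i y_i = 0
)
alpha = res.x
w = (alpha * y) @ X                  # KKT stationarity: w = sum_i alpha_i y_i x_i
sv = np.argmax(alpha)                # any point with alpha_i > 0 is on the margin
b = y[sv] - w @ X[sv]

print("alpha =", np.round(alpha, 3))   # last entry ~0: inactive constraint
print("w =", np.round(w, 3), "b =", round(b, 3))
```

The multipliers of the four margin points are strictly positive, while the multiplier of $(4, 0)$ is driven to the boundary $\alpha_i = 0$ of the sign constraint, which is precisely the complementary-slackness behavior asked about in the question.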