(This is related to a question I asked a few days ago)
I've been through a few SVM derivations and the ones I follow are this Caltech lecture and this MIT lecture.
However, in both of them the Lagrangian they minimize is:
$$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\boldsymbol{w}\|^2 - \sum_i \alpha_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1\right],$$
as opposed to: $$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\boldsymbol{w}\|^2 - \sum_i \alpha_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1-s_i\right],$$
where the $s_i$ are the slack variables obtained from the inequalities $$y_i(\boldsymbol{w}^T\boldsymbol{x}_i + b) \ge 1,$$ so that $$y_i(\boldsymbol{w}^T\boldsymbol{x}_i + b) = 1 + s_i, \qquad s_i \ge 0.$$
Is it because when an inequality constraint is inactive its Lagrange multiplier is going to be $0$, so we don't even consider it? If so, then this bit from the Caltech lecture seems to contradict that: it implies that we solve for the $\alpha_i$'s and find them to be zero, but we shouldn't even be talking about them if the inactive inequalities aren't in the Lagrangian in the first place, right?
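As a quick numerical check of the "inactive constraint $\Rightarrow$ zero multiplier" intuition, here is a minimal sketch (assuming `scipy` is available, on a made-up three-point 1-D dataset) that solves the dual directly and shows that the point lying strictly off the margin gets $\alpha_i \approx 0$:

```python
# Solve the hard-margin SVM dual for a tiny hand-made 1-D dataset and verify
# that the point NOT on the margin (an inactive constraint) gets alpha ~ 0.
import numpy as np
from scipy.optimize import minimize

# Three points: x=1 (y=+1) and x=-1 (y=-1) end up on the margin;
# x=3 (y=+1) lies strictly inside its class region, so its constraint is inactive.
x = np.array([1.0, 3.0, -1.0])
y = np.array([1.0, 1.0, -1.0])
Q = np.outer(y * x, y * x)  # Q_ij = y_i y_j x_i x_j

def neg_dual(a):
    # Negative of the dual objective: sum_i a_i - (1/2) a^T Q a
    return -(a.sum() - 0.5 * a @ Q @ a)

res = minimize(
    neg_dual,
    x0=np.zeros(3),
    method="SLSQP",
    bounds=[(0.0, None)] * 3,                              # alpha_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum_i alpha_i y_i = 0
)
alpha = res.x
print(alpha)  # the middle point (inactive constraint) should get alpha ~ 0
```

For this dataset the support vectors are $x=1$ and $x=-1$ with $\alpha = 0.5$ each, while the off-margin point's multiplier vanishes, exactly as complementary slackness predicts.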
We have indeed considered the inequality constraint. If you introduce the slack variables, then we impose no sign constraint on the $\alpha_i$, but we do impose the sign constraint $s_i \ge 0$. If we do not introduce the slack variables (which is the common practice), we end up with the sign constraint $\alpha_i \ge 0$.
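To see that the two routes agree, here is a short derivation (a sketch, introducing a multiplier $\mu_i \ge 0$ for the constraint $s_i \ge 0$, and using the sign convention $\mathcal{L} = \frac{1}{2}\|\boldsymbol{w}\|^2 - \sum_i \alpha_i[\cdots]$ common in these derivations). The full Lagrangian with both sets of multipliers is
$$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{s}, \boldsymbol{\alpha}, \boldsymbol{\mu}) = \frac{1}{2}\|\boldsymbol{w}\|^2 - \sum_i \alpha_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1-s_i\right] - \sum_i \mu_i s_i,$$
and stationarity with respect to $s_i$ gives
$$\frac{\partial \mathcal{L}}{\partial s_i} = \alpha_i - \mu_i = 0 \quad\Longrightarrow\quad \alpha_i = \mu_i \ge 0,$$
so eliminating the slack variables recovers exactly the sign constraint $\alpha_i \ge 0$ of the common formulation.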