In the paper Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, the optimization problem for the linearly inseparable SVM is \begin{align} \min\limits_{\boldsymbol{w},b,\boldsymbol{\xi}} &\; \frac{1}{2} \| \boldsymbol{w} \|_2^2 + C \sum_{i=1}^N \xi_i \\ s.t. &\;y_i(\boldsymbol{w}^T\boldsymbol{x}_i -b) \ge 1 - \xi_i, \quad \forall i \in \{1,\ldots, N\} \\ &\; \xi_i \ge 0, \quad \forall i \in \{1,\ldots, N\} \end{align} Its dual problem is (for simplicity, only the linear classifier is considered) \begin{align} \min\limits_{\boldsymbol{\alpha}}\;& \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N y_iy_j \boldsymbol{x}_i^T \boldsymbol{x}_j \alpha_i\alpha_j - \sum_{i=1}^N \alpha_i \\ s.t.\;& 0 \le \alpha_i \le C, \quad \forall i \in \{1,\ldots,N\}\\ \;& \sum_{i=1}^N y_i \alpha_i = 0 \end{align}
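As a numerical sanity check on the dual, here is a sketch that solves it with a generic QP solver (SciPy's SLSQP), not the paper's SMO algorithm, on a hypothetical toy dataset:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: two linearly separable clusters, labels in {-1, +1}.
X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 2.5],
              [0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
C = 10.0
Q = (y[:, None] * X) @ (y[:, None] * X).T   # Q_ij = y_i y_j x_i^T x_j

# Dual objective: (1/2) a^T Q a - sum_i a_i, with box and equality constraints.
res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),
               np.zeros(len(y)),
               method="SLSQP",
               bounds=[(0.0, C)] * len(y),
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x
w = (alpha * y) @ X                          # w = sum_i alpha_i y_i x_i
# Recover b from a margin support vector (0 < alpha_i < C):
# y_i (w^T x_i - b) = 1  =>  b = w^T x_i - y_i
i = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))
b = w @ X[i] - y[i]
print("alpha =", np.round(alpha, 4))
print("w =", np.round(w, 4), " b =", round(float(b), 4))
```

On separable data like this with a large $C$, the recovered $(\boldsymbol{w}, b)$ should classify every point with margin at least 1.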
The KKT conditions the author gives are \begin{align} \alpha_i=0 \Leftrightarrow y_i u_i \ge 1 \\ 0<\alpha_i <C \Leftrightarrow y_i u_i = 1 \\ \alpha_i = C \Leftrightarrow y_i u_i \le 1 \end{align} where $u_i = \sum_{j=1}^N y_j \alpha_j \boldsymbol{x}_j^T \boldsymbol{x}_i-b$
Are these KKT conditions right? I have some doubts about them.
- All three cases contain $y_i u_i = 1$. If it holds, the biconditionals would give $\alpha_i=0$, $\alpha_i=C$, and $0 < \alpha_i < C$ simultaneously, which are mutually contradictory.
- How does the author get them? The original KKT conditions are:
Primal feasibility: $y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b) \ge 1- \xi_i, \xi_i \ge 0$
Dual feasibility: $0 \le \alpha_i \le C, \sum_{i=1}^N y_i \alpha_i=0$
Complementary slackness: $\alpha_i (1-\xi_i -y_i(\boldsymbol{w}^T\boldsymbol{x}_i - b)) = 0$
Gradient of the Lagrangian: $\boldsymbol{w}=\sum_{i=1}^N \alpha_iy_i \boldsymbol{x}_i$
Let's start from $\alpha_i=0$. Then complementary slackness tells us nothing, and primal feasibility only gives $y_i(\boldsymbol{w}^T\boldsymbol{x}_i - b) \ge 1 - \xi_i$. I think we cannot draw the conclusion that $y_i u_i \ge 1$ unless we know $\xi_i = 0$.
Correct me if I am wrong.
I have figured it out. The more precise KKT conditions should be \begin{align} \alpha_i = 0\Rightarrow y_i u_i \ge 1 \quad & y_iu_i > 1 \Rightarrow \alpha_i = 0 \\ 0<\alpha_i<C \Rightarrow y_i u_i = 1 \quad & y_iu_i = 1 \Rightarrow 0 \le \alpha_i \le C \\ \alpha_i =C \Rightarrow y_iu_i \le 1 \quad & y_iu_i < 1 \Rightarrow \alpha_i=C \end{align}
Below is the deduction. I start from the primal problem. The Lagrangian is \begin{equation} L(\boldsymbol{w},b,\boldsymbol{\xi}; \boldsymbol{\alpha},\boldsymbol{\lambda}) = \frac{1}{2} \| \boldsymbol{w} \|^2 + C \sum_{i=1}^N \xi_i + \sum_{i=1}^N \alpha_i (1-\xi_i-y_i(\boldsymbol{w}^T\boldsymbol{x}_i -b)) - \sum_{i=1}^N \lambda_i \xi_i \end{equation} with multipliers $\alpha_i \ge 0$ for the margin constraints and $\lambda_i \ge 0$ for $\xi_i \ge 0$ (note the minus sign, since the constraint is $-\xi_i \le 0$). Minimizing over the primal variables gives \begin{equation} \inf\limits_{\boldsymbol{w},b,\boldsymbol{\xi}} L = \left\{ \begin{array}{cl} \sum_{i=1}^N \alpha_i -\frac{1}{2}\|\sum_{i=1}^N \alpha_iy_i \boldsymbol{x}_i \|^2 &\quad \text{if } \alpha_i + \lambda_i = C, \sum_{i=1}^N \alpha_i y_i = 0, \alpha_i \ge 0, \lambda_i \ge 0\\ -\infty &\quad \text{otherwise} \end{array} \right. \end{equation} (the infimum over $\xi_i$ is $-\infty$ unless the coefficient $C - \alpha_i - \lambda_i$ vanishes). So the dual problem is \begin{align} \min\limits_{\boldsymbol{\alpha},\boldsymbol{\lambda}}&\; \frac{1}{2} \sum_{i=1}^N\sum_{j=1}^Ny_i y_j \boldsymbol{x}_i^T \boldsymbol{x}_j \alpha_i \alpha_j- \sum_{i=1}^N \alpha_i \\ &\; \alpha_i \ge 0 \\ &\; \lambda_i \ge 0 \\ &\; \alpha_i + \lambda_i = C \\ &\; \sum_{i=1}^N \alpha_i y_i = 0 \end{align} The author eliminates $\boldsymbol{\lambda}$ via $\lambda_i = C - \alpha_i$, since it does not appear in the objective function; this collapses the constraints on $\alpha_i$ and $\lambda_i$ into $0 \le \alpha_i \le C$, but it also hides an important KKT condition: the complementary slackness $\lambda_i \xi_i = 0$, i.e. $(C-\alpha_i)\xi_i = 0$.
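Concretely, writing the multiplier term for $\xi_i \ge 0$ as $-\sum_{i=1}^N \lambda_i \xi_i$, the finite branch of the infimum comes from the stationarity conditions: \begin{align} \frac{\partial L}{\partial \boldsymbol{w}} = \boldsymbol{w} - \sum_{i=1}^N \alpha_i y_i \boldsymbol{x}_i = 0 \;&\Rightarrow\; \boldsymbol{w} = \sum_{i=1}^N \alpha_i y_i \boldsymbol{x}_i \\ \frac{\partial L}{\partial b} = \sum_{i=1}^N \alpha_i y_i = 0 \;&\Rightarrow\; \sum_{i=1}^N \alpha_i y_i = 0 \\ \frac{\partial L}{\partial \xi_i} = C - \alpha_i - \lambda_i = 0 \;&\Rightarrow\; \alpha_i + \lambda_i = C \end{align}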
The complete KKT conditions are:
Stationarity: $\boldsymbol{w}=\sum_{i=1}^N \alpha_i y_i \boldsymbol{x}_i$, $\sum_{i=1}^N \alpha_i y_i = 0$, $\alpha_i + \lambda_i = C$
Primal feasibility: $y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b) \ge 1-\xi_i$, $\xi_i \ge 0$
Dual feasibility: $\alpha_i \ge 0$, $\lambda_i \ge 0$
Complementary slackness: $\alpha_i(1-\xi_i-y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b))=0$, $\lambda_i \xi_i = 0$
Then we can prove the author's KKT conditions. By stationarity, $u_i = \sum_{j=1}^N y_j\alpha_j \boldsymbol{x}_j^T\boldsymbol{x}_i - b = \boldsymbol{w}^T\boldsymbol{x}_i - b$, and $\lambda_i = C - \alpha_i$.
"$\Rightarrow$"
- $\alpha_i = 0$: then $\lambda_i = C - \alpha_i = C > 0$, so $\lambda_i \xi_i = 0$ forces $\xi_i = 0$, and primal feasibility gives $y_i u_i \ge 1 - \xi_i = 1$.
- $0 < \alpha_i < C$: again $\lambda_i = C - \alpha_i > 0$ forces $\xi_i = 0$; since $\alpha_i > 0$, complementary slackness gives $y_i u_i = 1 - \xi_i = 1$.
- $\alpha_i = C$: since $\alpha_i > 0$, complementary slackness gives $y_i u_i = 1 - \xi_i \le 1$ because $\xi_i \ge 0$.
"$\Leftarrow$"
- $y_i u_i > 1$: if $\alpha_i > 0$, complementary slackness would give $y_i u_i = 1 - \xi_i \le 1$, a contradiction; hence $\alpha_i = 0$.
- $y_i u_i = 1$: nothing stronger than dual feasibility $0 \le \alpha_i \le C$ follows, which is why this case is only a one-way implication.
- $y_i u_i < 1$: primal feasibility gives $\xi_i \ge 1 - y_i u_i > 0$, so $\lambda_i \xi_i = 0$ forces $\lambda_i = 0$, hence $\alpha_i = C - \lambda_i = C$.
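These conditions can also be checked numerically. Below is a sketch of the *simplified* SMO variant (random choice of the second multiplier, not Platt's full heuristics), using the question's convention $u_i = \sum_j y_j \alpha_j \boldsymbol{x}_j^T \boldsymbol{x}_i - b$, on a hypothetical toy dataset:

```python
import numpy as np

def smo_train(X, y, C=10.0, tol=1e-3, max_passes=20, seed=0):
    """Simplified SMO with u_i = sum_j y_j alpha_j K_ij - b (threshold subtracted)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = X @ X.T                      # linear kernel
    alpha, b = np.zeros(n), 0.0
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] - b - y[i]
            # KKT violated within tol: margin < 1 with alpha_i < C,
            # or margin > 1 with alpha_i > 0
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = int(rng.integers(n - 1))
                j += (j >= i)        # random j != i
                Ej = (alpha * y) @ K[:, j] - b - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:     # box constraints for the pair (i, j)
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]   # negative curvature term
                if L == H or eta >= 0:
                    continue
                aj_new = float(np.clip(aj - y[j] * (Ei - Ej) / eta, L, H))
                if abs(aj_new - aj) < 1e-7:
                    continue
                ai_new = ai + y[i] * y[j] * (aj - aj_new)
                # Threshold update (Platt's b1/b2, with the u = s - b sign convention)
                b1 = b + Ei + y[i] * (ai_new - ai) * K[i, i] + y[j] * (aj_new - aj) * K[i, j]
                b2 = b + Ej + y[i] * (ai_new - ai) * K[i, j] + y[j] * (aj_new - aj) * K[j, j]
                alpha[i], alpha[j] = ai_new, aj_new
                b = b1 if 0 < ai_new < C else (b2 if 0 < aj_new < C else (b1 + b2) / 2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

# Hypothetical toy data: two separable clusters, labels in {-1, +1}
X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 2.5],
              [0.0, 0.0], [0.5, -0.5], [-0.5, 0.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
C, eps = 10.0, 1e-6
alpha, b = smo_train(X, y, C=C)
u = (alpha * y) @ (X @ X.T) - b      # u_i = sum_j y_j alpha_j x_j^T x_i - b
margin = y * u
print("alpha =", np.round(alpha, 4))
print("margins y_i u_i =", np.round(margin, 4))
```

At convergence the three cases should hold (up to the solver tolerance): examples with $\alpha_i = 0$ have $y_i u_i \ge 1$, those with $0 < \alpha_i < C$ sit on the margin $y_i u_i = 1$, and those with $\alpha_i = C$ have $y_i u_i \le 1$.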