KKT condition of linearly inseparable Support Vector Machine (SVM)


In the paper Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, the optimization problem for the linearly inseparable SVM is \begin{align} \min\limits_{\boldsymbol{w},b,\boldsymbol{\xi}} &\; \frac{1}{2} \| \boldsymbol{w} \|_2^2 + C \sum_{i=1}^N \xi_i \\ s.t. &\;y_i(\boldsymbol{w}^T\boldsymbol{x}_i -b) \ge 1 - \xi_i, \quad \forall i \in \{1,\ldots, N\} \\ &\; \xi_i \ge 0, \quad \forall i \in \{1,\ldots, N\} \end{align} Its dual problem is (for simplicity, consider only the linear classifier) \begin{align} \min\limits_{\boldsymbol{\alpha}}\;& \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N y_iy_j \boldsymbol{x}_i^T \boldsymbol{x}_j \alpha_i\alpha_j - \sum_{i=1}^N \alpha_i \\ s.t.\;& 0 \le \alpha_i \le C, \quad \forall i \in \{1,\ldots,N\}\\ \;& \sum_{i=1}^N y_i \alpha_i = 0 \end{align}

The KKT conditions the author gives are \begin{align} \alpha_i=0 \Leftrightarrow y_i u_i \ge 1 \\ 0<\alpha_i <C \Leftrightarrow y_i u_i = 1 \\ \alpha_i = C \Leftrightarrow y_i u_i \le 1 \end{align} where $u_i = \sum_{j=1}^N y_j \alpha_j \boldsymbol{x}_j^T \boldsymbol{x}_i-b$
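For concreteness, $u_i$ is just the classifier output on $\boldsymbol{x}_i$ expressed through the dual variables. A minimal sketch in plain Python (the two-point data, $\alpha$, and $b$ are my own toy values, not from the paper):

```python
def dot(a, b):
    # Inner product of two equal-length vectors.
    return sum(p * q for p, q in zip(a, b))

def svm_output(i, X, y, alpha, b):
    """u_i = sum_j y_j * alpha_j * <x_j, x_i> - b  (linear kernel)."""
    return sum(y[j] * alpha[j] * dot(X[j], X[i]) for j in range(len(X))) - b

# Hypothetical toy values: two symmetric points, alpha = (0.5, 0.5), b = 0.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]
b = 0.0
print([svm_output(i, X, y, alpha, b) for i in range(2)])  # -> [1.0, -1.0]
```

Here both points sit exactly on the margin, so $y_i u_i = 1$ for each of them.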

Are these KKT conditions right? I have some doubts about them.

  1. All three cases include the possibility $y_i u_i = 1$. If $y_i u_i = 1$ holds, the three reverse implications give $\alpha_i=0$, $\alpha_i=C$, and $ 0 < \alpha_i < C$ simultaneously, which are mutually contradictory.
  2. How did the author derive them? The original KKT conditions are

Primal feasibility: $y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b) \ge 1- \xi_i, \xi_i \ge 0$

Dual feasibility: $0 \le \alpha_i \le C, \sum_{i=1}^N y_i \alpha_i=0$

Complementary slackness: $\alpha_i (1-\xi_i -y_i(\boldsymbol{w}^T\boldsymbol{x}_i - b)) = 0$

Gradient of Lagrangian: $\boldsymbol{w}=\sum_{i=1}^N \alpha_iy_i \boldsymbol{x}_i$

Let us start from $\alpha_i=0$. Then primal feasibility only gives $y_i(\boldsymbol{w}^T\boldsymbol{x}_i - b) \ge 1 - \xi_i$, and I think we cannot draw the conclusion that $y_i u_i \ge 1$ unless we know $\xi_i = 0$.

Correct me if I am wrong.

I have figured it out. The more precise KKT conditions should be \begin{align} \alpha_i = 0\Rightarrow y_i u_i \ge 1 \quad & y_iu_i > 1 \Rightarrow \alpha_i = 0 \\ 0<\alpha_i<C \Rightarrow y_i u_i = 1 \quad & y_iu_i = 1 \Rightarrow 0 \le \alpha_i \le C \\ \alpha_i =C \Rightarrow y_iu_i \le 1 \quad & y_iu_i < 1 \Rightarrow \alpha_i=C \end{align}
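These one-directional implications can be read as exactly the per-example violation check SMO performs: each range of $\alpha_i$ forces a margin condition, but $y_iu_i = 1$ only confines $\alpha_i$ to the closed interval $[0, C]$. A small sketch (the function name and tolerance handling are my own):

```python
def kkt_satisfied(alpha_i, yu_i, C, tol=1e-8):
    """Check the forward implications of the corrected KKT conditions.

    alpha_i : dual variable for example i
    yu_i    : the product y_i * u_i
    """
    if alpha_i < tol:            # alpha_i = 0  =>  y_i u_i >= 1
        return yu_i >= 1 - tol
    if alpha_i > C - tol:        # alpha_i = C  =>  y_i u_i <= 1
        return yu_i <= 1 + tol
    return abs(yu_i - 1) < tol   # 0 < alpha_i < C  =>  y_i u_i = 1

C = 1.0
print(kkt_satisfied(0.0, 1.5, C))  # margin point with alpha = 0: satisfied
print(kkt_satisfied(0.0, 1.0, C))  # y_i u_i = 1 is consistent with alpha = 0 ...
print(kkt_satisfied(0.3, 1.0, C))  # ... with an interior alpha ...
print(kkt_satisfied(1.0, 1.0, C))  # ... and with alpha = C: no contradiction
print(kkt_satisfied(0.3, 0.7, C))  # interior alpha but margin != 1: violated
```

The three calls with `yu_i = 1.0` all return `True`, which is precisely why the biconditionals in the paper cannot hold as stated.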

Below is the deduction. I start from the primal problem. The Lagrangian is \begin{equation} L(\boldsymbol{w},b,\boldsymbol{\xi}; \boldsymbol{\alpha},\boldsymbol{\lambda}) = \frac{1}{2} \| \boldsymbol{w} \|^2 + C \sum_{i=1}^N \xi_i + \sum_{i=1}^N \alpha_i (1-\xi_i-y_i(\boldsymbol{w}^T\boldsymbol{x}_i -b)) - \sum_{i=1}^N \lambda_i \xi_i \end{equation} Since $L$ is linear in each $\xi_i$ with coefficient $C - \alpha_i - \lambda_i$, the infimum is finite only when that coefficient vanishes: \begin{equation} \inf\limits_{\boldsymbol{w},b,\boldsymbol{\xi}} L = \left\{ \begin{array}{cl} \sum_{i=1}^N \alpha_i -\frac{1}{2}\|\sum_{i=1}^N \alpha_iy_i \boldsymbol{x}_i \|^2 &\quad \text{if } \alpha_i + \lambda_i = C, \sum_{i=1}^N \alpha_i y_i = 0, \alpha_i \ge 0, \lambda_i \ge 0\\ -\infty &\quad \text{otherwise} \end{array} \right. \end{equation} So the dual problem is \begin{align} \min\limits_{\boldsymbol{\alpha},\boldsymbol{\lambda}}&\; \frac{1}{2} \sum_{i=1}^N\sum_{j=1}^Ny_i y_j \boldsymbol{x}_i^T \boldsymbol{x}_j \alpha_i \alpha_j- \sum_{i=1}^N \alpha_i \\ &\; \alpha_i \ge 0 \\ &\; \lambda_i \ge 0 \\ &\; \alpha_i + \lambda_i = C \\ &\; \sum_{i=1}^N \alpha_i y_i = 0 \end{align} The author eliminates $\boldsymbol{\lambda}$ since it does not appear in the objective function (substituting $\lambda_i = C - \alpha_i \ge 0$ turns the constraints into $0 \le \alpha_i \le C$), but this elimination leaves out an important KKT condition, namely $\lambda_i \xi_i = 0$.

The complete KKT conditions are:

  • Primal feasibility: $y_i(\boldsymbol{w}^T\boldsymbol{x}_i - b) \ge 1 - \xi_i, \xi_i \ge 0$
  • Dual feasibility: $\alpha_i \ge 0, \lambda_i \ge 0$
  • Complementary slackness: $\alpha_i(1-\xi_i-y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b)) =0, \lambda_i \xi_i=0$
  • Gradient of Lagrangian: $\lambda_i + \alpha_i = C, \sum_{i=1}^N \alpha_i y_i = 0, \boldsymbol{w}=\sum_{i=1}^N \alpha_i y_i \boldsymbol{x}_i$
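As a sanity check, all four groups of conditions can be verified numerically on a tiny hand-solvable instance (the data and the analytic solution $\boldsymbol{\alpha} = (1/2, 1/2)$, $\boldsymbol{w} = (1,0)$, $b = 0$, $\boldsymbol{\xi} = \boldsymbol{0}$ are my own toy example, not from the paper):

```python
# Toy instance: x1 = (1,0) with y1 = +1, x2 = (-1,0) with y2 = -1, C = 1.
# The dual reduces to min 2a^2 - 2a with a1 = a2 = a, minimized at a = 1/2.
C = 1.0
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]
lam = [C - a for a in alpha]     # lambda_i from stationarity in xi_i
xi = [0.0, 0.0]
w = [sum(alpha[i] * y[i] * X[i][k] for i in range(2)) for k in range(2)]
b = 0.0

# margin_i = y_i (w^T x_i - b)
margin = [y[i] * (sum(w[k] * X[i][k] for k in range(2)) - b) for i in range(2)]
checks = {
    "primal feasibility": all(margin[i] >= 1 - xi[i] and xi[i] >= 0
                              for i in range(2)),
    "dual feasibility": all(alpha[i] >= 0 and lam[i] >= 0 for i in range(2)),
    "complementary slackness": all(
        abs(alpha[i] * (1 - xi[i] - margin[i])) < 1e-12
        and abs(lam[i] * xi[i]) < 1e-12 for i in range(2)),
    "gradient of Lagrangian": all(abs(alpha[i] + lam[i] - C) < 1e-12
                                  for i in range(2))
        and abs(sum(alpha[i] * y[i] for i in range(2))) < 1e-12,
}
print(checks)  # every entry is True at this optimum
```

Both points are interior support vectors ($0 < \alpha_i < C$) and indeed satisfy $y_iu_i = 1$.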

Then we can prove the author's KKT conditions in their corrected form.

"$\Rightarrow$"

  1. If $\alpha_i=0$, then $\lambda_i = C > 0$, and $\lambda_i \xi_i = 0$ forces $\xi_i=0$. Primal feasibility then gives $y_i(\boldsymbol{w}^T\boldsymbol{x}_i - b) \ge 1 - \xi_i = 1$, i.e. $y_iu_i \ge 1$.
  2. If $0 < \alpha_i <C$, then complementary slackness gives $y_i(\boldsymbol{w}^T\boldsymbol{x}_i -b) = 1- \xi_i$, and $\lambda_i = C - \alpha_i > 0$ together with $\lambda_i \xi_i = 0$ forces $\xi_i =0$, so $y_iu_i = 1$.
  3. If $\alpha_i = C$, then complementary slackness gives $y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b)=1-\xi_i \le 1$ since $\xi_i \ge 0$, i.e. $y_iu_i \le 1$.

"$\Leftarrow$"

  1. $y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b) > 1\Rightarrow 1 - \xi_i -y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b) < 0$ (since $\xi_i \ge 0$) $\Rightarrow \alpha_i = 0$ by complementary slackness.
  2. $y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b) = 1$: if $\xi_i > 0$, then $\lambda_i \xi_i = 0$ gives $\lambda_i = 0$, hence $\alpha_i = C > 0$, and complementary slackness would force $1 - \xi_i - y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b) = 0$, i.e. $\xi_i = 0$, a contradiction. So $\xi_i=0$, both complementary slackness conditions hold for any $\alpha_i$, and the only remaining restriction is $0 \le \alpha_i \le C$.
  3. $y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b) < 1\Rightarrow \xi_i \ge 1 - y_i(\boldsymbol{w}^T\boldsymbol{x}_i-b) > 0 \Rightarrow \lambda_i = 0 \Rightarrow \alpha_i=C$.
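To close the loop: on the same two-point toy instance (my own example), a single SMO-style joint update in the spirit of Platt's pair step already lands on a point satisfying the corrected conditions:

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Toy data (mine, not from the paper): one point per class, C = 1.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
C = 1.0
alpha = [0.0, 0.0]
b = 0.0

K = [[dot(X[i], X[j]) for j in range(2)] for i in range(2)]
u = [sum(y[j] * alpha[j] * K[j][i] for j in range(2)) - b for i in range(2)]
E = [u[i] - y[i] for i in range(2)]               # prediction errors
eta = K[0][0] + K[1][1] - 2 * K[0][1]             # curvature along the pair

a2 = alpha[1] + y[1] * (E[0] - E[1]) / eta        # unconstrained optimum
lo = max(0.0, alpha[1] - alpha[0])                # box bounds for y1 != y2
hi = min(C, C + alpha[1] - alpha[0])
a2 = min(hi, max(lo, a2))                         # clip to [lo, hi]
a1 = alpha[0] + y[0] * y[1] * (alpha[1] - a2)     # keep sum y_i alpha_i = 0
alpha = [a1, a2]

# Recover b from a support vector: 0 < alpha_1 < C  =>  y_1 u_1 = 1.
b = sum(y[j] * alpha[j] * K[j][0] for j in range(2)) - y[0]
u = [sum(y[j] * alpha[j] * K[j][i] for j in range(2)) - b for i in range(2)]
print(alpha, b, [y[i] * u[i] for i in range(2)])  # -> [0.5, 0.5] 0.0 [1.0, 1.0]
```

Both multipliers end up interior, and $y_iu_i = 1$ holds for both points, matching the case $0 < \alpha_i < C \Rightarrow y_iu_i = 1$ proved above.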