Consider the two-class classification problem. Our classifier is a hyperplane $w^Tx+b=0$, where $w$ is the weight vector, $x$ is the input vector, and $b$ is the bias. We have inputs $x_i\in\mathbb{R}^d$, $i=1,\cdots,n$, with class labels $y_i\in\{\pm1\}$. We then maximize the margin in order to obtain the best hyperplane separating our data points.
The article "Successive overrelaxation for support vector machines" by Mangasarian, O.L. (1999, p. 2) proposes two SVM models:
- (SVM1) $\min_{w,b,\xi}~\frac{1}{2}w^Tw+C\sum_{i=1}^n\xi_i,\quad s.t.~ y_i(w^Tx_i+b)\geqslant 1-\xi_i,\xi_i\geqslant0,i=1,\cdots,n$
- (SVM2) $\min_{w,b,\xi}~\frac{1}{2}(w^Tw+b^2)+C\sum_{i=1}^n\xi_i,\quad s.t.~y_i(w^Tx_i+b)\geqslant 1-\xi_i,\xi_i\geqslant0,i=1,\cdots,n$
He states that, under some conditions, the two problems are equivalent, but gives no proof. I am stuck on the proof.
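Not a full proof, but for reference, here is the standard KKT computation (my own derivation, not from the paper) showing where the extra $b^2$ term acts. For SVM2 the Lagrangian is
$$L=\tfrac12\left(w^Tw+b^2\right)+C\sum_{i=1}^n\xi_i-\sum_{i=1}^n\alpha_i\left[y_i(w^Tx_i+b)-1+\xi_i\right]-\sum_{i=1}^n\mu_i\xi_i,$$
and setting the derivatives to zero gives
$$w=\sum_{i=1}^n\alpha_iy_ix_i,\qquad b=\sum_{i=1}^n\alpha_iy_i,\qquad C=\alpha_i+\mu_i.$$
Substituting back yields the dual
$$\max_\alpha\ \sum_{i=1}^n\alpha_i-\frac12\sum_{i,j}\alpha_i\alpha_jy_iy_j\left(x_i^Tx_j+1\right),\quad 0\leqslant\alpha_i\leqslant C,$$
whereas for SVM1 the condition $\partial L/\partial b=0$ instead produces the constraint $\sum_{i=1}^n\alpha_iy_i=0$.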
This trick eliminates the equality constraint $\sum\limits_{i=1}^n\alpha_iy_i=0$ from the dual problem; it has been applied in many articles, such as Hsieh C.J., Chang K.W., Lin C.J., et al. (2008, p. 1).
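As a numerical sanity check (not a proof), the two duals can be solved on a tiny toy data set and compared. This is only a sketch under my own assumptions: the data, the value of $C$, and the use of `scipy.optimize.minimize` with SLSQP are all mine, not from the paper. The SVM1 dual carries the equality constraint $\sum_i\alpha_iy_i=0$, while the SVM2 dual replaces the Gram matrix $Q$ by $Q+yy^T$ and drops it.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable 2-D toy data (my choice, not from the paper).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)
C = 10.0  # assumed penalty parameter

# Q_ij = y_i y_j x_i^T x_j
Q = (y[:, None] * X) @ (y[:, None] * X).T

def dual_obj(Q_):
    # We minimize the negative dual objective: -sum(a) + 0.5 a^T Q_ a.
    return lambda a: -a.sum() + 0.5 * a @ Q_ @ a

bounds = [(0.0, C)] * n
a0 = np.zeros(n)

# SVM1 dual: box constraints plus the equality constraint sum_i a_i y_i = 0.
res1 = minimize(dual_obj(Q), a0, bounds=bounds, method="SLSQP",
                constraints=[{"type": "eq", "fun": lambda a: a @ y}])

# SVM2 dual: Q is replaced by Q + y y^T and the equality constraint disappears.
res2 = minimize(dual_obj(Q + np.outer(y, y)), a0, bounds=bounds,
                method="SLSQP")

a1, a2 = res1.x, res2.x
w1 = (a1 * y) @ X
# Recover b for SVM1 from a support vector with 0 < a_i < C (assumed to exist
# here because the toy data is separable and C is large).
sv = np.argmax((a1 > 1e-6) & (a1 < C - 1e-6))
b1 = y[sv] - w1 @ X[sv]

w2 = (a2 * y) @ X
b2 = a2 @ y  # for SVM2, stationarity in b gives b = sum_i a_i y_i

print(np.sign(X @ w1 + b1))  # both hyperplanes should separate the data
print(np.sign(X @ w2 + b2))
```

On separable data with a large $C$, both solutions classify every training point correctly, which is consistent with (though of course does not prove) the claimed equivalence.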
Thanks for your help.