Why is the "hinge" loss equivalent to the 0-1 loss in SVM?


I'm reading the book The Elements of Statistical Learning by Hastie et al. In $\S 12.3.2$ it introduces the SVM as a penalization method:

With $f(x)=h(x)^T \beta+\beta_0 $, the solution of the optimization problem $$\min_{\beta_0,\beta} \sum_{i=1}^N [1-y_i f(x_i)]_+ +\frac{\lambda}{2}\|\beta\|^2 $$ with $\lambda=\frac{1}{C}$, is the same as that for $$\min_{\beta_0,\beta} \frac{1}{2}\|\beta\|^2 +C \sum_{i=1}^N \xi_i$$ $$\text{subject to } \xi_i \geq 0,y_i f(x_i)\geq 1-\xi_i ~ \forall i, $$
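To make sure I read the two formulations correctly, here is a small numeric sketch (the margin values are hypothetical toy numbers). My understanding is that, for a fixed $\beta$, the smallest feasible $\xi_i$ in the constrained form is exactly the hinge term $[1-y_i f(x_i)]_+$:

```python
import numpy as np

# Hypothetical toy margins y_i * f(x_i) for a few points
margins = np.array([1.5, 0.3, -0.7, 1.0])

# Hinge loss term from the penalization form: [1 - y_i f(x_i)]_+
hinge = np.maximum(0.0, 1.0 - margins)

# In the constrained form, the constraints are xi_i >= 0 and
# xi_i >= 1 - y_i f(x_i); since xi_i is being minimized, the
# optimal slack is the larger of the two lower bounds:
xi_optimal = np.maximum(0.0, 1.0 - margins)

assert np.allclose(hinge, xi_optimal)
```

So numerically the slack variables seem to just "absorb" the hinge loss, but I'd like to understand the equivalence of the full optimization problems.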

Could anyone kindly give me a hint as to why these two problems are equivalent, and what the benefit of introducing the "hinge" loss is?

Thanks a lot!