Why can't we just have $\xi = 0$ as the optimal value?


From Understanding Machine Learning:

In the proof below, why does the choice of $\xi_i$ being $0$ or $1 - y_i(\langle w,x_i\rangle + b)$ matter? Why can't we just take $\xi_i = 0$ as the optimal value, since $(15.4)$ would then reduce to the norm term?


[Image: excerpt from the book showing equation (15.4) and the proof]

Best answer:

If $y_i(\langle w,x_i\rangle + b) < 1$ and $\xi_i = 0,$ then $1 - \xi_i = 1$ and therefore $y_i(\langle w,x_i\rangle + b) < 1 - \xi_i.$

That would contradict the constraint $y_i(\langle w,x_i\rangle + b) \geq 1 - \xi_i.$

So, in the case where $y_i(\langle w,x_i\rangle + b) < 1,$ what is the least value of $\xi_i$ for which $y_i(\langle w,x_i\rangle + b) \geq 1 - \xi_i$?

If we add $\xi_i - y_i(\langle w,x_i\rangle + b)$ to both sides of the inequality $$y_i(\langle w,x_i\rangle + b) \geq 1 - \xi_i,$$ we get the equivalent inequality $$\xi_i \geq 1 - y_i(\langle w,x_i\rangle + b).$$ The least value of $\xi_i$ for which this inequality holds is attained when the two sides are equal: $$\xi_i = 1 - y_i(\langle w,x_i\rangle + b).$$
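Putting both cases together, the optimal slack is exactly the hinge loss, $\xi_i = \max(0,\, 1 - y_i(\langle w,x_i\rangle + b))$. A minimal numerical sketch of this rule (the function name `optimal_slack` and the toy data are illustrative, not from the book):

```python
import numpy as np

def optimal_slack(w, b, X, y):
    """Optimal slack variables xi_i = max(0, 1 - y_i(<w, x_i> + b))."""
    margins = y * (X @ w + b)          # y_i(<w, x_i> + b) for each example
    return np.maximum(0.0, 1.0 - margins)

# Toy data: two points with margin >= 1 (slack 0), one inside the margin.
w = np.array([1.0, -1.0])
b = 0.0
X = np.array([[2.0, 0.0],   # margin 2.0  -> xi = 0
              [0.5, 0.0],   # margin 0.5  -> xi = 0.5
              [-1.0, 0.0]]) # margin 1.0  -> xi = 0
y = np.array([1.0, 1.0, -1.0])

print(optimal_slack(w, b, X, y))  # [0.  0.5 0. ]
```

Only the example inside the margin pays a positive slack, which is why minimizing $(15.4)$ does not simply set every $\xi_i$ to zero.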