From Understanding Machine Learning:
In the proof below, why does the choice of $\xi_i$ being $0$ or $1-y_i(\langle w,x_i\rangle + b)$ matter? Why can't we just take $\xi_i = 0$ as the optimal value, since $(15.4)$ would then reduce to the norm term?
If $y_i(\langle w,x_i\rangle + b) < 1$ and $\xi_i = 0,$ then $1 - \xi_i = 1$ and therefore $y_i(\langle w,x_i\rangle + b) < 1 - \xi_i.$
That would contradict the constraint $y_i(\langle w,x_i\rangle + b) \geq 1 - \xi_i,$ so $\xi_i = 0$ is not feasible for such a point.
In the case where $y_i(\langle w,x_i\rangle + b) < 1,$ what is the least value of $\xi_i$ for which $y_i(\langle w,x_i\rangle + b) \geq 1 - \xi_i$?
If we add $\xi_i - y_i(\langle w,x_i\rangle + b)$ to both sides of the inequality $$y_i(\langle w,x_i\rangle + b) \geq 1 - \xi_i,$$ we get the equivalent inequality $$ \xi_i \geq 1 - y_i(\langle w,x_i\rangle + b).$$ Since the objective in $(15.4)$ penalizes larger $\xi_i,$ the optimal choice is the least value for which this inequality holds, namely equality: $$ \xi_i = 1 - y_i(\langle w,x_i\rangle + b).$$
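Putting the two cases together, the optimal slack is the hinge loss $\xi_i = \max\{0,\ 1 - y_i(\langle w,x_i\rangle + b)\}$. A minimal numerical sketch of this (function and variable names are my own, not the book's):

```python
import numpy as np

def optimal_slack(w, b, x, y):
    """Smallest xi >= 0 satisfying y * (<w, x> + b) >= 1 - xi,
    i.e. the hinge loss max(0, 1 - y * (<w, x> + b))."""
    margin = y * (np.dot(w, x) + b)
    return max(0.0, 1.0 - margin)

w = np.array([1.0, -1.0])
b = 0.5

# Margin >= 1: the constraint already holds with xi = 0.
print(optimal_slack(w, b, np.array([2.0, 0.0]), 1))  # margin 2.5 -> 0.0

# Margin < 1: xi must rise to exactly 1 - margin.
print(optimal_slack(w, b, np.array([0.2, 0.0]), 1))  # margin 0.7 -> approx. 0.3
```

The first point illustrates the question's intuition: when the margin constraint is already satisfied, $\xi_i = 0$ is indeed optimal; the second shows why it cannot be zero in general.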