How does this prove that the solution is optimal?

144 Views Asked by At

From Understanding Machine Learning:

In the red box below: How does this show that $(\hat w, \hat b)$ is an optimal solution? This seems to just show that for each $i$, it's larger than the smallest error in $(15.1)$.

Can someone explain how this is optimal?


enter image description here

1

There are 1 best solutions below

0
On BEST ANSWER

Hard-SVM considers all $(\mathbf w,b)$ such that $f(\mathbf w,b):=\min_i y_i(\langle \mathbf w,\mathbf x_i\rangle +b)\ge 1$ and picks one, $\mathbf w_0$, that minimizes $\|\mathbf w\|$. This is then scaled down to make $\|\hat{\mathbf w}\|=1$.

By homogenity (i.e., $f(\lambda\mathbf w,\lambda b)=\lambda f(\mathbf w,b)$), minimizing $\|w\|$ under the constraint $f(\mathbf w,b)\ge1$ is "essentially the same" (more precisely, dual to) as maximizing $f(\mathbf w,b)$ under the constraint $\|w\|=1$. Compare with the fact that "the shape with given perimeter that maximizes area is the circle" being equivalent to "the shape with given area that minimizes perimeter is the circle".