From Understanding Machine Learning:
In the red box below: How does this show that $(\hat w, \hat b)$ is an optimal solution? It only seems to show that, for each $i$, the margin is at least the smallest margin in $(15.1)$.
Can someone explain how this is optimal?

Hard-SVM considers all $(\mathbf w,b)$ such that $f(\mathbf w,b):=\min_i y_i(\langle \mathbf w,\mathbf x_i\rangle +b)\ge 1$ and picks a pair $(\mathbf w_0,b_0)$ that minimizes $\|\mathbf w\|$. This is then rescaled to make $\|\hat{\mathbf w}\|=1$.
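To make the objects concrete, here is a minimal numerical sketch (the toy data and helper names are my own, not from the book): it computes $f(\mathbf w,b)=\min_i y_i(\langle \mathbf w,\mathbf x_i\rangle+b)$ on a tiny separable set and checks the homogeneity $f(\lambda\mathbf w,\lambda b)=\lambda f(\mathbf w,b)$ used below.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f(w, b, xs, ys):
    # Smallest signed margin over the training set.
    return min(y * (dot(w, x) + b) for x, y in zip(xs, ys))

# Tiny linearly separable toy set in the plane (hypothetical data).
xs = [(2.0, 1.0), (1.0, 2.0), (-1.0, -1.0), (-2.0, 0.0)]
ys = [1, 1, -1, -1]

w, b = (1.0, 1.0), 0.0
lam = 3.0
print(f(w, b, xs, ys))                              # -> 2.0
print(f([lam * wi for wi in w], lam * b, xs, ys))   # -> 6.0, i.e. lam * f(w, b)
```

Scaling $(\mathbf w,b)$ scales every margin, hence also the minimum; this is the homogeneity the equivalence argument rests on.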
By homogeneity (i.e., $f(\lambda\mathbf w,\lambda b)=\lambda f(\mathbf w,b)$ for $\lambda>0$), minimizing $\|\mathbf w\|$ under the constraint $f(\mathbf w,b)\ge1$ is "essentially the same" as (more precisely, dual to) maximizing $f(\mathbf w,b)$ under the constraint $\|\mathbf w\|=1$. Compare with the fact that "the shape with given perimeter that maximizes area is the circle" is equivalent to "the shape with given area that minimizes perimeter is the circle".
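Spelling out the contradiction argument behind this equivalence (my reconstruction of the step, with the notation above, not the book's verbatim proof): let $(\mathbf w_0,b_0)$ solve the min-norm problem and set $\hat{\mathbf w}=\mathbf w_0/\|\mathbf w_0\|$, $\hat b=b_0/\|\mathbf w_0\|$, so by homogeneity

$$
f(\hat{\mathbf w},\hat b)=\frac{f(\mathbf w_0,b_0)}{\|\mathbf w_0\|}\ge\frac{1}{\|\mathbf w_0\|}.
$$

Now suppose some unit-norm pair $(\mathbf w,b)$ achieved a strictly larger margin $\gamma:=f(\mathbf w,b)>1/\|\mathbf w_0\|$. Then

$$
f\!\left(\frac{\mathbf w}{\gamma},\frac{b}{\gamma}\right)=\frac{f(\mathbf w,b)}{\gamma}=1,
\qquad
\left\|\frac{\mathbf w}{\gamma}\right\|=\frac{1}{\gamma}<\|\mathbf w_0\|,
$$

so $(\mathbf w/\gamma,\,b/\gamma)$ would be feasible for the min-norm problem with strictly smaller norm, contradicting the optimality of $\mathbf w_0$. Hence $(\hat{\mathbf w},\hat b)$ maximizes the margin among all unit-norm pairs, which is exactly the optimality the red box asserts.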