Condition on linearly separable datasets


I am trying to understand linear classification with hyperplanes. So far I have understood that for a binary classifier with labels $y_i \in \lbrace -1, 1\rbrace$, the points $(x_i, y_i)$ are separable if parameters $a, b$ exist such that points with different labels lie in different halfspaces, i.e. w.l.o.g. $a^Tx_i - b > 0$ for $y_i = 1$ and $a^Tx_i - b < 0$ for $y_i = -1$, which can be rewritten as $$y_i(a^Tx_i - b) > 0.$$ However, for the optimal hyperplane (i.e. the one maximizing the distance to each class as defined by the labels) the literature states that $$ y_i(a^Tx_i - b) \geq 1$$ holds. How can I show that whenever I find parameters $a, b$ satisfying the first condition, there also exist (rescaled) parameters satisfying the second?
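To make the first condition concrete, here is a minimal sketch with a made-up 2D dataset and hand-picked candidate parameters $a, b$ (both are assumptions for illustration, not derived from any method); it just verifies $y_i(a^Tx_i - b) > 0$ for every point:

```python
import numpy as np

# Hypothetical 2D dataset: two points per class
X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, 0.5]])
y = np.array([1, 1, -1, -1])

a = np.array([1.0, 0.5])  # candidate normal vector (chosen by hand)
b = 0.0                   # candidate offset

# y_i (a^T x_i - b) > 0 for all i means the hyperplane separates the classes
margins = y * (X @ a - b)
assert np.all(margins > 0)
```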


The key is that your $x_i$ are indexed by a finite set. For each $i$, set $t_i = y_i (a^T x_i - b)$; by the strict inequality, every $t_i$ is positive. Since there are finitely many $t_i$, their minimum $s = \min_i t_i$ is also positive. Now replace the old parameters by $a' = a/s$ and $b' = b/s$. Then $y_i(a'^T x_i - b') = t_i/s \geq 1$ for every $i$.
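The rescaling argument can be sketched numerically (dataset and starting parameters are made up for illustration): compute the margins $t_i$, divide both parameters by their minimum $s$, and check that the new margins are all at least $1$:

```python
import numpy as np

# Hypothetical separable dataset in 2D
X = np.array([[2.0, 1.0], [3.0, 2.5], [-1.0, -2.0], [-2.5, -1.5]])
y = np.array([1, 1, -1, -1])

# Parameters assumed to satisfy the strict condition already
a = np.array([1.0, 1.0])
b = 0.0

margins = y * (X @ a - b)   # t_i = y_i (a^T x_i - b)
assert np.all(margins > 0)  # strict separation holds

s = margins.min()           # smallest margin over the finite dataset
a_new, b_new = a / s, b / s # rescale both parameters by 1/s

new_margins = y * (X @ a_new - b_new)
assert np.all(new_margins >= 1)  # functional margin is now at least 1
```

Note that the same scaling is applied to $a$ and $b$, so the hyperplane itself (the zero set of $a^Tx - b$) is unchanged; only the functional margin is rescaled.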

The result does not hold in the infinite case, as one could have $x_i$ in opposite classes that converge to one another. In that case a linear separation is still possible, but it cannot be widened to a slab of nonzero thickness around the hyperplane: points of both classes come arbitrarily close to the boundary, so any such slab would contain data points. Equivalently, $\inf_i t_i = 0$, so no rescaling by $1/s$ is available.
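A 1D sketch of this failure mode (the construction is my own illustration, truncated to finitely many points for computation): take $x_n = 1/n$ with label $+1$ and $x_n = -1/n$ with label $-1$. The hyperplane $x = 0$ (i.e. $a = 1$, $b = 0$) strictly separates them, but the margins shrink toward $0$ as $n$ grows, so the infimum over the full infinite family is $0$:

```python
import numpy as np

# Points 1/n labeled +1 and -1/n labeled -1, for n = 1..10000
n = np.arange(1, 10001)
X = np.concatenate([1.0 / n, -1.0 / n])
y = np.concatenate([np.ones(n.size), -np.ones(n.size)])

a, b = 1.0, 0.0
margins = y * (a * X - b)
assert np.all(margins > 0)  # strictly separated...
print(margins.min())        # ...but the smallest margin is 1/10000,
                            # and it tends to 0 as more points are added
```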