I'm going through the derivation of the hard margin SVM and I'm a little confused as to why there's a $1$ in the constraint as opposed to a $0$.
Consider the canonical form of the hard margin SVM constraint
$y_i (w^Tx_i - b) \geq 1, \forall i$
where $x_i$ is the training data, the label is $y_i \in \{ +1, -1\}$ and $w$, $b$ are parameters of a hyperplane.
I'm confused about where the $\geq 1$ comes from, because we could've achieved a similar result with
$y_i (w^Tx_i - b) \geq 0, \forall i$
similar to a classic perceptron.
After some reading, I've found a few reasons, but I don't quite understand the intuition behind them.
From this SE article, the right-hand side of the constraint was originally a new variable $\gamma$ that determines the size of the margin, but we can divide both sides by $\gamma$ and still solve for the same hyperplane. This explanation makes algebraic sense, but it doesn't make geometric sense to me.
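To make sure I'm following the algebra correctly (assuming $\gamma > 0$), here is the rescaling step as I understand it:

```latex
% Start from a constraint with an explicit margin variable \gamma > 0:
y_i \left( w^T x_i - b \right) \ge \gamma
\quad\Longleftrightarrow\quad
y_i \left( \left(\tfrac{w}{\gamma}\right)^T x_i - \tfrac{b}{\gamma} \right) \ge 1 .
% Substituting w' = w/\gamma and b' = b/\gamma gives the canonical
% constraint y_i (w'^T x_i - b') \ge 1, and (w', b') describes the
% same hyperplane as (w, b).
```

So algebraically the $1$ is just $\gamma$ absorbed into the parameters, which is why the same hyperplane comes out.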
With a $\geq 0$ inequality instead of $\geq 1$, the optimizer can return any scaled version $(\alpha w, \alpha b)$, $\alpha > 0$, of the parameters and still describe the same hyperplane, which can lead to instability. Therefore, we want to put a constraint on $\|w\|_2$, and somehow setting the right-hand side of the inequality to $1$ accomplishes this.
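To make concrete what I mean by the scale ambiguity, here is a quick numeric check on made-up 2D data (the points, labels, and $(w, b)$ below are all hypothetical, just chosen so the data is separable):

```python
import numpy as np

# Hypothetical linearly separable 2D data with labels in {+1, -1}.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 0.5]])
y = np.array([1, 1, -1, -1])

# An arbitrary separating hyperplane (w, b).
w = np.array([1.0, 0.5])
b = 0.0

# Functional margins y_i (w^T x_i - b); their signs give the classification.
margins = y * (X @ w - b)

# Scaling (w, b) by any alpha > 0 leaves the hyperplane, and hence every
# sign, unchanged -- so infinitely many (alpha*w, alpha*b) satisfy a
# ">= 0" constraint, and the ">= 0" problem has no unique minimizer.
for alpha in [0.1, 1.0, 10.0]:
    scaled = y * (X @ (alpha * w) - alpha * b)
    assert np.array_equal(np.sign(scaled), np.sign(margins))
```

This is what I mean by the optimizer being free to pick any $\alpha$: every rescaling passes the $\geq 0$ constraint equally well.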
I like the reasoning behind the second explanation a lot more, because it identifies a real problem (infinitely many equivalent solutions) and resolves it by constraining the problem to a unique solution (should one exist), but I don't understand the math behind it.
Thanks for any clarifications!