Why are the hyperplanes which define the margin in SVM defined as $w^Tx - b = \pm1$?


I am trying to understand the way Support Vector Machines work but one point of confusion I have is why the two hyperplanes which define the margin are defined as:

$$w^Tx-b=1$$

for the hyperplane closest to the positive class and

$$w^Tx-b=-1$$

for the hyperplane closest to the negative class. I got these definitions from Wikipedia but they are basically the same everywhere.

I get that the hyperplane dividing the two classes is $w^Tx-b=0$. While making predictions, we use this hyperplane to decide if the new sample $\bar{x}$ belongs to the class $+1$ or to the class $-1$. If we have $w^T \bar{x} -b \ge 0$ then $\bar{x}$ belongs to the $+1$ class, else $\bar{x}$ belongs to the $-1$ class. This is clear.
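To make the prediction rule concrete, here is a minimal sketch in plain Python. The function name `predict` and the values of `w` and `b` are hypothetical, chosen only for illustration; they do not come from any library.

```python
def predict(w, b, x):
    """Return +1 if w^T x - b >= 0, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if score >= 0 else -1


# Hypothetical parameters, for illustration only.
w = (1.0, 1.0)
b = 9.0

print(predict(w, b, (6.0, 5.0)))  # score = 6 + 5 - 9 = 2, so class +1
print(predict(w, b, (4.0, 3.0)))  # score = 4 + 3 - 9 = -2, so class -1
```

Note that only the sign of $w^T \bar{x} - b$ is used here, not its magnitude.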

What I don't get is where the $+1$ and $-1$ come from in the definitions of the hyperplanes that make up the margin. Why do we use $w^Tx-b=1$ and $w^Tx-b=-1$? Why not $+100$ and $-100$, or any other pair of numbers? And what if, in a particular case, using ones does not work? For example, look at this:

[Figure: scatter plot of the two classes of points]

Above we have two classes of points and we are trying to create a classifier using SVM. It is visible that the support vectors will be $A(4, 3)$ and $X(6, 5)$. Also, it's clear that the line dividing the two sets of data will be

[Figure: the dividing line $x + y - 9 = 0$ drawn between the two classes]

Now, according to everything I have read so far about SVM, the two hyperplanes which make up the margin should be $d1: x + y - 9 = 1$ and $d2: x + y - 9 = -1$. But that simply looks wrong. Those two lines would look like the following:

[Figure: the lines $x + y - 9 = 1$ and $x + y - 9 = -1$ drawn between the two classes]

And that is NOT the largest possible margin! We would instead need $d1:x+y-9=2$ and $d2:x+y-9=-2$:

[Figure: the lines $x + y - 9 = 2$ and $x + y - 9 = -2$ passing through $X$ and $A$]
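The arithmetic in this example can be checked numerically. The sketch below is only an illustration: the helper names `decision_value` and `margin_width` are hypothetical, and the rescaled pair `w_scaled`, `b_scaled` is an assumption added here to see what happens when $w$ and $b$ are both divided by 2.

```python
import math


def decision_value(w, b, x):
    """Evaluate w^T x - b for a single point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def margin_width(w, c):
    """Distance between the parallel lines w^T x - b = c and w^T x - b = -c."""
    return 2 * c / math.hypot(*w)


A = (4.0, 3.0)  # support vector of the negative class
X = (6.0, 5.0)  # support vector of the positive class

# The line from the example: x + y - 9 = 0.
w, b = (1.0, 1.0), 9.0
print(decision_value(w, b, A), decision_value(w, b, X))  # -2.0 2.0

# Assumption for illustration: divide w and b by 2.
# The zero set 0.5x + 0.5y - 4.5 = 0 is the same line as x + y - 9 = 0.
w_scaled, b_scaled = (0.5, 0.5), 4.5
print(decision_value(w_scaled, b_scaled, A),
      decision_value(w_scaled, b_scaled, X))  # -1.0 1.0

# Both descriptions give the same geometric margin width.
print(margin_width(w, 2.0), margin_width(w_scaled, 1.0))
```

With the original $w$ and $b$, the support vectors sit at $\pm 2$. With the rescaled pair, the same points sit at $\pm 1$, and the dividing line and the width of the margin are unchanged.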

This whole exercise only made me more confused. What am I missing? What did I misunderstand?