Dividing by the norm of $w$ to maximize the margin in SVMs


Could someone give me an intuition for why we divide the difference of the two constraints (1) $w^tx_++b=1$ and (2) $w^tx_-+b=-1$ by the length, or norm, of the vector $w$ to get the margin of the hyperplane $w^tx+b=0$?

As far as I understand the concept, we subtract constraint (2) from constraint (1):

$$w^tx_++b-(w^tx_-+b)=1-(-1)$$

which gives us

$$w^t(x_+-x_-) =2$$

Now comes the part that I don't clearly understand. We divide $w^t(x_+-x_-) =2$ by $\lVert w\rVert$ to get:

$$\cfrac{w^t(x_+-x_-)}{\lVert w\rVert} =\cfrac{2}{\lVert w\rVert}$$

This tells us that the margin, i.e. the distance between the hyperplanes given by constraints (1) and (2), which lie on either side of $w^tx+b=0$, is $\cfrac{2}{\lVert w\rVert}$.
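A quick numeric check of this derivation, as a sketch with made-up values ($w=(3,4)$, $b=-2$, chosen only for illustration; the helpers `dot`, `norm`, and `point_on_plane` are hypothetical). It picks a point $x_+$ on (1) and a point $x_-$ on (2), deliberately shifted sideways along the hyperplane so that $x_+-x_-$ is not parallel to $w$. The raw length $\lVert x_+-x_-\rVert$ then depends on which points were chosen, while $w^t(x_+-x_-)/\lVert w\rVert$, the component of $x_+-x_-$ along the unit vector $w/\lVert w\rVert$, comes out as $2/\lVert w\rVert$.

```python
import math

def dot(u, v):
    """Inner product u^t v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length ||u||."""
    return math.sqrt(dot(u, u))

def point_on_plane(w, b, c):
    """Return the point t*w satisfying w^t x + b = c, with t = (c - b) / ||w||^2."""
    t = (c - b) / dot(w, w)
    return tuple(t * wi for wi in w)

# Hypothetical example values: ||w|| = 5, so 2 / ||w|| = 0.4.
w = (3.0, 4.0)
b = -2.0

x_plus = point_on_plane(w, b, 1.0)           # on constraint (1): w^t x + b = 1
# Shift along (4, -3), which is orthogonal to w, so x_minus stays on constraint (2).
base = point_on_plane(w, b, -1.0)
x_minus = (base[0] + 4.0, base[1] - 3.0)     # w^t x + b = -1 still holds

diff = tuple(p - m for p, m in zip(x_plus, x_minus))
raw_length = norm(diff)                      # depends on the chosen points
projected = dot(w, diff) / norm(w)           # component of diff along w / ||w||

print(dot(w, diff))    # w^t (x_+ - x_-): 2, up to rounding
print(projected)       # 2 / ||w|| = 0.4, up to rounding
print(raw_length)      # about 5.016, not the margin
```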

What exactly does the norm of the vector $w$ tell us in this case? $w$ is the slope of the hyperplane, but what does the length of the slope express?

Best answer

You're trying to maximize the distance between two parallel hyperplanes. To find the distance between two parallel hyperplanes, take a point $y$ on one hyperplane and any point $x$ on the other; the distance is the length of the component of $y-x$ along the normal direction $\omega$. So we have $$d=\left\| \frac{(y-x)\cdot \omega}{\omega\cdot\omega}\, \omega \right\|=\frac{|y\cdot\omega-x\cdot\omega|}{\|\omega\|},$$ where we take the projection of the vector connecting $x$ and $y$ onto the $\omega$ direction and measure the projection's magnitude. Using the equations of the two hyperplanes, $y\cdot\omega+b=1$ and $x\cdot\omega+b=-1$, the numerator is $|1-(-1)|=2$, so the formula simplifies to the aforementioned $\frac{2}{\|\omega\|}$.
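The projection formula can be checked numerically. This is a minimal sketch with hypothetical values ($\omega=(1,2,2)$ so $\|\omega\|=3$, and $b=1$) and a hypothetical helper `projection_distance`. It builds the projection vector explicitly, compares its length with the closed form $|y\cdot\omega-x\cdot\omega|/\|\omega\|$, and shows that the plain distance $\|y-x\|$ between the two chosen points is generally larger than the gap between the planes.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def projection_distance(y, x, omega):
    """Length of the projection of (y - x) onto the direction of omega."""
    diff = [yi - xi for yi, xi in zip(y, x)]
    coeff = dot(diff, omega) / dot(omega, omega)   # ((y - x) . omega) / (omega . omega)
    proj = [coeff * oi for oi in omega]            # projection vector along omega
    return math.sqrt(dot(proj, proj))

# Hypothetical 3-D example: omega = (1, 2, 2), so ||omega|| = 3, and b = 1.
omega, b = (1.0, 2.0, 2.0), 1.0
y = (0.0, 0.0, 0.0)     # omega . y + b = 1  -> on the "+1" plane
x = (-2.0, 0.0, 0.0)    # omega . x + b = -1 -> on the "-1" plane

d = projection_distance(y, x, omega)
closed_form = abs(dot(y, omega) - dot(x, omega)) / math.sqrt(dot(omega, omega))
print(d, closed_form, 2 / 3)   # all three agree (up to rounding): 2 / ||omega||
print(math.dist(y, x))         # plain ||y - x|| = 2, larger than d
```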

What is subtle here is that when we do projections, we divide by the size of $\omega$ to compensate for $\omega$'s contribution to the inner product in the numerator. That is, when we measure the component of a vector in the direction of a second vector, we don't want the size of the second vector to affect the measurement. In this particular case, however, the inner product in the numerator simplifies to the constant $2$, so changing the size of $\omega$ actually changes the distance between the two planes. Namely, because the two planes are pinned to the levels $\omega\cdot x+b=\pm 1$, rescaling $\omega$ and $b$ together moves them closer together or farther apart while leaving the separating hyperplane $\omega\cdot x+b=0$ unchanged. So the only way to increase the distance between the hyperplanes is to minimize $\|\omega\|$ subject to the constraints $$\operatorname{class}(x_i)(\omega\cdot x_i+b)\geq 1\mbox{ for } 1\leq i\leq n,$$ where $\operatorname{class}(x_i)\in \{-1,1\}$ denotes the class to which $x_i$ belongs.
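The scaling effect can be illustrated with a small sketch. The data set, the fixed direction $(1,1)$ with offset $-2$, the candidate scale factors, and the helpers `satisfies_constraints` and `margin` are all hypothetical. Every scale factor $c>0$ describes the same separating line $\omega\cdot x+b=0$, but only some satisfy the constraints, and among those the smallest $\|\omega\|$ gives the widest gap. This only searches over rescalings of one fixed direction; an actual SVM solver also optimizes the direction of $\omega$, typically as a quadratic program.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def satisfies_constraints(omega, b, data, tol=1e-12):
    """Check class(x_i) * (omega . x_i + b) >= 1 for every labelled point."""
    return all(label * (dot(omega, x) + b) >= 1 - tol for x, label in data)

def margin(omega):
    """Distance 2 / ||omega|| between the planes omega . x + b = +1 and -1."""
    return 2 / math.sqrt(dot(omega, omega))

# Hypothetical toy data: (point, class) pairs with class in {-1, 1}.
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

# Fix the separating line x1 + x2 = 2 and vary only the scale c of (omega, b).
# Every c > 0 describes the same line omega . x + b = 0.
direction, offset = (1.0, 1.0), -2.0
results = []
for c in (0.25, 0.5, 1.0, 2.0):
    omega = tuple(c * d for d in direction)
    b = c * offset
    ok = satisfies_constraints(omega, b, data)
    results.append((c, ok, margin(omega)))
    print(f"c={c}: feasible={ok}, margin={margin(omega):.4f}")

# Among feasible scalings, the largest margin comes from the smallest ||omega||.
best_c, _, best_margin = max((r for r in results if r[1]), key=lambda r: r[2])
print(best_c, best_margin)  # best_c is 0.5, best_margin is 2*sqrt(2) (about 2.828)
```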