Understanding the equation for margin in linear classification


In linear classification, the margin is the distance from the closest point to the separating hyperplane. It is useful because not all separating hyperplanes are equally good, so choosing the hyperplane with the largest margin is an intuitive way to select the best one. But I do not understand this equation:

$$ \epsilon = \max_{\left\|w\right\| \leq 1} \min_i w^Tx_i y_i $$

where $\epsilon$ is the margin, $w$ is the normal vector defining a candidate hyperplane $w^Tx = 0$, $x_i$ is sample $i$, and $y_i \in \{-1, 1\}$ is the label for sample $i$.

Can someone explain what the equation is doing? I understand what we want to do conceptually, but I do not see how this equation expresses it.

BEST ANSWER

We want the prediction $w^Tx_i$ and the truth $y_i$ to share the same sign; that is, we require $$w^Tx_i y_i \geq 0$$

Supposing this condition can be satisfied, which hyperplane $w^Tx=0$ should we choose?

Suppose a hyperplane $w^Tx=0$ with $\left\| w\right\|=1$ is given to us; let's compute the distance of a (correctly classified) point $x_i$ from the hyperplane.

Note that $w$ is orthogonal to the hyperplane and $\left\| w\right\|=1$, hence the distance of $x_i$ from the hyperplane is $|w^Tx_i|$, the length of the projection of $x_i$ onto the direction orthogonal to the hyperplane. Since $x_i$ is correctly classified, $w^Tx_iy_i \geq 0$, and since $|y_i|=1$, this distance equals $w^Tx_iy_i$.
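As a concrete check (my own toy numbers, not from the question), the projection computation for a single point looks like this:

```python
import numpy as np

# A unit normal w defines the hyperplane w^T x = 0 through the origin.
w = np.array([3.0, 4.0])
w = w / np.linalg.norm(w)    # now ||w|| = 1

x_i = np.array([2.0, 1.0])   # a sample point (made-up)
y_i = 1.0                    # its label, assumed correctly classified

# Signed distance = projection of x_i onto w; multiplying by y_i
# makes it non-negative for correctly classified points.
dist = w @ x_i * y_i
print(dist)                  # 0.6*2 + 0.8*1 = 2.0
```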

Given a training data set, the smallest such distance over all samples is then

$$\min_i w^Tx_iy_i$$

We want to choose $w$ that maximizes this quantity, hence the problem becomes

$$\max_{\left\|w\right\|=1}\min_i w^Tx_iy_i$$
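To see the max–min in action, here is a minimal 2D sketch (with made-up separable data): candidate unit normals are parameterized by an angle, the inner $\min_i w^Tx_iy_i$ is evaluated for each, and the outer max approximates the margin.

```python
import numpy as np

# Toy separable data (assumed for illustration): two points per class.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Candidate unit normals w = (cos t, sin t), so ||w|| = 1 exactly.
thetas = np.linspace(0.0, 2 * np.pi, 100000, endpoint=False)
W = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)

# Inner minimum min_i w^T x_i y_i for every candidate w, then outer max.
margins = (W @ X.T * y).min(axis=1)
eps = margins.max()
print(eps)   # close to sqrt(5) ≈ 2.236 for this toy data
```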

Finally, since the objective scales linearly in $\left\|w\right\|$ and is positive for separable data, the maximum over the unit ball is attained on its boundary, so

$$\max_{\left\|w\right\|=1}\min_i w^Tx_iy_i=\max_{\left\|w\right\| \leq 1}\min_i w^Tx_iy_i$$
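A quick numerical illustration of that last equality (again with assumed toy data): the objective scales linearly with $\left\|w\right\|$, so any $w$ strictly inside the unit ball is beaten by its normalization $w/\left\|w\right\|$ on the unit sphere.

```python
import numpy as np

# Toy separable data (my own example): one point per class.
X = np.array([[2.0, 2.0], [-2.0, -1.0]])
y = np.array([1.0, -1.0])

def objective(w):
    """The inner minimum: min_i w^T x_i y_i."""
    return (X @ w * y).min()

w = np.array([0.3, 0.4])                      # ||w|| = 0.5, inside the ball
inside = objective(w)
on_sphere = objective(w / np.linalg.norm(w))  # same direction, unit norm
print(inside, on_sphere)                      # on_sphere = 2 * inside here
```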