Decision surface in linear classification


I have several questions regarding the following definition of a linear hyperplane for classification:

We define our classifier $F$ as follows:

$$F(x) = \text{sign}(\langle w,x\rangle +b) \in \{1,-1\}$$

where

$$\text{sign}(z) = \begin{cases} 1&z \geq 0\\ -1&z < 0 \end{cases}$$
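As a minimal sketch (the vectors and values below are made-up examples, not from the question), the classifier $F$ can be written directly from these two definitions:

```python
import numpy as np

def sign(z):
    # sign convention from the definition above: sign(0) = +1
    return 1 if z >= 0 else -1

def F(w, x, b):
    # linear classifier F(x) = sign(<w, x> + b) in {1, -1}
    return sign(np.dot(w, x) + b)

# Example hyperplane x1 + x2 = 1, i.e. w = (1, 1), b = -1
w, b = np.array([1.0, 1.0]), -1.0
print(F(w, np.array([1.0, 1.0]), b))  # 1 + 1 - 1 =  1 >= 0 -> +1
print(F(w, np.array([0.0, 0.0]), b))  # 0 + 0 - 1 = -1 <  0 -> -1
```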

My questions:

  1. How come the threshold can be assumed to be $0$? Do we have to set constraints on the weight vector $w$ and bias $b$ so that this is fulfilled? I don't see why, for different problem sets, it couldn't be any number in $\mathbb{R}$.
  2. As far as I know, $\langle w,x\rangle+b = 0$ defines a plane only if $w$ is a normal vector... again, how come $w$ is normal? Is this another constraint we set during optimization?
  3. In our lecture notes, it is mentioned that $|\langle w,x\rangle+b|$ is the distance of the vector $x$ from the hyperplane $\langle w,x\rangle+b = 0$. How so?

Many thanks

  1. Because of the bias term. The vector $w$ determines the orientation of the hyperplane (it is the normal vector to the plane), while the bias $b$ determines its position. Consider a simple 1D case: every point greater than 10 is labeled $+1$; all others $-1$. The bias simply moves the threshold to 10: with $w = 1$ and $b = -10$, the condition $wx + b \geq 0$ is exactly $x \geq 10$. In every (linearly separable) case one can fix the threshold at $0$ and let $b$ make up for it. (Alternatively, one could vary the threshold and force $b$ to be $0$, but the result would be the same.)

  2. It's the other way around; look at the definition of a hyperplane. Any equation $\vec{w}\cdot\vec{x}+b=0$ defines a hyperplane. The classical point-normal form of a plane is $\vec{n}\cdot(\vec{x}-\vec{p})=0$, where $\vec{n}$ is the normal to the plane and $\vec{p}$ is a point on it. Comparing the two forms shows that $\vec{w}$ plays the role of $\vec{n}$: $w$ is the normal by definition, not because of an extra constraint.
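One way to see that $w$ is normal to the plane: any direction lying *in* the hyperplane (the difference of two points satisfying the equation) is orthogonal to $w$. A small numeric check, with made-up values for $w$ and $b$:

```python
import numpy as np

w, b = np.array([3.0, 4.0]), -5.0

# Two points satisfying <w, x> + b = 0, i.e. 3*x1 + 4*x2 = 5
p = np.array([1.0, 0.5])   # 3*1 + 4*0.5 = 5
q = np.array([-1.0, 2.0])  # 3*(-1) + 4*2 = 5

# The direction p - q lies in the hyperplane, so it is orthogonal to w
print(np.dot(w, p - q))  # 0.0
```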

  3. This is only correct if $w$ is a unit vector. The unsigned distance from a point $p$ to the hyperplane $\langle w,x\rangle+b=0$ is $$ d(p) = \frac{|\langle w,p\rangle + b|}{\lVert w\rVert_2}, $$ so $|\langle w,p\rangle + b|$ equals the distance only when $\lVert w\rVert_2 = 1$. This normalization is not assumed in general (though it sometimes is).
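A quick check of the distance formula, with an example $w$ and $b$ chosen here so that $\lVert w\rVert_2 = 5$: the raw value $|\langle w,p\rangle + b|$ overstates the distance by that factor, and dividing $w$ and $b$ by $\lVert w\rVert_2$ (which leaves the hyperplane unchanged) makes the two agree.

```python
import numpy as np

def distance(w, b, p):
    # unsigned point-to-hyperplane distance: |<w, p> + b| / ||w||_2
    return abs(np.dot(w, p) + b) / np.linalg.norm(w)

w, b = np.array([3.0, 4.0]), -5.0   # ||w||_2 = 5, not a unit vector
p = np.array([4.0, 3.0])

raw = abs(np.dot(w, p) + b)          # |24 - 5| = 19, NOT the distance
print(raw, distance(w, b, p))        # 19.0 vs 3.8

# After normalizing, |<w, p> + b| really is the distance
norm = np.linalg.norm(w)
w_u, b_u = w / norm, b / norm
print(abs(np.dot(w_u, p) + b_u))     # 3.8
```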

Also, consider looking at descriptions of the perceptron, which is what you are concerned with.