GOAL: I want to create a simple linear programming model that can be used to classify new data based on a separating hyperplane. In this particular case I have two classes, and I'm given a data set in which the data is already classified.
ATTEMPT:
So I have that $x'$, $x''$ are given data and my objective is to find a "line" separating the data.
When I say I desire a "line" separating the data, what I actually mean is a vector $$a=[a_0, a_1, ..., a_n]$$ such that $$\sum_{i=1}^{n} a_i x_i = a_0$$ with the property that one class is on one side of the line and the other class is on the other side.
So far I have come up with the following thought process:
$x'$ is on one side if $$\sum_{i=1}^{n} a_{i}x'_{i} > a_{0}$$
$x''$ is on the other side if $$\sum_{i=1}^{n} a_{i}x''_{i} < a_{0}$$
We can scale such an algebraic expression up or down; the hyperplane stays the same, so strict separation can be normalized to a margin of 1.
So, $$\sum a_{i}x'_{i} \geq a_{0} + 1$$ and $$\sum a_{i}x''_{i} \leq a_{0} - 1$$
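To spell out why the scaling step is valid (a short justification in the notation above): since there are finitely many data points, strict separation leaves a positive gap
$$\delta = \min\left(\min_{x'} \left(\sum_{i=1}^{n} a_{i}x'_{i} - a_{0}\right),\; \min_{x''} \left(a_{0} - \sum_{i=1}^{n} a_{i}x''_{i}\right)\right) > 0\text{,}$$
and replacing $(a_{0}, a)$ by $(a_{0}/\delta, a/\delta)$ describes the same hyperplane while achieving the margin-1 inequalities.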
For each point $x'_{i}$ of class A, introduce a slack variable $y'_{i} \geq 0$, where $$y'_{i} \geq a_{0} + 1 - \sum_{j=1}^{n} a_{j}x'_{ij}$$
Similarly, for each point $x''_{i}$ of class B, introduce $y''_{i} \geq 0$, where
$$y''_{i} \geq \sum_{j=1}^{n} a_{j}x''_{ij} - a_{0} + 1$$
Then recall that $x'$, $x''$ are given data, and we are looking for $a$, $y'$, $y''$.
Now I minimize the total slack, i.e. the sum of all $y'_{i}$ and $y''_{i}$.
I obtain the linear program
\begin{array}{ll@{}ll} \text{minimize} & \displaystyle \sum_{i \in A} y'_{i} + \displaystyle \sum_{i \in B} y''_{i} \\ \text{subject to} & y'_{i} &\geq a_{0} + 1 - \sum_{j=1}^{n} a_{j}x'_{ij} &\forall i \in A\\ & y''_{i} &\geq \sum_{j=1}^{n} a_{j} x''_{ij} - a_{0} + 1 &\forall i \in B\\ & a_{j} &\in \mathbb{R} & j = 0, \dots, n\\ & y'_{i} &\geq 0 &\forall i \in A\\ & y''_{i} &\geq 0 &\forall i \in B\\ \end{array}
Once I have the vector $a$, to classify a new point $z$ I just compute $$\sum a_{i}z_{i} - a_{0}\text{,}$$ which is either positive or negative, indicating that $z$ belongs to class A or class B.
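The model above can be sketched as an LP in standard form. A minimal illustration using `scipy.optimize.linprog` (scipy assumed available; the function and variable names are my own), with the decision vector laid out as $[a_0, a_1, \dots, a_n, y', y'']$:

```python
# Sketch: solve the slack-minimizing separating-hyperplane LP.
# Assumes scipy is available; names (fit_hyperplane, XA, XB) are illustrative.
import numpy as np
from scipy.optimize import linprog

def fit_hyperplane(XA, XB):
    """XA: (mA, n) points of class A; XB: (mB, n) points of class B.
    Returns (a0, a) with a.x >= a0 + 1 on A and a.x <= a0 - 1 on B
    whenever the slacks can be driven to zero."""
    mA, n = XA.shape
    mB = XB.shape[0]
    nvar = 1 + n + mA + mB            # [a0, a(1..n), y'(A), y''(B)]

    c = np.zeros(nvar)
    c[1 + n:] = 1.0                   # minimize sum y' + sum y''

    # linprog wants A_ub @ x <= b_ub, so rewrite each slack constraint:
    A_ub = np.zeros((mA + mB, nvar))
    b_ub = -np.ones(mA + mB)
    # class A: y'_i >= a0 + 1 - a.x'_i  <=>  a0 - a.x'_i - y'_i <= -1
    A_ub[:mA, 0] = 1.0
    A_ub[:mA, 1:1 + n] = -XA
    A_ub[np.arange(mA), 1 + n + np.arange(mA)] = -1.0
    # class B: y''_i >= a.x''_i - a0 + 1  <=>  a.x''_i - a0 - y''_i <= -1
    A_ub[mA:, 0] = -1.0
    A_ub[mA:, 1:1 + n] = XB
    A_ub[mA + np.arange(mB), 1 + n + mA + np.arange(mB)] = -1.0

    bounds = [(None, None)] * (1 + n) + [(0, None)] * (mA + mB)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[0], res.x[1:1 + n]

def classify(a0, a, z):
    """Positive -> class A, negative -> class B."""
    return np.sign(z @ a - a0)
```

For linearly separable data the optimal objective is 0 and every point lands on the correct side; otherwise the positive slacks measure by how much the margin constraints are violated.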
QUESTIONS:
- I'm struggling to determine the correctness of this model. Can this be confirmed?
- It's clear that to reduce the chance of misclassification, a better hyperplane would maximize the distance from the hyperplane to each class (like a support vector machine). How can my current model be extended to do this?
The proposed model works as expected.
Finding a better hyperplane that maximizes the distance from the hyperplane to each class (like an SVM) can be done as follows, if we assume the data is linearly separable. Split the normal vector into nonnegative parts, $w_{j} = w_{j}^{+} - w_{j}^{-}$ with $w_{j}^{+}, w_{j}^{-} \geq 0$, and solve
$$\text{minimize} \quad \sum_{j=1}^{n}\left( w_{j}^{+} + w_{j}^{-}\right)$$
Constraints:
$$\sum_{j=1}^{n} \left(w_{j}^{+} - w_{j}^{-}\right) x'_{ij} \geq a_{0} + 1 \quad \forall i \in A$$
$$\sum_{j=1}^{n} \left(w_{j}^{+} - w_{j}^{-}\right) x''_{ij} \leq a_{0} - 1 \quad \forall i \in B$$
$$w_{j}^{+}, w_{j}^{-} \geq 0 \quad j = 1, \dots, n$$
At an optimum, $w_{j}^{+}$ and $w_{j}^{-}$ are never both positive, so the objective equals $\|w\|_{1} = \sum_{j} |w_{j}|$. Since the distance from the hyperplane to each class is proportional to $1/\|w\|$, minimizing $\|w\|_{1}$ maximizes the margin.
Note this is analogous to the quadratic support vector machine approach, but instead of measuring $w$ in the L2 norm we use the L1 norm, which keeps the problem linear.
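This L1-norm SVM is again a plain LP. A minimal sketch, assuming linearly separable data and scipy (names are my own), with decision vector $[a_0, w^{+}, w^{-}]$:

```python
# Sketch: hard-margin L1-norm SVM as an LP (assumes linear separability).
# Assumes scipy is available; names (fit_l1_svm, XA, XB) are illustrative.
import numpy as np
from scipy.optimize import linprog

def fit_l1_svm(XA, XB):
    """minimize sum_j (w+_j + w-_j)
       s.t. w.x >= a0 + 1 on class A, w.x <= a0 - 1 on class B,
       where w = w+ - w- and w+, w- >= 0."""
    mA, n = XA.shape
    mB = XB.shape[0]
    nvar = 1 + 2 * n                  # [a0, w+(n), w-(n)]

    c = np.zeros(nvar)
    c[1:] = 1.0                       # equals ||w||_1 at an optimum

    # Rewrite the margin constraints in A_ub @ x <= b_ub form:
    A_ub = np.zeros((mA + mB, nvar))
    b_ub = -np.ones(mA + mB)
    # class A: a0 - x.(w+ - w-) <= -1
    A_ub[:mA, 0] = 1.0
    A_ub[:mA, 1:1 + n] = -XA
    A_ub[:mA, 1 + n:] = XA
    # class B: -a0 + x.(w+ - w-) <= -1
    A_ub[mA:, 0] = -1.0
    A_ub[mA:, 1:1 + n] = XB
    A_ub[mA:, 1 + n:] = -XB

    bounds = [(None, None)] + [(0, None)] * (2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    a0 = res.x[0]
    w = res.x[1:1 + n] - res.x[1 + n:]
    return a0, w
```

Classification of a new point $z$ is the same as before: the sign of $w \cdot z - a_0$. If the data is not separable this LP is infeasible, in which case slack variables (as in the question's model) can be added back with a penalty weight in the objective.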