Does it make sense to maximize the total distances of all points to a line to fine the separating line for binary classification?

95 Views Asked by At

Given a line defined by the equation $\mathbf{w} \cdot \mathbf{x} + b = 0$, pointed out by the blue line below

enter image description here

I guess the vector $\vec{w}=[w_1,w_2]^T$ could be used to denote the normal of the line.

Given a set of m points $T= \{(x_1^{(i)},x_2^{(i)})\}$ on the plane, $x_i \in \mathbb{R^2}$ and $y_i \in \{-1,1\}$ where $i \in \{1, ...,m \}$,

$(x_1^{(i)},x_2^{(i)})$ denotes the coordinates of the ith point,

$y^{(i)} = 1$ indicates the point belongs to the positive class while $y^{(i)} = -1$ is saying the point belongs to the negative class.

The distance of any point P$(x_1^{(i)},x_2^{(i)})$ from the line is equal to

$d^{(i)} = \dfrac{|w_1x_1^{(i)} + w_2x_2^{(i)} + b|}{\sqrt{w_1^2 + w_2^2}}$

Consider $Dist(w_1,w_2,b)$ a function of $w_1,w_2,b$

Does it make sense to use the derivatives

$ \dfrac{dDist(w_1,w_2,b)}{dw_1} $, $ \dfrac{dDist(w_1,w_2,b)}{dw_2} $, $ \dfrac{dDist(w_1,w_2,b)}{db} $

to find the values for $w_1,w_2,b$ that maximize the total distances of all points to the line which could be used as the separating line for binary classification? If yes, how do I do that?

I guess one of the constraints is that the line has to be somewhere between the largest point and smallest point.

for all m data points, the total distances could be computed by

$ Dist(w_1,w_2,b) = \sum\limits_{i=1}^m y_i \dfrac{w_1x_1^{(i)} + w_2x_2^{(i)} + b}{\sqrt{w_1^2 + w_2^2}} \tag{1} $

as all the points ($y_i = 1$) on the positive side have

$\mathbf{w} \cdot \mathbf{x} + b > 0$

whereas

all the points ($y_j = -1$) on the negative side make

$\mathbf{w} \cdot \mathbf{x} + b < 0$

therefor, proper-classified data points increase equation 1 while misclassified data points decrease the value.

1

There are 1 best solutions below

5
On BEST ANSWER

The maximization problem can be simplified considerably assuming

$$ d(w_1,w_2,b) = \sum_{k=1}^nz_k\left(w_1 x_k+w_2y_k+b\right)^2 $$

Here $z_k = \pm 1$. The optimization problem can be stated as

$$ \max_{w_1,w_2,b}d(w_1,w_2,b)\ \ \text{s. t.}\ \ \cases{z_k(w_1 x_k+w_2y_k+b)\ge 0, \ \ \{k=1,\cdots, n\}\\ w_1^2+w_2^2=1\\ } $$

The clear advantage is that now we can use a sequential quadratic problem procedure to determine the optimum point. Follows a MATHEMATICA script which implements this procedure.

SeedRandom[1 + 1 + 1 + 1]
gu = {10, 10};
gl = {-10, -10};
nl = 30;
nu = nl;
low = Table[gl + RandomReal[12 {-1, 1}, 2], {k, 1, nl}];
up = Table[gu + RandomReal[12 {-1, 1}, 2], {k, 1, nu}];

obj[w1_, w2_, b_] := (Sum[({w1, w2}.low[[k]] + b)^2, {k, 1, nl}] - Sum[({w1, w2}.up[[k]] + b)^2, {k, 1, nu}])
restrlow = Table[{w1, w2}.low[[k]] + b <= 0, {k, 1, nl}];
restrup = Table[{w1, w2}.up[[k]] + b >=  0, {k, 1, nu}];
restrs = Join[Join[restrlow, restrup], {w1^2 + w2^2 == 1}];
sol = Quiet@NMaximize[{obj[w1, w2, b], restrs}, {w1, w2, b}]
line = w1 x + w2 y + b /. sol[[2]]

gr1 = ListPlot[low, PlotStyle -> {PointSize[Large], Red}];
gr2 = ListPlot[up, PlotStyle -> {PointSize[Large], Blue}];
gr3 = ContourPlot[line == 0, {x, -20, 20}, {y, -20, 20}, ContourStyle -> {Dashed, Black}];
Show[gr1, gr2, gr3, PlotRange -> All, AspectRatio -> 1, Axes -> False]

enter image description here

b0 = b /. sol[[2]];
gr0 = Plot3D[obj[w1, w2, b0], {w1, -1, 1}, {w2, -1, 1}, PlotPoints -> 50, Mesh -> None];
pts = Table[Graphics3D[{Black, PointSize[Small], Point[{Cos[t], Sin[t], obj[Cos[t], Sin[t], b0]}]}], {t, -Pi, Pi, 0.01}];
gr2 = Graphics3D[{Black, PointSize[0.02], Point[{w1, w2, obj[w1, w2, b]}]} /. sol[[2]]];
Show[gr0, pts, gr2]

enter image description here

NOTE - 1

This procedure works smooth for sets without entangled points. The first plot shows the sets of points in red and blue, and the separating line (dashed black). The second plot, shows the surface $d(w_1,w_2,b^*)$, here $b^*$ is the value obtained in the optimization process, with the restriction points $w_1^2+w_2^2=1$ represented by a thin black curve, and the maximization point $w_1^*, w_2^*$ represented by a thick black dot. Changing the value in SeedRandom[] we can generate diverse data sets.

NOTE - 2

A plot showing only the feasible surface can be obtained as follows

restrs0 = Join[restrlow, restrup];
aa = And @@ restrs0 /. {b -> b0};
reg = ImplicitRegion[aa, {w1, w2}];
If[Length[FindInstance[aa, {w1, w2}]] != 0, gr0 = Plot3D[obj[w1, w2, b0], {w1, w2} \[Element] reg, PlotPoints -> 50, Mesh -> None], Print["Feasible region is void"]]
Show[gr0, pts, gr2]

enter image description here