Does it make sense to maximize the total distances of all points to a line to fine the separating line for binary classification?

Question

Does it make sense to maximize the total distances of all points to a line to fine the separating line for binary classification?

95 Views Asked by Bumbble Comm At 09 Apr 2026 - 3:00

Given a line defined by the equation $\mathbf{w} \cdot \mathbf{x} + b = 0$, pointed out by the blue line below

I guess the vector $\vec{w}=[w_1,w_2]^T$ could be used to denote the normal of the line.

Given a set of m points $T= \{(x_1^{(i)},x_2^{(i)})\}$ on the plane, $x_i \in \mathbb{R^2}$ and $y_i \in \{-1,1\}$ where $i \in \{1, ...,m \}$,

$(x_1^{(i)},x_2^{(i)})$ denotes the coordinates of the ith point,

$y^{(i)} = 1$ indicates the point belongs to the positive class while $y^{(i)} = -1$ is saying the point belongs to the negative class.

The distance of any point P$(x_1^{(i)},x_2^{(i)})$ from the line is equal to

$d^{(i)} = \dfrac{|w_1x_1^{(i)} + w_2x_2^{(i)} + b|}{\sqrt{w_1^2 + w_2^2}}$

Consider $Dist(w_1,w_2,b)$ a function of $w_1,w_2,b$

Does it make sense to use the derivatives

$ \dfrac{dDist(w_1,w_2,b)}{dw_1} $, $ \dfrac{dDist(w_1,w_2,b)}{dw_2} $, $ \dfrac{dDist(w_1,w_2,b)}{db} $

to find the values for $w_1,w_2,b$ that maximize the total distances of all points to the line which could be used as the separating line for binary classification? If yes, how do I do that?

I guess one of the constraints is that the line has to be somewhere between the largest point and smallest point.

for all m data points, the total distances could be computed by

$ Dist(w_1,w_2,b) = \sum\limits_{i=1}^m y_i \dfrac{w_1x_1^{(i)} + w_2x_2^{(i)} + b}{\sqrt{w_1^2 + w_2^2}} \tag{1} $

as all the points ($y_i = 1$) on the positive side have

$\mathbf{w} \cdot \mathbf{x} + b > 0$

whereas

all the points ($y_j = -1$) on the negative side make

$\mathbf{w} \cdot \mathbf{x} + b < 0$

therefor, proper-classified data points increase equation 1 while misclassified data points decrease the value.

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

The maximization problem can be simplified considerably assuming

$$ d(w_1,w_2,b) = \sum_{k=1}^nz_k\left(w_1 x_k+w_2y_k+b\right)^2 $$

Here $z_k = \pm 1$. The optimization problem can be stated as

$$ \max_{w_1,w_2,b}d(w_1,w_2,b)\ \ \text{s. t.}\ \ \cases{z_k(w_1 x_k+w_2y_k+b)\ge 0, \ \ \{k=1,\cdots, n\}\\ w_1^2+w_2^2=1\\ } $$

The clear advantage is that now we can use a sequential quadratic problem procedure to determine the optimum point. Follows a MATHEMATICA script which implements this procedure.

SeedRandom[1 + 1 + 1 + 1]
gu = {10, 10};
gl = {-10, -10};
nl = 30;
nu = nl;
low = Table[gl + RandomReal[12 {-1, 1}, 2], {k, 1, nl}];
up = Table[gu + RandomReal[12 {-1, 1}, 2], {k, 1, nu}];

obj[w1_, w2_, b_] := (Sum[({w1, w2}.low[[k]] + b)^2, {k, 1, nl}] - Sum[({w1, w2}.up[[k]] + b)^2, {k, 1, nu}])
restrlow = Table[{w1, w2}.low[[k]] + b <= 0, {k, 1, nl}];
restrup = Table[{w1, w2}.up[[k]] + b >=  0, {k, 1, nu}];
restrs = Join[Join[restrlow, restrup], {w1^2 + w2^2 == 1}];
sol = Quiet@NMaximize[{obj[w1, w2, b], restrs}, {w1, w2, b}]
line = w1 x + w2 y + b /. sol[[2]]

gr1 = ListPlot[low, PlotStyle -> {PointSize[Large], Red}];
gr2 = ListPlot[up, PlotStyle -> {PointSize[Large], Blue}];
gr3 = ContourPlot[line == 0, {x, -20, 20}, {y, -20, 20}, ContourStyle -> {Dashed, Black}];
Show[gr1, gr2, gr3, PlotRange -> All, AspectRatio -> 1, Axes -> False]

b0 = b /. sol[[2]];
gr0 = Plot3D[obj[w1, w2, b0], {w1, -1, 1}, {w2, -1, 1}, PlotPoints -> 50, Mesh -> None];
pts = Table[Graphics3D[{Black, PointSize[Small], Point[{Cos[t], Sin[t], obj[Cos[t], Sin[t], b0]}]}], {t, -Pi, Pi, 0.01}];
gr2 = Graphics3D[{Black, PointSize[0.02], Point[{w1, w2, obj[w1, w2, b]}]} /. sol[[2]]];
Show[gr0, pts, gr2]

NOTE - 1

This procedure works smooth for sets without entangled points. The first plot shows the sets of points in red and blue, and the separating line (dashed black). The second plot, shows the surface $d(w_1,w_2,b^*)$, here $b^*$ is the value obtained in the optimization process, with the restriction points $w_1^2+w_2^2=1$ represented by a thin black curve, and the maximization point $w_1^*, w_2^*$ represented by a thick black dot. Changing the value in SeedRandom[] we can generate diverse data sets.

NOTE - 2

A plot showing only the feasible surface can be obtained as follows

restrs0 = Join[restrlow, restrup];
aa = And @@ restrs0 /. {b -> b0};
reg = ImplicitRegion[aa, {w1, w2}];
If[Length[FindInstance[aa, {w1, w2}]] != 0, gr0 = Plot3D[obj[w1, w2, b0], {w1, w2} \[Element] reg, PlotPoints -> 50, Mesh -> None], Print["Feasible region is void"]]
Show[gr0, pts, gr2]

Does it make sense to maximize the total distances of all points to a line to fine the separating line for binary classification?

There are 1 best solutions below

Related Questions in DERIVATIVES

Related Questions in OPTIMIZATION

Related Questions in MACHINE-LEARNING

Trending Questions

Popular # Hahtags

Popular Questions