Given a line defined by the equation $\mathbf{w} \cdot \mathbf{x} + b = 0$, pointed out by the blue line below
I guess the vector $\vec{w}=[w_1,w_2]^T$ could be used to denote the normal of the line.
Given a set of m points $T= \{(x_1^{(i)},x_2^{(i)})\}$ on the plane, $x_i \in \mathbb{R^2}$ and $y_i \in \{-1,1\}$ where $i \in \{1, ...,m \}$,
$(x_1^{(i)},x_2^{(i)})$ denotes the coordinates of the ith point,
$y^{(i)} = 1$ indicates the point belongs to the positive class while $y^{(i)} = -1$ is saying the point belongs to the negative class.
The distance of any point P$(x_1^{(i)},x_2^{(i)})$ from the line is equal to
$d^{(i)} = \dfrac{|w_1x_1^{(i)} + w_2x_2^{(i)} + b|}{\sqrt{w_1^2 + w_2^2}}$
Consider $Dist(w_1,w_2,b)$ a function of $w_1,w_2,b$
Does it make sense to use the derivatives
$ \dfrac{dDist(w_1,w_2,b)}{dw_1} $, $ \dfrac{dDist(w_1,w_2,b)}{dw_2} $, $ \dfrac{dDist(w_1,w_2,b)}{db} $
to find the values for $w_1,w_2,b$ that maximize the total distances of all points to the line which could be used as the separating line for binary classification? If yes, how do I do that?
I guess one of the constraints is that the line has to be somewhere between the largest point and smallest point.
for all m data points, the total distances could be computed by
$ Dist(w_1,w_2,b) = \sum\limits_{i=1}^m y_i \dfrac{w_1x_1^{(i)} + w_2x_2^{(i)} + b}{\sqrt{w_1^2 + w_2^2}} \tag{1} $
as all the points ($y_i = 1$) on the positive side have
$\mathbf{w} \cdot \mathbf{x} + b > 0$
whereas
all the points ($y_j = -1$) on the negative side make
$\mathbf{w} \cdot \mathbf{x} + b < 0$
therefor, proper-classified data points increase equation 1 while misclassified data points decrease the value.

The maximization problem can be simplified considerably assuming
$$ d(w_1,w_2,b) = \sum_{k=1}^nz_k\left(w_1 x_k+w_2y_k+b\right)^2 $$
Here $z_k = \pm 1$. The optimization problem can be stated as
$$ \max_{w_1,w_2,b}d(w_1,w_2,b)\ \ \text{s. t.}\ \ \cases{z_k(w_1 x_k+w_2y_k+b)\ge 0, \ \ \{k=1,\cdots, n\}\\ w_1^2+w_2^2=1\\ } $$
The clear advantage is that now we can use a sequential quadratic problem procedure to determine the optimum point. Follows a MATHEMATICA script which implements this procedure.
NOTE - 1
This procedure works smooth for sets without entangled points. The first plot shows the sets of points in red and blue, and the separating line (dashed black). The second plot, shows the surface $d(w_1,w_2,b^*)$, here $b^*$ is the value obtained in the optimization process, with the restriction points $w_1^2+w_2^2=1$ represented by a thin black curve, and the maximization point $w_1^*, w_2^*$ represented by a thick black dot. Changing the value in
SeedRandom[]we can generate diverse data sets.NOTE - 2
A plot showing only the feasible surface can be obtained as follows