I am trying to understand if it is possible to use mixed integer linear programming (MILP) in order to perform a basic clustering operation to a dataset $D$. I know there exists standard algorithms for it but I am particularly interested in MILP.
Assume one has the dataset $D = D_1 \cup D_2$ with: $$ D_1 = [1,2,2,3,4,3,1,0,2,2,1]\\ D_2 = [7,6,5,8,7,7,4,7,9,8,6] $$ Is there some MILP algorithm that can cluster the concatenated and shuffled data $SD$, with $S$ some permutation acting on $D$, which can cluster each element of $D$ to either $D_1$ or $D_2$?
Let continuous variables $x_1$ and $x_2$ represent the centers of the two clusters. Let binary variable $y_i$ indicate whether element index $i$ (with value $d_i$) is in $D_1$. Now minimize $$\sum_i |d_i - c_1|y_i + \sum_i |d_i - c_2|(1-y_i),$$ which you can linearize as follows. Introduce nonnegative continuous variables $z_{i,1}$ and $z_{i,2}$ to represent the summands, and enforce $y_i = 1 \implies z_{i,1} \ge |d_i-c_1|$ and $y_i = 0 \implies z_{i,1} \ge |d_i-c_2|$. The problem is to minimize $$\sum_i z_{i,1} + \sum_i z_{i,2}$$ subject to linear big-M constraints \begin{align} (d_i-c_1) - z_{i,1} &\le M_1(1-y_i) \\ -(d_i-c_1) - z_{i,1} &\le M_1(1-y_i) \\ (d_i-c_2) - z_{i,2} &\le M_2 y_i \\ -(d_i-c_2) - z_{i,2} &\le M_2 y_i \\ \end{align}