I am looking for a way to classify a set of points $\vec x$ such that each class contains roughly the same number of points. Currently I start with the first point $\vec x_0$ use this as reference point for the first class and the next point get eithers assigned to that class, if $|\vec x - \vec x_0| <eps$ or I take it as the reference point for a new class. Similar all the other points are assigned to a class, when the distance to the reference point is minimal or I take them as a new reference point, if the smallest distance is bigger than the threshold $eps$. However, in this way I get few classes with lots of points and many classes with few points. How can I classify the points such that I have roughly the same number of points per class?
I am pretty sure that there exists some algorithm, but I dont really know what terms to search for.
PS: wasnt sure how to tag the question, didnt find any better tag...
PPS: Some clarification...
- the "points" are vectors with ~50 components.
- the classification should be such that points within each class have minimum distance from each other
For what it's worth:
Start with a balanced assignment of the points to the classes, for instance by sorting on the first component.
(Or trying all components in turn and keeping the one that leads to the best compactness. Another option is to start with a K-NN assignment - which won't necessarily be balanced -, keep the class centers and assign them the nearest neighbors in a greedy way.)
Then iteratively improve by
(Or even simpler, rate all possible swaps in terms of distance reduction and apply the best of them.)
This will be costly and may lead to dead-ends, but hopefully better than nothing.