Selecting a cluster based on minimum average distance

450 Views Asked by At

I have a symmetric matrix of non-Euclidean distances of size $N$ (say, 500) and I would like to select one cluster of a fixed size $K$ (say, 25), so that it has the smallest average distance within this cluster. What is a good algorithm for doing that given combinatorial complexity of the problem?

Currently I have implemented the following algorithm, which is not perfect in finding the optimum:

  1. Take $K$ points at random, form the cluster
  2. Find $K$ points with smallest average distance to the points in the cluster at step 1). Call these $K$ points the new cluster
  3. Repeat 1) and 2) until selected $K$ points are the same in both steps or until the new cluster has the larger average distance than the old cluster.
1

There are 1 best solutions below

1
On BEST ANSWER

Seems like you're re-inventing the k-means algorithm that has been here for a while (Lloyd's algorithm from 1957). Although the problem normally minimizes Euclidean distances, it shouldn't be a problem as long as your distance function is a metric.

k-means++ due to careful seeding provides more stable results in less iterations (original paper).