I have a symmetric matrix of non-Euclidean distances of size $N$ (say, 500) and I would like to select one cluster of a fixed size $K$ (say, 25), so that it has the smallest average distance within this cluster. What is a good algorithm for doing that given combinatorial complexity of the problem?
Currently I have implemented the following algorithm, which is not perfect in finding the optimum:
- Take $K$ points at random, form the cluster
- Find $K$ points with smallest average distance to the points in the cluster at step 1). Call these $K$ points the new cluster
- Repeat 1) and 2) until selected $K$ points are the same in both steps or until the new cluster has the larger average distance than the old cluster.
Seems like you're re-inventing the k-means algorithm that has been here for a while (Lloyd's algorithm from 1957). Although the problem normally minimizes Euclidean distances, it shouldn't be a problem as long as your distance function is a metric.
k-means++ due to careful seeding provides more stable results in less iterations (original paper).