Clarification on Termination of K-Means

123 Views Asked by At

Suppose we perform K-means clustering by,

  1. Randomly assign observations to K clusters

  2. Calculate cluster means

  3. Reassign observations to the cluster with the closest cluster mean

And repeat steps 2 and 3. Note that we take a measure of distance as the Euclidean squared distance between observations.

I was looking at a proof on this algorithm terminating but I am struggling with this one line:

Since the algorithm iterates a function whose domain is a finite set, the iteration must eventually enter a cycle.

Could anybody explain this rigorously because I do not understand?

https://stats.stackexchange.com/questions/188087/proof-of-convergence-of-k-means

1

There are 1 best solutions below

0
On BEST ANSWER

Each observation can be in one of the clusters $1,2\ldots,K$. Assuming there are $n$ observations, the total number of possible partitions is $K^n$ (eventually minus the partitions having at least one empty cluster). So by doing $K^n+1$ steps of K-means, the pigeonhole principle guarantees at least one of the partitions will repeat.