Suppose we perform K-means clustering as follows:

1. Randomly assign observations to one of $K$ clusters.
2. Calculate the cluster means.
3. Reassign each observation to the cluster with the closest mean.
4. Repeat steps 2 and 3 until the assignments stop changing.

Note that we measure distance as the squared Euclidean distance between observations (see the sketch below).
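For concreteness, here is a minimal sketch of the procedure above in Python (my own illustration, not from the linked proof; it assumes numpy, and keeping the previous mean for an empty cluster is just one common convention):

```python
import numpy as np

def kmeans(X, K, seed=0, max_iter=100):
    """Steps 1-4 above: random initial assignment, then alternate
    computing cluster means and reassigning until labels stop changing."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(K, size=n)              # step 1: random assignment
    means = X[rng.choice(n, size=K, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 2: cluster means; an empty cluster keeps its previous mean
        # (empty clusters are the edge case noted in the answer below).
        for k in range(K):
            members = X[labels == k]
            if len(members):
                means[k] = members.mean(axis=0)
        # Step 3: reassign each observation to the nearest mean,
        # measured by squared Euclidean distance.
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                 # step 4: nothing changed, stop
        labels = new_labels
    return labels, means

# Example: two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
labels, means = kmeans(X, K=2)
```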
I was looking at a proof that this algorithm terminates, but I am struggling with this one line:
Since the algorithm iterates a function whose domain is a finite set, the iteration must eventually enter a cycle.
Could anybody explain this rigorously? I do not understand why it holds.
The proof I was reading is here: https://stats.stackexchange.com/questions/188087/proof-of-convergence-of-k-means
Each observation can be in one of the clusters $1, 2, \ldots, K$, so with $n$ observations there are at most $K^n$ possible labelings (possibly minus those that leave some cluster empty). Among any $K^n + 1$ successive labelings produced by K-means, the pigeonhole principle therefore guarantees that at least one labeling repeats. Since the update from one labeling to the next is a deterministic function, the iteration must cycle from the first repeat onward; and because steps 2 and 3 never increase the within-cluster sum of squares (given a consistent tie-breaking rule), that cycle must in fact be a fixed point, which is why the algorithm terminates.
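To make the pigeonhole argument concrete, here is a toy demonstration (my own construction, not from the linked post): with $n = 4$ observations and $K = 2$ there are at most $2^4 = 16$ labelings, so tracking the labeling through deterministic K-means updates must hit a repeat within $17$ steps.

```python
import numpy as np

# Toy data: n = 4 one-dimensional observations, K = 2 clusters,
# so the state space has at most 2**4 = 16 possible labelings.
X = np.array([[0.0], [1.0], [9.0], [10.0]])
K = 2

def lloyd_step(labels):
    """One deterministic K-means update: compute the cluster means (step 2)
    and reassign by squared Euclidean distance (step 3), returning the new
    labeling as a hashable tuple."""
    labels = np.asarray(labels)
    # An empty cluster gets an infinite "mean" so nothing is assigned to it.
    means = np.array([X[labels == k].mean() if np.any(labels == k) else np.inf
                      for k in range(K)])
    d = (X[:, 0][:, None] - means[None, :]) ** 2
    return tuple(int(k) for k in d.argmin(axis=1))

state = (0, 1, 0, 1)   # arbitrary initial assignment (step 1)
seen = {}
for t in range(K ** len(X) + 1):  # pigeonhole: a repeat within K**n + 1 states
    if state in seen:
        print(f"labeling {state} repeated at step {t}, first seen at step {seen[state]}")
        break
    seen[state] = t
    state = lloyd_step(state)
```

On this data the repeat shows up after a couple of updates, as a fixed point; the $K^n + 1$ bound is only the worst case that the pigeonhole principle guarantees.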