Maximum number of local minima in k-means

15 Views Asked by At

Suppose $\mathcal{Z} = \{z_1, \dots, z_n\}$ is the set of points in $d$-dimensional Euclidean space. The aim is to partition the dataset into $(K\leq n)$ distinct clusters $R_1,\dots, R_K$ where $R_i\cap R_j = \emptyset$, for $i\neq j$, and $\cup_{j=1}^{K} R_j = \mathcal{Z}$ such that some empirical loss function is minimized. A frequently used loss function is given as $$ \sum_{j=1}^{K}\sum_{\mathbf{z}\in R_j}\|\mathbf{z} - \mathbf{c}_{j}\|^2, $$ where $\mathbf{c}_j$ is a cluster center or centroid of $R_j$. The k-means algorithm starts with a fixed centroid vector and i) assigns each datapoint to the cluster whose centroid is closest, ii) using the assignment in the previous step, finds new centroids by averaging the datapoints in each cluster. The repetition of i and ii guarantees convergence to the local optimum. Is there any way I can bound the maximum number of the local minima in the clustering problem given $K,n$ and the dimension of centroid vectors $d$?