I am reading up on K-means clustering. I understand the procedure to an extent (I think!), but I am looking to understand the mathematical formulation better.
Given $m $ data points {$ x^1, x^2, ....., x^m $} $\in \mathbb{R^n} $
Find $k $ cluster centers {$ c^1, c^2, ....., c^k $} $\in \mathbb{R^n} $
And assign each data point i to one cluster
$\pi(i) \in $ {$ 1,2, ......,k$}
Such that, averaged square distance is minimized from each data point to respective cluster center.
$ min_{c, \pi} \frac{1}{m} \sum_{j=1}^{m} \Vert x^i - c^{\pi(i)} \Vert^2$
Now, decide cluster membership of each data point $x^i$ by assigning it to nearest cluster.
$ \pi(i) = argmin_{j=1,.....,k} \Vert x^i - c^j \Vert^2$
Now what I don't understand is the following:
To update the cluster center we use:
$c^j = \frac{1}{ \vert \{i: \pi(i)=j \} \vert } \sum_{i: \pi(i)=j}{x^i}$
I understand that we are averaging all the data points and selecting the new average as the cluster center, but how does this notation explain that? I would greatly appreciate if someone explained that to me. Thanks!
The $\pi$ function tells you which cluster a data-point belongs to, so $\{i\colon \pi(i)=j\}$ denotes the (index) set of all points which belong to the $j$-th cluster. In particular
$$ c^j = \Big( \underbrace{\vert \{i: \pi(i)=j \} \vert}_{\substack{\text{number of data-points}\\\text{belonging to cluster j}} }\Big)^{-1} \underbrace{\sum_{i: \pi(i)=j}}_{{\substack{\text{sum over all data-points}\\\text{belonging to cluster j}}}}{x^i} $$
is the average of all data points beloning to the $j$-th cluster.