K-Means equality proof

124 Views Asked by At

Is there any geometrical and or short proof for the equality

${\displaystyle {\underset { }{ }}\sum _{i=1}^{k}\sum _{\mathbf {x} \in S_{i}}\left\|\mathbf {x} -{\boldsymbol {\mu}}_{i}\right\|^{2}}={\displaystyle \sum _{i=1}^{k}\,{\frac {1}{2|S_{i}|}}\,\sum _{\mathbf {x} ,\mathbf {y} \in S_{i}}\left\|\mathbf {x} -\mathbf {y} \right\|^{2}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (*)$

One way to prove this is to take into consideration the equality

${\displaystyle \sum _{\mathbf {x} \in S_{i}}\left\|\mathbf {x} -{\boldsymbol {\mu }}_{i}\right\|^{2}=\sum _{\mathbf {x} \neq \mathbf {y} \in S_{i}}(\mathbf {x} -{\boldsymbol {\mu }}_{i})({\boldsymbol {\mu }}_{i}-\mathbf {y} )}$

where I cannot find any geometrical meaning.

Could someone help me to prove (*) in a geometrical way?

1

There are 1 best solutions below

0
On BEST ANSWER

This is an identity that holds for each cluster $S_i$ separately. Thus let $S$ be a set of $N$ points in $\mathbb{R}^n$ and let $\mu = \frac{1}{m} \sum_i x_i$ be their mean. Then since $\sum_i x_i = m \mu$ $$ \sum_i \|x_i - \mu\|^2 = \sum_i \left( \|x_i\|^2 - 2 x_i^T \mu + \|\mu\|^2\right) = \sum_i \|x_i\|^2 - 2 |S| \|\mu\|^2 + |S| \|\mu\|^2 $$ which equals $\sum_i \|x_i\|^2 - |S|\|\mu\|^2$. On the other hand $$ \sum_{i,j} \|x_i - x_j \|^2 = \sum_{i,j} \left(\|x_i\|^2 + \|x_j\|^2 - 2 x_i^Tx_j \right) = 2 |S|\sum_i \|x\|^2 - 2 |S|^2 \|\mu\|^2 $$ and the identity follows. There is no obvious geometric interpretation that I can think of.