Uniform Effect of K-means Clustering

160 Views Asked by At

In the following link is discussed the uniform Effect of K-means Clustering:

https://www.springer.com/cda/content/document/cda_downloaddocument/9783642298066-c2.pdf?SGWID=0-0-45-1338325-p174318763

He used the following equation:

$ \frac{\displaystyle2 d(C_1,C_2)}{\displaystyle n_1n_2}=\frac{\displaystyle d(C_1,C_1)}{\displaystyle n_1^2}+\frac{\displaystyle d(C_2,C_2)}{\displaystyle n_2^2}+2||m_1−m_2||^2.$

where

$ d(C_1,C_2)=\sum_{x_i\in C_1}\sum_{x_j\in C_2}||x_i−x_j||^2$ with $ |C_1|=n_1$ and $|C_2 |=n_2$.

I tried to prove this, but I do not get this. He uses the fact that

$ d(C_1,C_1)=2(n_1-1)\sum_{i=1}^{n_1} ||x_i||^2 -4 \sum_{1\leq i < j \leq n_1}\langle x_i,x_j\rangle.$

$ d(C_2,C_2)=2(n_2-1)\sum_{i=1}^{n_2} ||y_i||^2 -4 \sum_{1\leq i < j \leq n_2}\langle y_i,y_j\rangle.$

$ d(C_1,C_2)=2n_2\sum_{i=1}^{n_1} ||x_i||^2+2n_1\sum_{i=1}^{n_2} ||y_i||^2 -4 \sum_{1\leq i\leq n_1}\sum_{1\leq j\leq n_2}\langle x_i,y_j\rangle.$

And furthermore with

$||m_1−m_2||^2=\langle \sum_{i=1}^{n_1}x_i/n_1-\sum_{j=1}^{n_1}y_j/n_2,\sum_{i=1}^{n_1}x_i/n_1-\sum_{j=1}^{n_1}y_j/n_2 \rangle = \frac{1}{\displaystyle n_1^2}\sum_{i=1}^{n_1} ||x_i||^2 + \frac{2}{\displaystyle n_1^2}\sum_{1\leq i < j\leq n_1} \langle x_i, x_j\rangle +\frac{1}{\displaystyle n_2^2}\sum_{i=1}^{n_2} ||y_i||^2 + \frac{2}{\displaystyle n_2^2}\sum_{1\leq i < j\leq n_2} \langle y_i, y_j\rangle - \frac{2}{n_1n_2} \sum_{1\leq i\leq n_1}\sum_{1\leq j\leq n_2}\langle x_i,y_j\rangle, $

I obtain that

$ \frac{\displaystyle d(C_1,C_2)}{\displaystyle n_1n_2}=\frac{\displaystyle d(C_1,C_1)}{\displaystyle n_1^2}+\frac{\displaystyle d(C_2,C_2)}{\displaystyle n_2^2}+2||m_1−m_2||^2.$

Where is my error?

1

There are 1 best solutions below

0
On

I think the problem is in the computation of $d(C_1, C_2)$: $$ \begin{align} d(C_1, C_2) &= \sum_{i =1}^{n_1}\sum_{j =1}^{n_2}|| x_i - y_j||^2\\ &= \sum_{i =1}^{n_1}\sum_{j =1}^{n_2}\langle x_i - y_j, x_i - y_j\rangle \\ &= \sum_{i =1}^{n_1}\sum_{j =1}^{n_2}\left(\langle x_i, x_i\rangle+\langle x_i , - y_j\rangle+\langle - y_j, x_i\rangle+\langle - y_j, - y_j\rangle \right)\\ &= \sum_{i =1}^{n_1}\sum_{j =1}^{n_2}\langle x_i, x_i\rangle-2\sum_{i =1}^{n_1}\sum_{j =1}^{n_2}\langle x_i , y_j\rangle+\sum_{i =1}^{n_1}\sum_{j =1}^{n_2}\langle y_j, y_j\rangle \\ &= \sum_{i =1}^{n_1}n_2\langle x_i, x_i\rangle-2\sum_{i =1}^{n_1}\sum_{j =1}^{n_2}\langle x_i , y_j\rangle+n_1\sum_{j =1}^{n_2}\langle y_j, y_j\rangle \\ &= n_2\sum_{i =1}^{n_1}|| x_i||^2+n_1\sum_{j =1}^{n_2}|| y_j||^2-2\sum_{i =1}^{n_1}\sum_{j =1}^{n_2}\langle x_i , y_j\rangle \\ \end{align} $$

This is exactly half the value you obtained, and substituted in the rest of your computations leads to the correct result.