I'm attempting to prove the following equality (K-Means Algorithm):
$$ \frac{1}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P \left( x_{i,j} - x_{i',j} \right)^2 = 2\sum_{i \in C_k}\sum_{j=1}^P \left( x_{i,j} - \bar{x_{k,j}} \right)^2 $$
I have been working with the left side of the equation. By adding and subtracting $\bar{x_{k,j}}$ (effectively doing nothing) and expanding the quadratic, I end up with the following:
$$\begin{split}
&\frac{1}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P \left( x_{i,j} - \bar{x_{k,j}} - x_{i',j} + \bar{x_{k,j}} \right)^2 \\
&= \frac{1}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P \left( (x_{i,j} - \bar{x_{k,j}}) - (x_{i',j} - \bar{x_{k,j}}) \right)^2 \\
&= \frac{1}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P \left( (x_{i,j} - \bar{x_{k,j}})^2 - 2(x_{i,j} - \bar{x_{k,j}})(x_{i',j} - \bar{x_{k,j}}) + (x_{i',j} - \bar{x_{k,j}})^2 \right) \\
&= \frac{1}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P (x_{i,j} - \bar{x_{k,j}})^2 - \frac{1}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P 2(x_{i',j} - \bar{x_{k,j}})(x_{i,j} - \bar{x_{k,j}}) + \frac{1}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P (x_{i',j} - \bar{x_{k,j}})^2 \\
&= \frac{1}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P (x_{i,j} - \bar{x_{k,j}})^2 - \frac{2}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P (x_{i',j} - \bar{x_{k,j}})(x_{i,j} - \bar{x_{k,j}}) + \frac{1}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P (x_{i',j} - \bar{x_{k,j}})^2 \\
&= \frac{\lvert C_k \rvert}{\lvert C_k \rvert}\sum_{i \in C_k}\sum_{j=1}^P (x_{i,j} - \bar{x_{k,j}})^2 - \frac{2}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P (x_{i',j} - \bar{x_{k,j}})(x_{i,j} - \bar{x_{k,j}}) + \frac{\lvert C_k \rvert}{\lvert C_k \rvert}\sum_{i' \in C_k}\sum_{j=1}^P (x_{i',j} - \bar{x_{k,j}})^2 \\
&= \sum_{i \in C_k}\sum_{j=1}^P (x_{i,j} - \bar{x_{k,j}})^2 - \frac{2}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P (x_{i',j} - \bar{x_{k,j}})(x_{i,j} - \bar{x_{k,j}}) + \sum_{i' \in C_k}\sum_{j=1}^P (x_{i',j} - \bar{x_{k,j}})^2 \\
&= 2\sum_{i \in C_k}\sum_{j=1}^P (x_{i,j} - \bar{x_{k,j}})^2 - \frac{2}{\lvert C_k \rvert}\sum_{i,i' \in C_k}\sum_{j=1}^P (x_{i',j} - \bar{x_{k,j}})(x_{i,j} - \bar{x_{k,j}})
\end{split}$$
The Questions
1. I am assuming that $\sum_{i' \in C_k} 1 = \lvert C_k \rvert$, so that summing over an index the summand does not depend on just multiplies by $\lvert C_k \rvert$. Is this a valid assumption? That is the only way I can see $\lvert C_k \rvert$ moving into the numerator for a few of the terms and one of the indices dropping out of the summations.
2. Am I able to combine the $\sum_{i \in C_k}$ and $\sum_{i' \in C_k}$ terms in the manner I did in the second-to-last step? My understanding is that $i'$ is only used to differentiate it from $i$ when both appear in a single term. I am not sure whether this is the correct interpretation.
3. What are the final steps of the proof? I can't figure out how the last term in the last equation just falls off the map. I emailed the professor of the textbook and he offered "The cross terms cancel, and you get the $\lvert C_k \rvert$ times a single sum," if that is helpful to anyone.
Any insight greatly appreciated.
P.S. I have looked at this question; however, my linear algebra foundation isn't where it needs to be for me to understand its answer.
I'm assuming that $P$ is the dimension of the space, and that the $j$-th coordinate of the centroid for cluster $C_k$ is given by $$\bar x_{k,j}=\frac 1 {|C_k|}\sum_{i\in C_k}x_{i,j}\tag1$$
The answer to both questions $(1)$ and $(2)$ is "yes".
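To spell out why, here is a short sketch in the question's notation. The shorthand $a_{i,j} = x_{i,j} - \bar x_{k,j}$ is introduced here purely for illustration. A double sum whose summand ignores $i'$ just counts $|C_k|$ copies, $i'$ is a dummy index that can be renamed to $i$, and a double sum of a product factors into a product of sums:
$$\begin{split}
\sum_{i,i'\in C_k} a_{i,j}^2 &= \sum_{i\in C_k}\Big(a_{i,j}^2\sum_{i'\in C_k}1\Big) = |C_k|\sum_{i\in C_k}a_{i,j}^2,\\
\sum_{i'\in C_k} a_{i',j}^2 &= \sum_{i\in C_k} a_{i,j}^2,\\
\sum_{i,i'\in C_k} a_{i,j}\,a_{i',j} &= \Big(\sum_{i\in C_k}a_{i,j}\Big)\Big(\sum_{i'\in C_k}a_{i',j}\Big) = 0\cdot 0 = 0.
\end{split}$$
The last line uses $\sum_{i\in C_k}a_{i,j} = \sum_{i\in C_k}x_{i,j} - |C_k|\,\bar x_{k,j} = 0$, which follows from the definition of the centroid in $(1)$. This is presumably what "the cross terms cancel" refers to: the last term in the question's final line is zero, which leaves exactly the right-hand side of the identity.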
Here is how I'd go about the proof. For $1\leq j \leq P$, we have:
$$\begin{split} \frac 1 {2|C_k|}\sum_{i,i' \in C_k}\left( x_{i,j} - x_{i',j} \right)^2 &= \frac 1 {2|C_k|}\sum_{i \in C_k}\sum_{i' \in C_k}\left( x^2_{i,j} +x_{i',j}^2- 2x_{i,j}x_{i',j} \right)\\ &=\frac 1 2\left( \sum_{i\in C_k}x^2_{i,j} \right) + \frac 1 2\left( \sum_{i'\in C_k}x_{i',j}^2\right)- \frac 1 {|C_k|}\left(\sum_{i \in C_k}\sum_{i' \in C_k}x_{i,j}x_{i',j}\right) \\ &=\left( \sum_{i\in C_k}x^2_{i,j} \right)-\frac 1 {|C_k|}\left(\sum_{i \in C_k}x_{i,j}\right)\left(\sum_{i' \in C_k}x_{i',j}\right)\\ &=\left( \sum_{i\in C_k}x^2_{i,j} \right)-|C_k|\bar x_{k,j}^2 \qquad(\text{using }(1))\\ &=\left(\sum_{i \in C_k}x^2_{i,j}\right) - 2|C_k|\bar x^2_{k,j} + |C_k|\bar x_{k,j}^2\qquad(\text{OK, that's a neat trick})\\ &= \left(\sum_{i \in C_k}x^2_{i,j}\right) - 2\left(\sum_{i \in C_k}x_{i,j}\right)\bar x_{k,j} + |C_k|\bar x_{k,j}^2\\ &=\sum_{i \in C_k}\left( x^2_{i,j} - 2x_{i,j}\bar x_{k,j} + \bar x_{k,j}^2\right)\\ &=\sum_{i \in C_k}\left( x_{i,j} - \bar x_{k,j} \right)^2 \end{split}$$
Summing this over $j = 1, \dots, P$ and multiplying both sides by $2$ gives the identity in the question.
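As a sanity check rather than a proof, the identity can also be verified numerically on a small random cluster. The sketch below is plain Python; the function names `pairwise_side` and `centroid_side` and the random test data are illustrative assumptions, not anything from the textbook.

```python
import random


def pairwise_side(cluster):
    """Left side: (1/|C_k|) * sum over all ordered pairs (i, i')
    of the squared Euclidean distance between points i and i'."""
    n = len(cluster)
    total = 0.0
    for a in cluster:
        for b in cluster:
            total += sum((aj - bj) ** 2 for aj, bj in zip(a, b))
    return total / n


def centroid_side(cluster):
    """Right side: 2 * sum over i of the squared Euclidean distance
    from point i to the cluster centroid."""
    n = len(cluster)
    p = len(cluster[0])
    # j-th centroid coordinate is the mean of the j-th coordinates.
    centroid = [sum(pt[j] for pt in cluster) / n for j in range(p)]
    return 2.0 * sum(
        (pt[j] - centroid[j]) ** 2 for pt in cluster for j in range(p)
    )


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical cluster: 7 points in P = 3 dimensions.
    cluster = [[random.gauss(0, 1) for _ in range(3)] for _ in range(7)]
    lhs = pairwise_side(cluster)
    rhs = centroid_side(cluster)
    # The two sides should agree up to floating-point rounding.
    print(lhs, rhs, abs(lhs - rhs) < 1e-9)
```

Note that the double loop runs over all ordered pairs, including $i = i'$ (which contributes zero) and both $(i, i')$ and $(i', i)$; that double counting is where the factor of $2$ on the right-hand side comes from.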