In an important Machine Learning book, I found this proof.
We want to minimize the cost function $J_0(x_0)$ defined by the formula $$J_0(x_0) = \sum_{k=1}^n \|x_0 - x_k \|^2.$$
The solution to this problem is given by $x_0=m$, where $m$ is the sample mean $m = \frac{1}{n}\sum_{k=1}^nx_k$.
Proof. $$ \begin{array}{rcl} J_0(x_0) & = & \sum_{k=1}^n \|(x_0 - m)-(x_k - m) \|^2 \\ & = & \sum_{k=1}^n \|x_0 - m \|^2 -2(x_0-m)^T\sum_{k=1}^n(x_k-m) + \sum_{k=1}^n \|x_k - m \|^2 \\ & = & \sum_{k=1}^n \|x_0 - m \|^2 + \sum_{k=1}^n \|x_k - m \|^2. \end{array}$$
Since $ \sum_{k=1}^n \|x_k - m \|^2 $ is independent of $x_0$, this expression is obviously minimized by $x_0=m$.
I cannot understand this proof. What guarantees that $ \sum_{k=1}^n \|x_k - m \|^2 $ is minimized?
I believe that this expression is constant. The only variable anywhere in sight is $x_0$. You are supposed to pick $x_0$ that minimizes $J_0(x_0)$. The only expression involving $x_0$ is nonnegative, so it is minimized if it is zero. Choosing $x_0=m$ does this because $$J_0(x_0) = \underbrace{\sum_{k=1}^n \|x_0 - m \|^2}_{\textrm{zero when }x_0=m} + \underbrace{\sum_{k=1}^n \|x_k - m \|^2}_{\textrm{constant w.r.t. }x_0} $$
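A quick numerical check can make this concrete: evaluate $J_0$ at the sample mean and at nearby points, and verify the decomposition above. This is a minimal sketch with made-up data (the array `x` and the helper `J0` are my own, not from the book):

```python
import numpy as np

# Hypothetical sample: n = 10 points in R^3 (illustration only, not from the book).
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))

def J0(x0):
    """Cost J_0(x_0) = sum_k ||x_0 - x_k||^2."""
    return np.sum(np.linalg.norm(x0 - x, axis=1) ** 2)

m = x.mean(axis=0)  # sample mean

# The mean should never lose to a perturbed candidate.
for _ in range(5):
    candidate = m + rng.normal(scale=0.1, size=3)
    assert J0(m) <= J0(candidate)

# Check the decomposition J_0(x_0) = n * ||x_0 - m||^2 + sum_k ||x_k - m||^2.
x0 = rng.normal(size=3)
assert np.isclose(J0(x0), len(x) * np.sum((x0 - m) ** 2) + np.sum((x - m) ** 2))
```

At $x_0 = m$ the first (variable) term is zero, so `J0(m)` equals the constant term $\sum_k \|x_k - m\|^2$, which is exactly the point of the answer above.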