In an important Machine Learning book, I found this proof.
We want to minimize the cost function $J_0(x_0)$ defined by the formula $$J_0(x_0) = \sum_{k=1}^n \|x_0 - x_k \|^2.$$
The solution to this problem is given by $x_0=m$, where $m$ is the sample mean $m = \frac{1}{n}\sum_{k=1}^nx_k$.
Proof. $$ \begin{array}{rcl} J_0(x_0) & = & \sum_{k=1}^n \|(x_0 - m)-(x_k - m) \|^2 \\ & = & \sum_{k=1}^n \|x_0 - m \|^2 -2(x_0-m)^T\sum_{k=1}^n(x_k-m) + \sum_{k=1}^n \|x_k - m \|^2 \\ & = & \sum_{k=1}^n \|x_0 - m \|^2 + \sum_{k=1}^n \|x_k - m \|^2. \end{array}$$
Since $ \sum_{k=1}^n \|x_k - m \|^2 $ is independent of $x_0$, this expression is obviously minimized by $x_0=m$.
I cannot understand this proof. What guarantees that $ \sum_{k=1}^n \|x_k - m \|^2 $ is minimized?
I believe that this expression is constant. The only variable anywhere in sight is $x_0$. You are supposed to pick $x_0$ that minimizes $J_0(x_0)$. The only expression involving $x_0$ is nonnegative, so it is minimized if it is zero. Choosing $x_0=m$ does this because $$J_0(x_0) = \underbrace{\sum_{k=1}^n \|x_0 - m \|^2}_{\textrm{zero when }x_0=m} + \underbrace{\sum_{k=1}^n \|x_k - m \|^2}_{\textrm{constant w.r.t. }x_0} $$
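A quick numerical check can make this concrete: evaluate $J_0$ at the sample mean and at nearby points, and verify the decomposition above. This is a minimal sketch with made-up data (the array `x` and the helper `J0` are my own, not from the book):

```python
import numpy as np

# Hypothetical sample: n = 10 points in R^3 (illustration only, not from the book).
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))

def J0(x0):
    """Cost J_0(x_0) = sum_k ||x_0 - x_k||^2."""
    return np.sum(np.linalg.norm(x0 - x, axis=1) ** 2)

m = x.mean(axis=0)  # sample mean

# The mean should never lose to a perturbed candidate.
for _ in range(5):
    candidate = m + rng.normal(scale=0.1, size=3)
    assert J0(m) <= J0(candidate)

# Check the decomposition J_0(x_0) = n * ||x_0 - m||^2 + sum_k ||x_k - m||^2.
x0 = rng.normal(size=3)
assert np.isclose(J0(x0), len(x) * np.sum((x0 - m) ** 2) + np.sum((x - m) ** 2))
```

At $x_0 = m$ the first (variable) term is zero, so `J0(m)` equals the constant term $\sum_k \|x_k - m\|^2$, which is exactly the point of the answer above.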