Pardon my ignorance, but why is the following equation true for any real number $k$, when $\overline{X}$ is the average of the individual $X_i$?
$$\sum_{i=1}^{n}(X_i-k)^2 = \sum_{i=1}^{n}(X_i-\overline{X})^2 + n(k-\overline{X})^2$$
The above equation attains its minimum value when $k=\overline{X}$. What is the meaning of that, statistically?
Define:
$$f(k)=\sum_{i=1}^{n}(X_i-k)^2 = \sum_{i}X_i^2 - 2k\sum_i{X_i} + nk^2$$
But $\sum_i X_i = n\overline{X}$.
So we have:
$$f(k)=\sum_i X_i^2-2nk\overline{X}+nk^2$$
When $k=\overline{X}$, then $$f(\overline X)=\sum_i X_i^2-2n\overline{X}^2+n\overline{X}^2=\sum_i X_i^2 -n\overline{X}^2$$
So $$f(k)-f(\overline X) = n\overline{X}^2-2nk\overline{X}+nk^2=n(k-\overline X)^2$$
Which is the result you want.
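As a quick numerical sanity check of the identity (a sketch with arbitrary made-up data, not part of the derivation):

```python
import random

# Check that sum((X_i - k)^2) == sum((X_i - mean)^2) + n*(k - mean)^2
# for several values of k, on randomly generated data.
random.seed(0)
X = [random.gauss(0, 1) for _ in range(10)]
n = len(X)
mean = sum(X) / n

for k in (-2.0, 0.0, 3.5):
    lhs = sum((x - k) ** 2 for x in X)
    rhs = sum((x - mean) ** 2 for x in X) + n * (k - mean) ** 2
    assert abs(lhs - rhs) < 1e-9
```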
You could also just take the derivative of the original function:
$$f'(k)=2\sum(k-X_i)$$
and set it to zero to find that the minimum is when $k=\overline X$. Then, since $f$ is a quadratic with leading coefficient $n$ and minimum value $f(\overline X)$, we get:
$$f(k)=f(\overline X) + n(k-\overline X)^2$$
You can think of $f$ as an error function. It measures the square of the distance of the point $(X_1,X_2,\dots,X_n)$ to the point $(k,k,\dots,k)$.
Let's say you measure a quantity $n$ times, with some amount of uncertainty in your measuring technique, yielding values $X_1,X_2,\dots,X_n$. You want to find the "best fit" value, $k$, for your measure. "Best fit" depends on how you measure the error, but the above error function is particularly nice and geometric, and this theorem gives the "best fit" $k$ as the average of your measurements.
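The "best fit" claim can be sketched numerically: minimize $f(k)$ over a grid of candidate values and see that the minimizer is the sample mean. The measurement values below are hypothetical.

```python
# Hypothetical repeated measurements of the same quantity.
X = [9.8, 10.1, 10.0, 9.9, 10.2]
mean = sum(X) / len(X)

# The squared-error function from the answer: f(k) = sum((X_i - k)^2).
def f(k):
    return sum((x - k) ** 2 for x in X)

# Scan candidate k values on a fine grid; the minimizer should
# coincide with the sample mean (up to the grid resolution).
candidates = [i / 1000 for i in range(9000, 11001)]
best = min(candidates, key=f)
assert abs(best - mean) < 1e-3
```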