How can I prove that the mean of a probability distribution is a value that minimizes the variance of the distribution?


According to page 87 of John K. Kruschke's Doing Bayesian Data Analysis (2nd edition), the mean of a distribution is the value that minimizes the variance of a probability distribution, for example a normal distribution. This is what the page says:

"It turns out that the value of $ M $ that minimizes $ \int p(x)(x-M)^2 dx $ is $ E[X] $. In other words, the mean of the distribution is the value that minimizes the expected squared deviation. In this way, the mean is a central tendency of the distribution."

I have read the paragraph and kind of understood what the author is trying to say but I wonder how this can be written mathematically using the above equation. Hope to hear some explanations.

And why are we trying to use the mean to minimize the variance?

Best answer

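Set the derivative with respect to $M$ to zero:

$$ \begin{aligned} \frac{\partial}{\partial M}\int p(x)(x-M)^2dx &= 0,\\ \int p(x)\frac{\partial (x-M)^2}{\partial M}dx &= 0,\\ \int p(x)(2(M-x))dx &= 0,\\ 2\int p(x)Mdx &= 2\int p(x) xdx,\\ M\int p(x)dx &= \int p(x) xdx,\\ M &= \int p(x) xdx, \end{aligned} $$

where the last step uses $\int p(x)dx = 1$.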
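A zero derivative only shows that $M = \int p(x)xdx$ is a stationary point. One way to check that it is really a minimum is to expand the square around the mean. The following is a sketch, assuming the variance is finite and writing $\mu = \int p(x)xdx$ (the symbol $\mu$ is introduced here for convenience and does not appear in the book's quote):

$$ \begin{aligned} \int p(x)(x-M)^2dx &= \int p(x)\left((x-\mu) + (\mu-M)\right)^2dx \\ &= \int p(x)(x-\mu)^2dx + 2(\mu-M)\int p(x)(x-\mu)dx + (\mu-M)^2\int p(x)dx \\ &= \int p(x)(x-\mu)^2dx + (\mu-M)^2 \end{aligned} $$

The cross term vanishes because $\int p(x)(x-\mu)dx = \mu - \mu = 0$. The first term does not depend on $M$ and the second is never negative, so the expression is smallest exactly when $M = \mu$, and the smallest value is the variance itself.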

Edit: Explanation for non-math people

Imagine a simple distribution: $x=0$ with $p=0.2$ and $x=1$ with $p=0.8$. Let's take $M=0.5$ first; then the variance is:

$$ \sum_x p(x)(x-M)^2 = 0.2\times 0.5^2 + 0.8\times 0.5^2 = 0.25 $$

What if we increase $M$ by a tiny amount, $0.001$? By how much will the variance change?

$$ \begin{aligned} \sum_x p(x)(x-M-0.001)^2 &= \sum_x p(x)\left((x-M)^2-2(x-M)\times 0.001 + 0.001^2\right) \\ &= \sum_x p(x)(x-M)^2 + 2\times 0.001\times\sum_x p(x)(M-x) + 0.001^2\sum_x p(x)\\ &= 0.25 + 2\times0.001\times(0.2\times0.5 - 0.8\times 0.5) + 10^{-6} \end{aligned} $$

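Thus, neglecting $10^{-6}$, which is far smaller than the second term, we can say that the variance changes by the second term ($-0.0006$), that is, it decreases. We can keep increasing $M$, and the variance will keep decreasing until the second term is no longer negative. This happens when that term is exactly zero, that is, when $\sum_x p(x)(M-x) = 0$ (or $\int p(x)(M-x)dx = 0$ in the continuous case). Solving for $M$ gives $M = \sum_x p(x)x = 0.8$, which is the mean.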
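The arithmetic in this example can be checked numerically. The snippet below is a minimal sketch for the two-point distribution only; the names `expected_sq_dev`, `step`, `change`, and `first_order` are made up here for illustration.

```python
# Two-point distribution from the example: x=0 with p=0.2, x=1 with p=0.8.
xs = [0.0, 1.0]
ps = [0.2, 0.8]


def expected_sq_dev(m):
    """Return sum_x p(x) * (x - m)**2 for the two-point distribution."""
    return sum(p * (x - m) ** 2 for x, p in zip(xs, ps))


m = 0.5
step = 0.001

base = expected_sq_dev(m)             # value at M = 0.5
shifted = expected_sq_dev(m + step)   # value at M = 0.501
change = shifted - base               # exact change caused by the small step

# Second term of the expansion: 2 * step * sum_x p(x) * (m - x)
first_order = 2 * step * sum(p * (m - x) for x, p in zip(xs, ps))

print(f"base        = {base:.6f}")
print(f"change      = {change:.6f}")
print(f"first_order = {first_order:.6f}")
```

The exact change and the second term differ only by `step**2`, which is the $10^{-6}$ neglected above.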

What we did here is called differentiation. The reason we did it is that if the derivative (the slope) exists at a minimum, it must be zero there.
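To see the "slope is zero at the minimum" idea at work, one can scan many candidate values of $M$ for the same two-point distribution and compare the best one with the mean. This is only an illustrative sketch: the grid of candidates and the helper names `expected_sq_dev` and `slope` are assumptions made for this example, not part of the book or the derivation.

```python
# Two-point distribution from the example: x=0 with p=0.2, x=1 with p=0.8.
xs = [0.0, 1.0]
ps = [0.2, 0.8]


def expected_sq_dev(m):
    """Return sum_x p(x) * (x - m)**2."""
    return sum(p * (x - m) ** 2 for x, p in zip(xs, ps))


def slope(m):
    """Derivative of expected_sq_dev with respect to m: 2 * sum_x p(x) * (m - x)."""
    return 2 * sum(p * (m - x) for x, p in zip(xs, ps))


# Mean of the distribution: sum_x p(x) * x
mean = sum(p * x for x, p in zip(xs, ps))

# Brute-force search over M = 0.000, 0.001, ..., 1.000
candidates = [i / 1000 for i in range(1001)]
best_m = min(candidates, key=expected_sq_dev)

print(f"mean          = {mean:.3f}")
print(f"best M        = {best_m:.3f}")
print(f"slope at mean = {slope(mean):.6f}")
print(f"slope at 0.5  = {slope(0.5):.6f}")
```

The slope is negative at $M=0.5$, so moving $M$ to the right lowers the expected squared deviation, and the search settles where the slope reaches zero.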