Is the Random Variable's Expectation the optimal Solution for the Mean Squared Error?


Let's assume we compute the mean squared error between a fixed estimate $\hat{x}$ and a dataset $\{x_1, x_2, \dots, x_N\}$ sampled from a non-Gaussian random variable $\mathcal{X}$:

$$ \frac{1}{N} \sum_{i=1}^N (\hat{x} - x_i)^2. $$

My intuition is that the value of $\hat{x}$ that minimizes this error is the expectation of $\mathcal{X}$. Is this correct? And if so, how can one prove it?


Best answer

This is essentially correct: the minimizer of the empirical error is the sample mean $\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i$, and by the law of large numbers $\bar{x} \to \mathbb{E}[X]$ as $N \to \infty$, so in the limit the expectation does minimize the error. (For a finite sample the cross term in the expansion below vanishes only with $\bar{x}$, not with $\mathbb{E}[X]$, which is why the proof must use the sample mean.)

Let $a \in \mathbb{R}$ be any number. Since $a - x_i = (a - \bar{x}) + (\bar{x} - x_i)$, expanding the square gives $$ \sum_{i=1}^N (a - x_i)^2 = \sum_{i=1}^N \left[(a - \bar{x})^2 + 2(a - \bar{x})(\bar{x} - x_i) + (\bar{x} - x_i)^2\right]. $$ The cross term vanishes, because $$ \sum_{i=1}^N (\bar{x} - x_i) = N\bar{x} - \sum_{i=1}^N x_i = 0. $$ The first term equals $N(a - \bar{x})^2$, which is nonnegative, and the last term is $N$ times the mean squared error attained at $\bar{x}$. All in all, $$ \frac{1}{N} \sum_{i=1}^N (a - x_i)^2 = (a - \bar{x})^2 + \frac{1}{N} \sum_{i=1}^N (\bar{x} - x_i)^2 \geq \frac{1}{N} \sum_{i=1}^N (\bar{x} - x_i)^2, $$ with equality if and only if $a = \bar{x}$.
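The argument can also be checked numerically. A minimal sketch in Python, where the exponential distribution and the sample size are arbitrary choices made only to illustrate the non-Gaussian case from the question:

```python
import numpy as np

# Draw non-Gaussian samples (exponential distribution, arbitrary choice).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)

def mse(a, x):
    """Empirical mean squared error between a fixed estimate a and samples x."""
    return np.mean((a - x) ** 2)

# Scan candidate estimates on a grid around the sample mean.
candidates = np.linspace(x.mean() - 2.0, x.mean() + 2.0, 2001)
errors = [mse(a, x) for a in candidates]
best = candidates[np.argmin(errors)]

# The grid minimizer coincides with the sample mean (up to grid spacing).
print(best, x.mean())
```

With 1000 samples the sample mean already sits close to the true expectation (here $2.0$), illustrating the limit statement above.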