What makes the Mean Squared Error so special compared to other upper bounds?


Let $X$ be a measure space (e.g. a probability space), $Y$ a metric space, and consider $g:X\to Y$ as an estimator for $f:X\to Y$. It seems natural to call an estimate 'good' if for every small $\epsilon > 0$ the measure $\mu(\{d(f,g) > \epsilon\})$ is small. By Markov's inequality, for any $h:\mathbb{R}\to \mathbb{R}$ that is positive and non-decreasing on $\mathbb{R}_{\geq 0}$ we have:

$$ \mu(\{d(f,g) > \epsilon\}) \quad \leq \quad \frac{1}{h(\epsilon)} \int_X h(d(f,g))\; d\mu $$

Thus there are many choices of upper bound, one for each $h$ ($x^n$, $\exp$, ...). Nevertheless, it seems that mostly only the Mean Squared Error of $f$ and $g$ (the case $h(x)=x^2$) is considered, and we try to find a $g$ that minimizes it. What makes it so special compared to other choices? Why don't we use another $h$ and find an estimator $g$ minimizing $\int_X h(d(f,g))\; d\mu$?
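As a quick sanity check (my own illustration, not part of the question): the generalized Markov bound $\mu(\{d > \epsilon\}) \leq h(\epsilon)^{-1}\int h(d)\,d\mu$ holds for any admissible $h$, and in fact holds exactly for the empirical measure of a sample, since $h(\epsilon)\,\mathbf{1}_{\{d>\epsilon\}} \leq h(d)$ pointwise. Here `d` stands in for the values of $d(f,g)$:

```python
import numpy as np

# Illustrative check of the generalized Markov bound
#   P(d > eps) <= E[h(d)] / h(eps)
# for several choices of h, on a simulated sample of d(f, g).
rng = np.random.default_rng(0)
d = rng.exponential(scale=1.0, size=100_000)  # stand-in for d(f, g)
eps = 2.0

choices = {
    "h(x) = x":        lambda x: x,
    "h(x) = x^2":      lambda x: x**2,
    # h must have finite expectation under d; e^x would diverge for Exp(1),
    # so we use e^{x/2} instead.
    "h(x) = e^(x/2)":  lambda x: np.exp(x / 2),
}

tail = (d > eps).mean()  # empirical P(d > eps)
for name, h in choices.items():
    bound = h(d).mean() / h(eps)
    assert tail <= bound  # the bound holds for every admissible h
    print(f"{name}: tail {tail:.4f} <= bound {bound:.4f}")
```

Each choice of $h$ gives a valid but generally different bound; which one is tightest depends on the distribution of $d(f,g)$.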

Of course we use other bounds when applying Markov's inequality! Let me translate out of the language of measure theory and into the language of probability. If we choose $h_t(x) = e^{tx}$ for $t > 0$, then we get $$P(X \geq a) = P(e^{tX} \geq e^{ta})\leq M(t) e^{-ta}$$ where $M(t) = E(e^{tX})$ is the moment generating function of $X$, provided it exists. Since this holds for every $t > 0$ at which $M$ exists, we can take an infimum: $$P(X \geq a) \leq \inf_{t > 0} M(t) e^{-ta}$$ to yield the so-called Chernoff bound.
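To make the Chernoff bound concrete (my own numerical sketch, not part of the original answer): for $X \sim N(0,1)$ the MGF is $M(t) = e^{t^2/2}$, so $\inf_{t>0} M(t)e^{-ta}$ is attained at $t = a$ and equals $e^{-a^2/2}$, which decays far faster than the $O(1/a^2)$ bound from $h(x)=x^2$:

```python
import math
import numpy as np

def chernoff_bound_normal(a):
    # Numerically take the infimum of M(t) e^{-ta} over a grid of t > 0,
    # where M(t) = exp(t^2 / 2) is the MGF of a standard normal.
    ts = np.linspace(0.01, 10, 2000)
    return np.min(np.exp(ts**2 / 2 - ts * a))

for a in (1.0, 2.0, 3.0):
    true_tail = 0.5 * math.erfc(a / math.sqrt(2))  # exact P(X >= a)
    bound = chernoff_bound_normal(a)
    assert true_tail <= bound  # Chernoff bound must dominate the true tail
    print(f"a={a}: P(X>=a)={true_tail:.5f}  Chernoff bound={bound:.5f}")
```

The grid infimum lands (up to grid spacing) on the analytic value $e^{-a^2/2}$, illustrating how optimizing over $t$ sharpens the bound.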

I can think of a few reasons why people often care about mean squared error:

  1. It is easy to compute and optimise;
  2. In estimation theory the MSE of an unbiased estimator is its variance. The theory of uniformly minimum-variance unbiased (UMVU) estimation, which tells us how to construct the best unbiased estimators, is pretty much fully understood.
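Point 1 is worth seeing numerically (a sketch of mine, not from the answer above): the constant $c$ minimizing $E[(X-c)^2]$ is the mean $E[X]$, obtained in closed form by differentiating, whereas minimizing $E|X-c|$ (the choice $h(x)=x$) yields the median, which satisfies no such simple stationarity equation:

```python
import numpy as np

# Compare the constants minimizing mean squared error vs mean absolute
# error over a skewed sample, where mean and median differ.
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=10_000)  # skewed: mean != median

cs = np.linspace(0.0, 3.0, 301)  # candidate constants c
mse = np.array([np.mean((x - c) ** 2) for c in cs])
mae = np.array([np.mean(np.abs(x - c)) for c in cs])

best_mse, best_mae = cs[mse.argmin()], cs[mae.argmin()]
print(f"argmin MSE ~ {best_mse:.2f}  (sample mean   {x.mean():.2f})")
print(f"argmin MAE ~ {best_mae:.2f}  (sample median {np.median(x):.2f})")
```

The MSE minimizer tracks the sample mean and the MAE minimizer tracks the sample median; the quadratic loss is the one that hands us a closed-form answer and a clean bias-variance decomposition.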