Let $X$ be a measure space (e.g. a probability space), let $Y$ be a metric space, and consider $g:X\to Y$ as an estimator for $f:X\to Y$. It seems natural to call the estimator 'good' if for every small $\epsilon > 0$ the measure $ \mu(\{d(f,g) > \epsilon\}) $ is small. By Markov's inequality, for any $h:\mathbb{R}\to \mathbb{R}$ that is positive and non-decreasing on $\mathbb{R}_{\geq 0}$ we have:
$$ \mu(\{d(f,g) > \epsilon\}) \quad \leq \quad \frac{1}{h(\epsilon)} \int_X h(d(f,g))\; d\mu $$
Thus there are many possible upper bounds, one for each choice of $h$ ($x^n$, $e^x$, …). Nevertheless, it seems that mostly only the Mean Squared Error of $f$ and $g$ (the case $h(x)=x^2$) is considered, and we try to find a $g$ that minimizes it. What makes this choice so special compared to the others? Why don't we use another $h$ and find an estimator $g$ minimizing $\int_X h(d(f,g))\; d\mu$?
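To make the family of bounds concrete, here is a small numeric sketch (my own example, not from the question) using an Exponential(1) random variable, whose $n$-th moment is $E[X^n] = n!$, so the choice $h(x)=x^n$ gives the bound $P(X \geq \epsilon) \leq n!/\epsilon^n$:

```python
import math

# Markov's inequality with h(x) = x^n for X ~ Exponential(1):
#   P(X >= eps) <= E[X^n] / eps^n = n! / eps^n.
# The true tail is P(X >= eps) = e^{-eps}.
eps = 5.0
true_tail = math.exp(-eps)

bounds = {n: math.factorial(n) / eps**n for n in (1, 2, 4, 8)}
for n, b in bounds.items():
    print(f"h(x)=x^{n}: bound {b:.4f}  (true tail {true_tail:.4f})")
```

For $\epsilon = 5$ the bounds are $0.2$, $0.08$, $0.0384$, and then $\approx 0.103$ again for $n=8$: higher moments tighten the bound only up to a point, which already hints that no single $h$ is uniformly best.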
Of course we use other bounds when applying Markov's inequality! Let me translate out of the language of measure theory and into the language of probability. If we choose $h_t(x) = e^{tx}$ for $t > 0$, then we get $$P(X \geq a) = P(e^{tX} \geq e^{ta}) \leq M(t) e^{-ta},$$ where $M(t) = E(e^{tX})$ is the moment generating function of $X$, provided it exists. Since this holds for every $t > 0$ at which $M$ exists, we can take an infimum: $$P(X \geq a) \leq \inf_{t > 0} M(t) e^{-ta},$$ which yields the so-called Chernoff bound.
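As an illustration (my own, assuming a standard normal $X$, where $M(t) = e^{t^2/2}$): the infimum of $M(t)e^{-ta}$ is attained at $t = a$, giving the Chernoff bound $e^{-a^2/2}$, which beats the $h(x)=x^2$ (Chebyshev-type) bound $1/a^2$:

```python
import math

# Chernoff vs. Chebyshev-type bound for X ~ N(0, 1) and the tail P(X >= a).
a = 2.0
chernoff = math.exp(-a**2 / 2)                 # inf_t M(t) e^{-ta}, at t = a
chebyshev = 1 / a**2                           # from h(x) = x^2 applied to |X|
true_tail = 0.5 * math.erfc(a / math.sqrt(2))  # exact Gaussian tail

print(f"true {true_tail:.4f} <= Chernoff {chernoff:.4f} <= Chebyshev {chebyshev:.4f}")
```

Here the exponential choice of $h$ is strictly better, though both are valid upper bounds on the true tail.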
I can think of a few reasons why people often care about mean squared error: