Question regarding a "Without loss of generality" statement in the proof of the optimal mean-square estimator.

The following question concerns a statement in a proof (p. 287 in Shiryaev I, third edition) that the conditional expectation $\mathrm E(\eta \mid \xi = x)$ is the optimal estimator of $\eta$ in terms of $\xi$.

Given a pair of random variables $( \eta, \xi )$, we may try to find a Borel function $\phi$ such that $\phi(\xi)$ estimates $\eta$. The optimal such estimator is the one that minimizes the mean-squared error $\mathrm E \left[ \eta - \phi(\xi) \right]^2$. A theorem states that if $\mathrm{E} \eta^2 < \infty$, then $$ \phi(x) = \mathrm E (\eta \mid \xi = x ) $$ can be taken as the optimal estimator.
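To see the statement of the theorem in the simplest possible setting, here is a minimal sketch assuming a finite discrete joint distribution of $(\xi, \eta)$. The distribution `JOINT` and the helpers `conditional_mean` and `mse` are hypothetical and chosen only for illustration; exact `Fraction` arithmetic avoids rounding. It is a numerical check on one example, not the book's proof.

```python
from fractions import Fraction

# Hypothetical joint pmf of (xi, eta), chosen only for illustration:
# keys are (x, y) pairs, values are P(xi = x, eta = y).
JOINT = {
    (0, -1): Fraction(1, 8),
    (0, 1): Fraction(1, 8),
    (0, 3): Fraction(1, 4),
    (1, 0): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}


def conditional_mean(joint):
    """Return phi*(x) = E(eta | xi = x) for every x with P(xi = x) > 0."""
    mass = {}      # P(xi = x)
    weighted = {}  # sum over y of y * P(xi = x, eta = y)
    for (x, y), p in joint.items():
        mass[x] = mass.get(x, 0) + p
        weighted[x] = weighted.get(x, 0) + p * y
    return {x: weighted[x] / mass[x] for x in mass}


def mse(joint, phi):
    """Mean-squared error E[(eta - phi(xi))^2] of the estimator phi."""
    return sum(p * (y - phi(x)) ** 2 for (x, y), p in joint.items())


if __name__ == "__main__":
    phi_star = conditional_mean(JOINT)
    print(mse(JOINT, lambda x: phi_star[x]))     # conditional mean: 15/8
    print(mse(JOINT, lambda x: 0))               # naive phi_0 = 0, i.e. E eta^2: 7/2
    print(mse(JOINT, lambda x: Fraction(5, 4)))  # best constant, E eta = 5/4: 31/16
    print(mse(JOINT, lambda x: x))               # phi(x) = x: 3
```

In this example the conditional mean attains the smallest MSE among the estimators tried, and every one of them has finite second moment because $\xi$ takes only finitely many values.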

In the first line of the proof it is stated that: "Without loss of generality we may consider only estimators $\phi(\xi)$ for which $\mathrm E \phi^2(\xi) < \infty$." I have a hard time figuring out what is meant by this.

I can see that if $\mathrm E \phi^2(\xi) = \infty$, then $$ \mathrm E \left[ \eta - \phi (\xi) \right] ^2 = \mathrm E \eta^2 - 2 \mathrm E \eta \phi(\xi) + \mathrm E \phi^2(\xi) = \infty $$ (assuming $\mathrm E \phi(\xi) < \infty$ and using the extended-real convention $r + \infty = \infty$ for real $r$). But if there is no better estimator, then perhaps the term "optimal" does not give much information?

I have included a screenshot of the whole section below.

[Image: screenshot of the whole section; not reproduced.]

Best answer

For any estimator with $\mathbb{E}\left[\varphi^{2}\left(\xi\right)\right]=\infty$, you have shown that $\mathbb{E}\left[\left(\eta-\varphi\left(\xi\right)\right)^{2}\right]=\infty$.
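The same conclusion also follows without any assumption on $\mathbb{E}\left[\varphi\left(\xi\right)\right]$ or $\mathbb{E}\left[\eta\varphi\left(\xi\right)\right]$, from the elementary inequality $(a+b)^{2}\le 2a^{2}+2b^{2}$ applied with $a=\varphi\left(\xi\right)-\eta$ and $b=\eta$:

$$ \varphi^{2}\left(\xi\right)\le 2\left(\eta-\varphi\left(\xi\right)\right)^{2}+2\eta^{2} \quad\Longrightarrow\quad \mathbb{E}\left[\varphi^{2}\left(\xi\right)\right]\le 2\,\mathbb{E}\left[\left(\eta-\varphi\left(\xi\right)\right)^{2}\right]+2\,\mathbb{E}\left[\eta^{2}\right]. $$

If $\mathbb{E}\left[\eta^{2}\right]<\infty$ and $\mathbb{E}\left[\varphi^{2}\left(\xi\right)\right]=\infty$, the right-hand side must be infinite, so $\mathbb{E}\left[\left(\eta-\varphi\left(\xi\right)\right)^{2}\right]=\infty$.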

Now consider the naive estimator $\varphi_{0}\left(\xi\right)\equiv0$. We have $\mathbb{E}\left[\left(\eta-\varphi_{0}\left(\xi\right)\right)^{2}\right]=\mathbb{E}\left[\eta^{2}\right]<\infty$, which is guaranteed by the assumption $\mathbb{E}\left[\eta^{2}\right]<\infty$ of Theorem 1. Therefore, $\varphi_{0}\left(\xi\right)$ is always strictly better than $\varphi\left(\xi\right)$ in terms of MSE.
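The domination argument can be illustrated with a small hypothetical example (not from the book), sketched under these assumptions: $\xi$ takes values $k = 1, 2, \dots$ with $P(\xi = k) = 3 \cdot 4^{-k}$, $\eta = \pm 1$ with probability $1/2$ each and independent of $\xi$ (so $\mathbb{E}\left[\eta^{2}\right] = 1$), and $\varphi(k) = 2^{k}$ (so $\mathbb{E}\left[\varphi^{2}(\xi)\right] = \sum_k 3 = \infty$). Since the MSE is an infinite series, the sketch computes its partial sums exactly; the names `p_xi`, `phi`, `cond_sq_error`, and `truncated_mse` are illustrative only.

```python
from fractions import Fraction

# Hypothetical example:
#   P(xi = k) = 3 / 4**k for k = 1, 2, ...  (these probabilities sum to 1),
#   eta = +1 or -1 with probability 1/2 each, independent of xi, so E eta^2 = 1,
#   phi(k) = 2**k, so E phi(xi)^2 = sum over k of 3, which is infinite.


def p_xi(k):
    """P(xi = k) for k >= 1."""
    return Fraction(3, 4 ** k)


def phi(k):
    """An estimator whose second moment E phi(xi)^2 is infinite."""
    return 2 ** k


def cond_sq_error(c):
    """E[(eta - c)^2] for the symmetric +/-1 variable eta; equals 1 + c^2."""
    return Fraction(1, 2) * (1 - c) ** 2 + Fraction(1, 2) * (-1 - c) ** 2


def truncated_mse(K):
    """Partial sum over k = 1..K of P(xi = k) * E[(eta - phi(k))^2].

    Because eta is independent of xi, conditioning on xi = k leaves eta's
    distribution unchanged; the full MSE is the limit of these partial sums.
    """
    return sum(p_xi(k) * cond_sq_error(phi(k)) for k in range(1, K + 1))


# The naive estimator phi_0 = 0 has MSE E[(eta - 0)^2] = E eta^2 = 1.
NAIVE_MSE = cond_sq_error(0)

if __name__ == "__main__":
    print("naive estimator:", NAIVE_MSE)
    for K in (1, 5, 10, 20):
        # Each partial sum equals 3K + 1 - 4**(-K), so the series diverges.
        print(K, float(truncated_mse(K)))
```

Already the first term exceeds the naive estimator's MSE, and the partial sums grow linearly in $K$, so this $\varphi$ has infinite MSE while $\varphi_{0}$ has MSE $1$.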

In other words, for any estimator $\varphi\left(\xi\right)\not\in L_{2}$, at least the naive estimator $\varphi_{0}\left(\xi\right)\in L_{2}$ dominates it, hence it cannot be optimal. That's why we only need to consider estimators in $L_{2}$.