Confusion regarding limiting variances (Casella, Statistical Inference, 2nd edition, example 10.1.8)


In the book Statistical Inference (George Casella 2nd ed.), page 470, there is an example:

$\bar{X}_n$ is the mean of $n$ iid observations, with $\operatorname{E}X=\mu$ and $\operatorname{Var}X=\sigma^2$. "If we take $T_n=1/\bar{X}_n$, we find that the variance is $\operatorname{Var}T_n=\infty$, so the limit of the variances is infinity." Why is the limit of the variances infinity? I can go as far as

$$\operatorname{Var}\frac 1{\bar{X}_n}=\operatorname{E}\left[\left(\frac1{\bar{X}_n}-\operatorname{E} \frac 1{\bar X_n} \right)^2\right]$$

What's next? I know that $\lim_{n\to\infty}\operatorname{Var}{\bar X_n}=0$. However, $\lim_{n\to\infty}1/\operatorname{Var}{\bar X_n}\not=\lim_{n\to\infty}\operatorname{Var}1/\bar X_n$. How can we show that the variance approaches infinity for sufficiently large $n$?

Thanks!

2 Answers

Accepted answer:

In the example, $\overline{X}_n$ is the mean of $n$ iid normal observations. Therefore, $\overline{X}_n$ also has a normal distribution, with mean $\mu$ and variance $\sigma^2/n$; its probability density function is $f(x)=\frac1{\sqrt{2\pi\sigma^2/n}}e^{-\frac{(x-\mu)^2}{2\sigma^2/n}}$. When we try to compute the mean of $T_n=1/\overline{X}_n$, we find: $$E(|T_n|) = \int_{-\infty}^{\infty} \frac1{\sqrt{2\pi\sigma^2/n}}e^{-\frac{(x-\mu)^2}{2\sigma^2/n}}\frac1{|x|}\,dx=\infty,$$ since near $x=0$ the integrand is bounded below by a constant multiple of $1/|x|$, which has infinite integral. Thus each $T_n$ has an undefined mean and hence also an undefined variance. I'm not sure why Casella says that $\text{Var }T_n=\infty$; I think it would be more correct to simply say that the variance of each $T_n$ is undefined.
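To make the divergence concrete, here is a small numerical check (my addition, not part of the original answer) in Python. It integrates $f(x)/|x|$ outside a shrinking window $(-\epsilon,\epsilon)$ around the pole and shows the truncated integral growing without bound, roughly like $2f(0)\log(1/\epsilon)$; the choice $\mu=\sigma=n=1$ is an arbitrary illustration.

```python
# Numerical illustration that E|T_n| = E|1/Xbar_n| diverges.
# Assumed illustrative parameters: mu = sigma = n = 1, so Xbar_n ~ N(1, 1).
import numpy as np
from scipy import integrate
from scipy.stats import norm

mu, sigma, n = 1.0, 1.0, 1
f = norm(loc=mu, scale=sigma / np.sqrt(n)).pdf  # density of Xbar_n

def truncated(eps):
    # Integrate f(x)/|x| over |x| > eps, split so the near-pole pieces
    # and the infinite tails are handled separately by quad.
    pieces = [(-np.inf, -1.0), (-1.0, -eps), (eps, 1.0), (1.0, np.inf)]
    return sum(integrate.quad(lambda x: f(x) / abs(x), a, b)[0]
               for a, b in pieces)

for eps in [1e-1, 1e-3, 1e-5, 1e-7]:
    print(f"eps = {eps:.0e}: truncated E|T_n| ~ {truncated(eps):.3f}")
# Each factor-of-100 shrink in eps adds about 2*f(0)*log(100) ~ 2.2 to the
# total, so the full integral (eps -> 0) is infinite.
```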

Second answer:

There are many issues with Example 10.1.8 in Casella's book, which ultimately led to a lot of confusion for me when I was working through it a couple of years ago. I have seen other posts on MSE that express similar confusion, so this is my attempt to make sense of the words and equations Casella gives in the example. While I am beginning to see the bigger picture of what's going on here, I have had, and continue to have, trouble explaining it concisely. My hope is that someone will read my comments and be able to piece them together into a more concise and clear answer. With that said, let me address each issue in this example separately and hopefully bring at least a little clarity to the situation.

Issue #1 (typo):

First of all, as pointed out in this question, there is a typo in the first paragraph of the example. The correct sentence should read:

...if we take $T_n=\bar X_n$, then $\lim_{n\to\infty} n\mathsf{Var}\bar X_n=\sigma^2$ is the limiting variance of $T_n$.

Issue #2 ($\mathsf{Var}T_n\neq\infty$):

Now in the second paragraph we read:

If we now take $T_n=1/\bar X_n$, we find that the variance is $\mathsf{Var}T_n=\infty$, so the limit of the variances is infinity.

This claim was my first real source of confusion. You see, the variance is not infinite but rather undefined. For an explanation why, see my answer here. In fact, using similar arguments as in the linked answer we can properly conclude that all the even moments of $T_n=1/\bar X_n$ are infinite while all the odd moments are undefined, that is, $\mathsf ET_n^{2m+2}=\infty$ and $\mathsf ET_n^{2m+1}=\text{d.n.e.}$ for $m=0,1,2,\dots$ Since $\mathsf{Var}T_n=\mathsf{E}T_n^2-(\mathsf ET_n)^2$ and $\mathsf ET_n$ is undefined, it follows that $\mathsf{Var}T_n$ must also be undefined.
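A quick simulation (my own illustration, again with the arbitrary choice $\mu=\sigma=n=1$) makes both facts visible: the empirical second moment of $T_n$ keeps growing with the number of draws, while the sample mean refuses to settle down.

```python
# Monte Carlo illustration: T_n = 1/Xbar_n with Xbar_n ~ N(1, 1) (assumed
# illustrative parameters).  E T_n^2 = infinity, E T_n does not exist.
import numpy as np

rng = np.random.default_rng(1)
for m in [10**3, 10**5, 10**7]:
    t = 1.0 / rng.normal(loc=1.0, scale=1.0, size=m)
    print(f"m = {m:>8}: sample mean = {np.mean(t):10.3f}, "
          f"sample 2nd moment = {np.mean(t**2):14.1f}")
# The sample second moment grows without bound as m increases, and the sample
# mean gives wildly different values across seeds -- neither moment exists.
```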

Issue #3 (use of $\approx$):

This issue still has me struggling to provide a clear answer. Let me explain what I have found out so far. The confusion here centers on the statements: $$ \begin{aligned} \mathsf E\bar X_n^{-1} &\approx\mu^{-1}\\ \mathsf{Var}\bar X_n^{-1} &\approx\mu^{-4}\sigma^2/n.\\ \end{aligned} $$ As we showed in Issue #2 above, the mean and variance of $\bar X_n^{-1}$ are undefined, so what exactly are we approximating? Let me first say that the approximation sign on its own does not mean much, as there is no formal definition of what "$\approx$" means here. Casella deserves a little slack, though, because in the sentence preceding these claims he states

...the "approximate" mean and variance of $1/\bar X_n$ are...

So why the quotation marks around "approximate"? Because he thought the mean and variance were infinite and thus could not be approximated by anything. We now know that the situation is even stranger than that: the moments do not even exist, let alone equal infinity. So what exactly was Casella trying to say with these statements? Let's first take a small detour:

Small (not really) detour (we can reinterpret $\mathsf E\bar X_n^{-1}$ and $\mathsf{Var}\bar X_n^{-1}$ so that they have well-defined values):

This should strike you as a strange claim, as we just stated that both $\mathsf E\bar X_n^{-1}$ and $\mathsf{Var}\bar X_n^{-1}$ are undefined. However, if we reinterpret the integral representation of $\mathsf E\bar X_n^{-1}$ using the Cauchy principal value, we can assign a well-defined value to $\mathsf E\bar X_n^{-1}$ and subsequently to $\mathsf{Var}\bar X_n^{-1}$. By the Cauchy principal value interpretation of $\mathsf E\bar X_n^{-1}$ we find $$ \mathsf E\bar X_n^{-1}\overset{\text{p.v.}}{=}\lim_{\epsilon\to 0^+}\left(\int_{-\infty}^{-\epsilon}+\int_\epsilon^\infty\right)\frac{1}{x}f_{\bar X_n}(x)\,\mathrm dx=\frac{\sqrt{2n}}{\sigma}\mathcal D\left(\tfrac{\sqrt n\mu}{\sqrt 2\sigma}\right), $$ where $\mathcal D(z):=e^{-z^2}\int_0^z e^{t^2}\,\mathrm dt$ is the Dawson integral. Note that the "$=$" sign above has been replaced with "$\overset{\text{p.v.}}{=}$" to remind us that this is not a standard equality. Why is this reinterpretation useful? Well, there is a generalized central limit theorem which states that, as $m\to\infty$, $$ \frac{1}{m}\sum_{k=1}^m\frac{1}{\bar X_{n,k}}\overset{d}{\to}\operatorname{Cauchy}\left(\mathsf E\bar X_n^{-1},\pi f_{\bar X_n}(0)\right), $$ where the location parameter is understood in the principal-value sense. So the principal value of $\mathsf E\bar X_n^{-1}$ tells us something about the center of the limiting distribution of suitably large sample means involving $\bar X_n^{-1}$, and thus gives us a useful measure of center or "mean". To see this in action, let's set $\mu=\sigma=n=1$ so that $\bar X_n\sim\mathcal N(1,1)$ and define $$ \bar Z_m=\frac{1}{m}\sum_{k=1}^{m}\frac{1}{\bar X_{n,k}} $$ to be the sample average of $m$ i.i.d. observations of $\bar X_n^{-1}$. I generated $10^6$ values of $\bar Z_{10^6}$ in Mathematica and plotted them in a histogram (below). Overlaid on the histogram is the density function of $\operatorname{Cauchy}\left(\sqrt 2\mathcal D(1/\sqrt 2),\pi f_{\bar X_n}(0)\right)$, which represents the limiting distribution of $\bar Z_m$ as $m\to\infty$:

[Figure: histogram of the simulated values of $\bar Z_{10^6}$ with the limiting Cauchy density overlaid.]
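For anyone who wants to reproduce this, below is a minimal Python sketch of the same experiment (the original was done in Mathematica). It uses a smaller $m$ and fewer replications to keep the runtime modest, and compares empirical quantiles of $\bar Z_m$ against the limiting Cauchy law instead of plotting a histogram, so the agreement is only approximate.

```python
# Approximate reproduction of the simulation above: sample means of 1/Xbar_n
# versus the limiting Cauchy distribution.  Assumes mu = sigma = n = 1.
import numpy as np
from scipy.special import dawsn
from scipy.stats import cauchy, norm

rng = np.random.default_rng(0)
m, reps = 10**4, 2000   # smaller than the 10^6 / 10^6 run described above
zbar = np.array([np.mean(1.0 / rng.normal(1.0, 1.0, size=m))
                 for _ in range(reps)])

loc = np.sqrt(2.0) * dawsn(1.0 / np.sqrt(2.0))  # p.v. of E[1/Xbar_n]
scale = np.pi * norm(1.0, 1.0).pdf(0.0)         # pi * f_{Xbar_n}(0)
limit = cauchy(loc=loc, scale=scale)

# Compare a few empirical quantiles of Zbar_m with the limiting Cauchy law.
for q in (0.25, 0.50, 0.75):
    print(f"q = {q:.2f}: empirical {np.quantile(zbar, q):7.4f}, "
          f"limiting Cauchy {limit.ppf(q):7.4f}")
```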

So again we see that the principal value of $\mathsf E\bar X_n^{-1}$ provides some useful measure of center even though, strictly speaking, the mean does not exist. Furthermore, by using the Cauchy principal value interpretation of $\mathsf E\bar X_n^{-1}$ we can also define the variance as $$ \mathsf{Var}\bar X_n^{-1}\overset{\text{p.v.}}{=}\mathsf E\bar X_n^{-2}-(\mathsf E\bar X_n^{-1})^2=\infty-\left(\frac{\sqrt{2n}}{\sigma}\mathcal D\left(\tfrac{\sqrt n\mu}{\sqrt 2\sigma}\right)\right)^2=\infty. $$ So even though the result for the variance is not finite, we can still claim that it is well defined and thus the equality holds. Bringing both results together, we have through the Cauchy principal value interpretation: $$ \begin{aligned} \mathsf E\bar X_n^{-1} &\overset{\text{p.v.}}{=} \frac{\sqrt{2n}}{\sigma}\mathcal D\left(\tfrac{\sqrt n\mu}{\sqrt 2\sigma}\right)\\ \mathsf{Var}\bar X_n^{-1} &\overset{\text{p.v.}}{=} \infty. \end{aligned} $$
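These closed-form claims are easy to sanity-check. The sketch below (my addition) evaluates the symmetric principal-value integral numerically and compares it with the Dawson-function expression; the values $\mu=2$, $\sigma=1$, $n=5$ are an arbitrary test case.

```python
# Check the principal-value formula E[1/Xbar_n] =(p.v.) sqrt(2n)/sigma * D(...).
# Assumed test parameters: mu = 2, sigma = 1, n = 5.
import numpy as np
from scipy import integrate
from scipy.special import dawsn
from scipy.stats import norm

mu, sigma, n = 2.0, 1.0, 5
f = norm(loc=mu, scale=sigma / np.sqrt(n)).pdf  # density of Xbar_n

def pv_mean(eps):
    # Symmetric excision of (-eps, eps) around the pole at x = 0;
    # the positive side is split at 1 so quad handles the tail separately.
    pieces = [(-np.inf, -eps), (eps, 1.0), (1.0, np.inf)]
    return sum(integrate.quad(lambda x: f(x) / x, a, b)[0] for a, b in pieces)

closed_form = np.sqrt(2 * n) / sigma * dawsn(np.sqrt(n) * mu / (np.sqrt(2) * sigma))
print(f"p.v. integral (eps = 1e-8): {pv_mean(1e-8):.6f}")
print(f"Dawson closed form:         {closed_form:.6f}")
```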

This is starting to look like an awfully long detour, but there is a reason. Going back to our expression for $\mathsf E\bar X_n^{-1}$, we can use the asymptotic expansion of the Dawson integral (see equation $(9)$ here) to deduce, as $n\to\infty$: $$ \tag{1} \mathsf E\bar X_n^{-1}\sim\frac{\sqrt{2n}}{\sigma}\left(\frac{\sigma}{\sqrt{2n}\mu}+\frac{\sigma^3}{\sqrt 2\,n^{3/2}\mu^3}+\mathcal O(n^{-5/2})\right)=\frac{1}{\mu}+\frac{\sigma^2}{n\mu^3}+\mathcal O(n^{-2}). $$ Look familiar? If we drop all the higher-order terms we find, for large $n$: $$ \mathsf E\bar X_n^{-1}\approx\frac{1}{\mu}, $$ which is precisely the "approximation" given by Casella in Example 10.1.8 and is the same result obtained by the (first-order) delta method. This observation is a big clue to what Casella meant by the approximations in the example.
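A short numerical check (my addition, with arbitrary $\mu=2$, $\sigma=1$) shows the Dawson expression approaching $1/\mu+\sigma^2/(n\mu^3)$ as $n$ grows, exactly as $(1)$ predicts.

```python
# Compare the exact p.v. value against the two-term asymptotic expansion (1).
# Assumed test parameters: mu = 2, sigma = 1.
import numpy as np
from scipy.special import dawsn

mu, sigma = 2.0, 1.0
for n in (1, 10, 100, 1000):
    exact = np.sqrt(2 * n) / sigma * dawsn(np.sqrt(n) * mu / (np.sqrt(2) * sigma))
    two_term = 1 / mu + sigma**2 / (n * mu**3)
    print(f"n = {n:>4}: p.v. value = {exact:.6f}, two-term expansion = {two_term:.6f}")
```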

Small detour 2 (use of $\delta$-method for approximating moments of $\bar X_n^{-1}$):

The main idea driving the delta method is that we can approximate the moments of $g(\bar X_n)$ for some function $g$ by considering the moments of the corresponding Taylor polynomial of $g$ (to some order) centered about the mean of $\bar X_n$. If the density of $\bar X_n$ is "concentrated" in the region where $g$ and its Taylor polynomial show good agreement, then we can obtain a useful approximation of the moments of $g(\bar X_n)$. In our case, we consider the function $g(x)=x^{-1}$ and expand $g$ about the mean of $\bar X_n$ (i.e. $\mu\neq 0$) to get $$ g(x)=\frac{1}{\mu}-\frac{x-\mu}{\mu^2}+\frac{(x-\mu)^2}{\mu^3}+\mathcal O((x-\mu )^3). $$ If we define $g_2(x)=\frac{1}{\mu}-\frac{x-\mu}{\mu^2}+\frac{(x-\mu)^2}{\mu^3}$ as the second-order Taylor polynomial approximation to $g(x)$, then we can easily find (using $\mathsf E(\bar X_n-\mu)^3=0$ and $\mathsf{Var}\left((\bar X_n-\mu)^2\right)=2\sigma^4/n^2$ for the normal case) that $$ \begin{aligned} \mathsf Eg_2(\bar X_n) &=\frac{1}{\mu}+\frac{\sigma^2}{n\mu^3}\\ \mathsf{Var}g_2(\bar X_n) &=\frac{\sigma^2}{n\mu^4}+\frac{2\sigma^4}{n^2\mu^6}. \end{aligned} $$ By dropping the higher-order terms we obtain the "approximations" given by Casella. Furthermore, the expression for $\mathsf Eg_2(\bar X_n)$ is equivalent to the first two terms of the asymptotic expansion of the principal-valued expression for $\mathsf E\bar X_n^{-1}$ given by $(1)$ above. Why? And why does the $\delta$-method approach give finite, well-defined values for moments that do not exist?
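Since $g_2(\bar X_n)$ is just a quadratic in a normal random variable, its moments are finite and easy to check by simulation. The sketch below (my addition, with the arbitrary choices $\mu=2$, $\sigma=1$, $n=50$) verifies the two closed forms above by Monte Carlo:

```python
# Monte Carlo check of E g2(Xbar_n) and Var g2(Xbar_n) for g2, the quadratic
# Taylor surrogate of 1/x.  Assumed test parameters: mu = 2, sigma = 1, n = 50.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 2.0, 1.0, 50
xbar = rng.normal(loc=mu, scale=sigma / np.sqrt(n), size=10**6)

g2 = 1/mu - (xbar - mu)/mu**2 + (xbar - mu)**2/mu**3

print(f"Monte Carlo: E g2 = {g2.mean():.6f}, Var g2 = {g2.var():.8f}")
print(f"Closed form: E g2 = {1/mu + sigma**2/(n*mu**3):.6f}, "
      f"Var g2 = {sigma**2/(n*mu**4) + 2*sigma**4/(n**2*mu**6):.8f}")
```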

Conclusion:

The answers to the questions posed at the end of our second detour have to do with how both approaches remove singularities from the equations representing the moments. In the case of the Cauchy principal-value interpretation of $\mathsf E\bar X_n^{-1}$, we took a limit that was symmetric about the pole in the integrand, which effectively canceled out the infinite components that caused the moment to become undefined. Note that this approach is in some sense "exact" in that it did not make any restrictive assumptions about the density of $\bar X_n$ residing in some specified interval with high probability (i.e. it does not require $n$ to be large). Likewise, the $\delta$-method also removed the singularity from the problem by replacing $g(x)=1/x$ with the approximate function $g_2(x)=\frac{1}{\mu}-\frac{x-\mu}{\mu^2}+\frac{(x-\mu)^2}{\mu^3}$, which no longer has any singularities on the support of $\bar X_n$. However, unlike the Cauchy principal-value approach, the use of the Taylor series centered at $\mu$, together with the fact that this Taylor series has a radius of convergence equal to $|\mu|$, requires that $\mathsf P(\bar X_n\in(\mu-\epsilon,\mu+\epsilon))\approx 1$, which is to say that $\bar X_n$ is observed with high probability near its mean. Since this only happens when $n$ is large, we see that the first moments derived from the two approaches agree as $n$ becomes large.

So what exactly do the $\approx$ signs in Casella's "approximations" mean? Well, assuming $\mu\neq 0$, as $n$ becomes very large the density of $\bar X_n$ becomes heavily concentrated about its mean (again, by heavily concentrated we mean that $\bar X_n$ is observed very close to $\mu$ with high probability). Simultaneously, as we continue to increase $n$, the probability that $\bar X_n$ is in a neighborhood of zero becomes vanishingly small, and so the sample mean and sample variance of many $\bar X_n^{-1}$'s begin to become quasi-well-behaved and settle down around particular values, which correspond to the quantities derived above via the Cauchy principal value and $\delta$-method approximations. At the end of the day these approximations are not really approximating anything, but rather are assigning finite values to undefined quantities, which under the right interpretation become useful when $n$ is large.