Variance of sample fourth moment


Let $X_1,X_2,\dots$ be i.i.d. standard normal variables. We assume we know that they are standardized and i.i.d., but we don't know the distribution. We want to estimate $M_4:=E[X^4_i]=3$. One possibility is to compute the sample fourth moment:

$$\hat{M}_4:=\frac{1}{n} \sum_{i=1}^n X_i^4$$

By the law of large numbers $\hat{M}_4$ will converge in probability to $M_4$. Another possible estimator for $M_4$ is

$$\hat{M}_{4}^{'}:=\frac{\hat{M}_4}{(\frac{1}{n}\sum_{i=1}^n X_i^2)^2} $$

Simulations seem to indicate that $\hat{M}_{4}^{'}$ is more precise than $\hat{M}_4$, with a mean squared error roughly half as large.
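A minimal simulation sketch of the comparison described above, assuming Python 3.8+ and only the standard library. The sample size, number of replications and seed are arbitrary illustrative choices, not the settings behind the figure quoted in the question. The helper names `estimators` and `simulated_mse` are hypothetical.

```python
import random
import statistics


def estimators(sample):
    """Return (M4_hat, M4_prime) for one sample, following the definitions above."""
    n = len(sample)
    m2 = sum(x * x for x in sample) / n   # sample second moment
    m4 = sum(x ** 4 for x in sample) / n  # sample fourth moment
    return m4, m4 / m2 ** 2


def simulated_mse(n, reps, seed=0, true_m4=3.0):
    """Monte Carlo MSE of both estimators for i.i.d. N(0,1) samples of size n."""
    rng = random.Random(seed)
    err_plain, err_ratio = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m4, m4p = estimators(sample)
        err_plain.append((m4 - true_m4) ** 2)
        err_ratio.append((m4p - true_m4) ** 2)
    return statistics.fmean(err_plain), statistics.fmean(err_ratio)


if __name__ == "__main__":
    mse_plain, mse_ratio = simulated_mse(n=100, reps=2000)
    print(f"MSE(M4_hat)   = {mse_plain:.4f}")
    print(f"MSE(M4_prime) = {mse_ratio:.4f}")
    print(f"ratio         = {mse_ratio / mse_plain:.3f}")
```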

Are there theoretical reasons for this? I am looking for a formal calculation.

Thanks a lot for your help.


I have the feeling that the answer given a few days ago was too short: here is a detailed proof...

Let $X_1,\ldots,X_n,\ldots$ be i.i.d. with law $N(0,1)$ and $$M_2(n)=\frac{1}{n}(X_1^2+\cdots+X_n^2),\qquad M_4(n)=\frac{1}{n}(X_1^4+\cdots+X_n^4).$$ We compute the variance of $M'_4(n)=M_4(n)/(M_2(n))^2$. To this end, recall the definition of the Dirichlet distribution $\mathcal{D}(a_0,a_1,\ldots,a_N)$, where $a_0,\ldots,a_N>0$. We say that $(U_0,U_1,\ldots,U_N)\sim\mathcal{D}(a_0,a_1,\ldots,a_N)$ if $U_0,\ldots,U_N>0$, $U_0+\cdots+U_N=1$, and the density of $(U_1,\ldots,U_N)$ is $$C(1-u_1-\cdots-u_N)^{a_0-1}u_1^{a_1-1}\cdots u_N^{a_N-1}$$ with $$\frac{1}{C}=B(a_0,\ldots,a_N)=\frac{\Gamma(a_0)\cdots\Gamma(a_N)}{\Gamma(a_0+\cdots+a_N)}.$$

Recall that if $Y_0,\ldots,Y_N$ are independent with respective densities on $(0,\infty)$ equal to $b(by_i)^{a_i-1}e^{-by_i}/\Gamma(a_i)$, then $$(U_0,\ldots,U_N)=\frac{(Y_0,\ldots,Y_N)}{Y_0+\cdots+Y_N}\sim\mathcal{D}(a_0,a_1,\ldots,a_N).\qquad(*)$$ This is a classical fact and we skip its proof. From $(*)$ we obtain $$(U_0+\cdots+U_k,U_{k+1},\ldots,U_N)\sim\mathcal{D}(a_0+a_1+\cdots+a_k,a_{k+1},\ldots,a_N).\qquad(**)$$

We apply this to $Y_i=X_i^2$, which has a chi-square distribution with one degree of freedom, namely a gamma law with $a_i=1/2$ and rate $b=1/2$ (that is, scale $2$). Therefore, if $U_i=X_i^2/(nM_2(n))$, then $\sum_i U_i^2=\sum_i X_i^4/(nM_2(n))^2=M_4(n)/(nM_2(n)^2)$, so $$M'_4(n)=n(U_1^2+\cdots+U_n^2),$$ and $(U_1,\ldots,U_n)\sim\mathcal{D}(1/2,\ldots,1/2)$. From $(**)$ we have $$(U_1,U_2+\cdots+U_n)\sim\mathcal{D}\left(\tfrac{1}{2},\tfrac{n-1}{2}\right),\qquad(U_1,U_2,U_3+\cdots+U_n)\sim\mathcal{D}\left(\tfrac{1}{2},\tfrac{1}{2},\tfrac{n-2}{2}\right).\qquad(***)$$
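A quick numerical sanity check of the two facts used in this step, assuming Python 3 with only the standard library: the density of $X_i^2$ matches the gamma density $b(by)^{a-1}e^{-by}/\Gamma(a)$ with $a=1/2$ and rate $b=1/2$ (scale $2$), and $M'_4(n)=n(U_1^2+\cdots+U_n^2)$ is an algebraic identity. The function names are hypothetical helpers introduced for illustration.

```python
import math
import random


def chi2_1_density(y):
    """Density of Y = X^2 for X ~ N(0,1), from the change of variables y = x^2."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y)


def gamma_density(y, a, b):
    """Gamma density in the rate parametrisation used above: b (b y)^(a-1) e^(-b y) / Gamma(a)."""
    return b * (b * y) ** (a - 1) * math.exp(-b * y) / math.gamma(a)


def ratio_estimator_two_ways(xs):
    """Compute M'_4(n) directly and via U_i = X_i^2 / (X_1^2 + ... + X_n^2)."""
    n = len(xs)
    m2 = sum(x * x for x in xs) / n
    m4 = sum(x ** 4 for x in xs) / n
    direct = m4 / m2 ** 2
    total = sum(x * x for x in xs)  # equals n * M_2(n)
    us = [x * x / total for x in xs]
    via_dirichlet = n * sum(u * u for u in us)
    return direct, via_dirichlet


if __name__ == "__main__":
    for y in (0.5, 1.0, 2.0):
        print(y, chi2_1_density(y), gamma_density(y, 0.5, 0.5))
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(20)]
    print(ratio_estimator_two_ways(xs))  # the two numbers agree up to rounding
```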

Now we compute, using exchangeability of $U_1,\ldots,U_n$, $$\mathbb{E}(M'_4(n))=n^2\mathbb{E}(U_1^2)=n^2\frac{B(2+\frac{1}{2},\frac{n-1}{2})}{B(\frac{1}{2},\frac{n-1}{2})}=\frac{3n}{n+2}.$$ The evaluation of $\mathbb{E}((M'_4(n))^2)$ is more involved and needs the values of $\mathbb{E}(U_1^4)$ and of $\mathbb{E}(U_1^2U_2^2)$, again using $(***)$.
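For reference, the first-moment step expands as follows with $\Gamma(x+1)=x\Gamma(x)$, since $U_1\sim\mathrm{Beta}(\tfrac12,\tfrac{n-1}{2})$; the last identity (the bias of $M'_4(n)$) is an immediate consequence added here for convenience.

```latex
\mathbb{E}(U_1^2)
  =\frac{\Gamma(\frac52)}{\Gamma(\frac12)}\cdot\frac{\Gamma(\frac n2)}{\Gamma(\frac n2+2)}
  =\frac{\frac12\cdot\frac32}{\frac n2\left(\frac n2+1\right)}
  =\frac{3}{n(n+2)},
\qquad
\mathbb{E}(M'_4(n))-3=\frac{3n}{n+2}-3=-\frac{6}{n+2}.
```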

$$\mathbb{E}(U_1^4)=\frac{B(4+\frac{1}{2},\frac{n-1}{2})}{B(\frac{1}{2},\frac{n-1}{2})}=\frac{\Gamma(4+\frac{1}{2})}{\Gamma(\frac{1}{2})}\times\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n}{2}+4)}=\frac{7\times 5\times 3}{(n+6)(n+4)(n+2)n}$$
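This value, and the mixed moment $\mathbb{E}(U_1^2U_2^2)=9/[n(n+2)(n+4)(n+6)]$ obtained from $(***)$, can be checked in exact rational arithmetic. The sketch assumes the standard Dirichlet moment formula $\mathbb{E}\big(\prod_i U_i^{k_i}\big)=\prod_i a_i^{(k_i)}\big/(a_1+\cdots+a_m)^{(k_1+\cdots+k_m)}$ with rising factorials $x^{(k)}=x(x+1)\cdots(x+k-1)$, which is exactly the ratio of Beta functions used here. The helpers `rising`, `dirichlet_moment` and `u_moments` are hypothetical names.

```python
from fractions import Fraction


def rising(x, k):
    """Rising factorial x (x+1) ... (x+k-1) as an exact Fraction."""
    out = Fraction(1)
    for j in range(k):
        out *= x + j
    return out


def dirichlet_moment(alphas, powers):
    """E[U_1^k_1 ... U_m^k_m] for (U_1, ..., U_m) ~ Dirichlet(alphas).

    Equals B(alpha + k) / B(alpha) = prod_i rising(alpha_i, k_i) / rising(sum(alpha), sum(k)).
    """
    num = Fraction(1)
    for a, k in zip(alphas, powers):
        num *= rising(Fraction(a), k)
    return num / rising(sum(Fraction(a) for a in alphas), sum(powers))


def u_moments(n):
    """E(U_1^4) and E(U_1^2 U_2^2) for Dirichlet(1/2, ..., 1/2) with n components (n >= 3)."""
    half = Fraction(1, 2)
    e_u1_4 = dirichlet_moment([half, Fraction(n - 1, 2)], [4, 0])
    e_u1sq_u2sq = dirichlet_moment([half, half, Fraction(n - 2, 2)], [2, 2, 0])
    return e_u1_4, e_u1sq_u2sq


if __name__ == "__main__":
    print(u_moments(10))
```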

$$\mathbb{E}(U_1^2U_2^2)=\frac{B(2+\frac{1}{2}, 2+\frac{1}{2}, \frac{n-2}{2})}{B(\frac{1}{2},\frac{1}{2},\frac{n-2}{2})}=\frac{\Gamma^2(\frac{5}{2})}{\Gamma^2(\frac{1}{2})}\times \frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n}{2}+4)}=\frac{9}{(n+6)(n+4)(n+2)n}$$ As a consequence

$$\mathbb{E}((M'_4(n))^2)=n^2[n\mathbb{E}(U_1^4)+n(n-1)\mathbb{E}(U_1^2U_2^2)]=\frac{3n^2(3n+32)}{(n+6)(n+4)(n+2)}$$

$$\sigma^2(M'_4(n))=\mathbb{E}((M'_4(n))^2)-(\mathbb{E}(M'_4(n)))^2=\frac{24n^2(n-1)}{(n+6)(n+4)(n+2)^2}\sim _{n\to \infty}\frac{24}{n}.$$
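A sketch in exact arithmetic that plugs the closed forms for $\mathbb{E}(U_1^4)$ and $\mathbb{E}(U_1^2U_2^2)$ into the expansion above and recovers the variance formula. For comparison it also uses the plain estimator's exact variance $\operatorname{Var}(\hat M_4)=(\mathbb{E}X^8-9)/n=96/n$, which follows from $\mathbb{E}X^8=105$ for $N(0,1)$; this comparison is an addition here, not part of the answer. With it, the ratio of mean squared errors tends to $24/96=1/4$ as $n\to\infty$. Function names are hypothetical.

```python
from fractions import Fraction


def ratio_estimator_moments(n):
    """Exact mean, second moment and variance of M'_4(n), from the closed forms above."""
    d = n * (n + 2) * (n + 4) * (n + 6)
    e_u1_4 = Fraction(105, d)
    e_u1sq_u2sq = Fraction(9, d)
    mean = Fraction(3 * n, n + 2)
    # E[(sum U_i^2)^2] = n E(U_1^4) + n(n-1) E(U_1^2 U_2^2), then multiply by n^2.
    second = n ** 2 * (n * e_u1_4 + n * (n - 1) * e_u1sq_u2sq)
    return mean, second, second - mean ** 2


def mse_both(n):
    """Exact MSE of the plain estimator and of the ratio estimator (true value 3)."""
    mean, _, var = ratio_estimator_moments(n)
    mse_ratio = var + (mean - 3) ** 2  # variance plus squared bias
    mse_plain = Fraction(96, n)        # unbiased; Var(X^4)/n with E[X^8] = 105, E[X^4] = 3
    return mse_plain, mse_ratio


if __name__ == "__main__":
    for n in (10, 100, 1000):
        p, r = mse_both(n)
        print(n, float(p), float(r), float(r / p))
```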


So your sample $X_1,\ldots,X_n$ has an unknown distribution but you know that the mean and the variance of this distribution are $0$ and $1$. You would like to decide whether or not $E(X_i^4)=3$ (if not, you reject Gaussianity). I understand that you want to compute the variances of $M_4$ and $M'_4$. This is clearly impossible if you do not know the distribution. But if you assume that $X_i\sim N(0,1)$ the computation of these two variances becomes a reasonable problem. The trick for performing the calculations is to consider $Y_i=X_i^2$ and $U_i=Y_i/(Y_1+\cdots+Y_n).$ With this notation $(U_1,\ldots,U_n)$ has a Dirichlet distribution with parameters $(1/2,\ldots,1/2).$ For instance $$E(M'_4)=nE(U_1^2+\cdots+U_n^2)=n^2E(U_1^2)=\frac{3n}{n+2}$$ and $E((M'_4)^2)=n^2E[(U_1^2+\cdots+U_n^2)^2].$

Since $U_1$ has the distribution Beta$(1/2,(n-1)/2)$ and the joint density of $(U_1,U_2)$ is proportional to $u_1^{-1/2}u_2^{-1/2}(1-u_1-u_2)^{(n-4)/2}$ on $\{u_1,u_2>0,\ u_1+u_2<1\}$, the calculation of $E((M'_4)^2)$ is possible: the price is a simple double integral. I have not completed the calculations, but you have raised an interesting problem.
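A rough numerical version of that double integral, as a sketch: a plain midpoint rule on the triangle for $E(U_1^2U_2^2)$, using the normalizing constant $\Gamma(n/2)/(\Gamma(1/2)^2\Gamma((n-2)/2))$ of the $\mathcal{D}(1/2,1/2,(n-2)/2)$ marginal. The grid size is an arbitrary choice and the function name is hypothetical; the result only approximates the closed form $9/[n(n+2)(n+4)(n+6)]$.

```python
import math


def e_u1sq_u2sq(n, grid=400):
    """Midpoint-rule approximation of E(U_1^2 U_2^2) for (U_1, ..., U_n) ~ Dirichlet(1/2, ..., 1/2).

    Integrates u1^2 u2^2 against the joint density of (U_1, U_2),
    c * u1^(-1/2) * u2^(-1/2) * (1 - u1 - u2)^((n-4)/2) on the triangle u1 + u2 < 1,
    with c = Gamma(n/2) / (Gamma(1/2)^2 * Gamma((n-2)/2)). Intended for n >= 4.
    """
    c = math.exp(math.lgamma(n / 2) - 2 * math.lgamma(0.5) - math.lgamma((n - 2) / 2))
    expo = (n - 4) / 2
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u1 = (i + 0.5) * h
        # Keep only cells whose midpoint lies strictly inside the triangle (exact integer test).
        for j in range(grid - i - 1):
            u2 = (j + 0.5) * h
            # u1^2 u2^2 times u1^(-1/2) u2^(-1/2) simplifies to u1^(3/2) u2^(3/2).
            total += u1 ** 1.5 * u2 ** 1.5 * (1.0 - u1 - u2) ** expo
    return c * total * h * h


if __name__ == "__main__":
    n = 10
    print(e_u1sq_u2sq(n), 9 / (n * (n + 2) * (n + 4) * (n + 6)))
```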