Variance of Pearson's chi-squared statistic


Let $\nu=(\nu_1,\ldots,\nu_r)$, $\sum_j \nu_j=n$, be multinomially distributed with parameter vector $p=(p_1,\ldots,p_r)$ and let

$$ \chi^2 = \sum_{j=1}^r X_j^2,\qquad X_j:=\frac{\nu_j - np_j}{\sqrt{np_j}} $$

be Pearson's $\chi^2$-statistic. According to equation (30.1.1) of Cramér, Mathematical Methods of Statistics (1962 edition, p. 417),

$$ \operatorname{Var}\chi^2 = 2(r-1) + \frac1n\left(\sum_{i=1}^r \frac1{p_i} - r^2 - 2r + 2 \right). $$
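Incidentally, this formula is exact for every finite $n$, not merely asymptotic, which makes it easy to sanity-check: for small $n$ one can enumerate all multinomial outcomes and compute $\operatorname{Var}\chi^2$ in exact rational arithmetic. A sketch in Python (the parameter values are arbitrary test choices):

```python
from fractions import Fraction
from itertools import product
from math import comb

def exact_chi2_moments(n, p):
    """E[chi^2] and Var(chi^2), exactly, by enumerating all multinomial outcomes."""
    r = len(p)
    m1 = m2 = Fraction(0)
    for nu in product(range(n + 1), repeat=r):
        if sum(nu) != n:
            continue
        # multinomial pmf: binom(n, nu_1) binom(n - nu_1, nu_2) ... * prod p_j^nu_j
        prob, rem = Fraction(1), n
        for nu_j, p_j in zip(nu, p):
            prob *= comb(rem, nu_j) * p_j**nu_j
            rem -= nu_j
        chi2 = sum((nu_j - n * p_j)**2 / (n * p_j) for nu_j, p_j in zip(nu, p))
        m1 += prob * chi2
        m2 += prob * chi2**2
    return m1, m2 - m1**2

def cramer_variance(n, p):
    """Right-hand side of Cramer's (30.1.1)."""
    r = len(p)
    return 2*(r - 1) + Fraction(1, n) * (sum(1/p_j for p_j in p) - r**2 - 2*r + 2)

p = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
mean, var = exact_chi2_moments(4, p)
```

Both sides agree as exact `Fraction`s, not just to floating-point tolerance.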

At the bottom of the subsequent page, Cramér says that this expression follows by an "easy calculation" from the MacLaurin expansion of the moment generating function

$$ M(t_1,\ldots,t_r) = e^{-\sum_jt_j\sqrt{np_j}} \left( \sum_jp_je^{t_j/\sqrt{np_j}} \right)^n $$

of $X=(X_1,\ldots,X_r)$. (Cramér actually works with the characteristic function $\phi(t)$ rather than the moment generating function $M(t)$, but this shouldn't make a difference for what follows.) I'm trying to do this calculation, but not finding it "easy"!

Since

$$ \operatorname{Var}\chi^2 = E[(\chi^2)^2] - E[\chi^2]^2 $$

and $E[\chi^2]=r-1$ (this part actually is easy), it suffices to compute the second moment

$$ \begin{aligned} E[(\chi^2)^2] &= \sum_j E[X_j^4] + \sum_{j\neq k}E[X_j^2X_k^2]\\ &= \sum_j \left.\frac{\partial^4M}{\partial t_j^4}\right|_{t=0} + \sum_{j\neq k} \left.\frac{\partial^4M}{\partial t_j^2\,\partial t_k^2}\right|_{t=0}. \end{aligned} $$

The complexity of the expression for $M(t)$ is blocking me from computing the necessary derivatives and simplifying the resulting expression. Can anyone suggest a way of approaching/organizing this calculation to get the desired result? Or is there a more conceptual approach that I'm overlooking?
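As a numerical sanity check on this setup (separate from the derivation Cramér intends), one can approximate the fourth partials of $M$ by finite differences and compare with $E[(\chi^2)^2]$ computed by direct enumeration; a sketch with arbitrary small test values $r=2$, $n=3$, $p=(0.4,0.6)$:

```python
from math import exp, sqrt, comb

n, p = 3, [0.4, 0.6]
r = len(p)

def M(t):
    """Closed-form MGF of X = (X_1, ..., X_r) as given above."""
    inner = sum(pj * exp(tj / sqrt(n * pj)) for pj, tj in zip(p, t))
    return exp(-sum(tj * sqrt(n * pj) for pj, tj in zip(p, t))) * inner**n

def d4_pure(j, h=0.02):
    """Five-point central stencil for d^4 M / dt_j^4 at t = 0."""
    def f(x):
        t = [0.0] * r
        t[j] = x
        return M(t)
    return (f(2*h) - 4*f(h) + 6*f(0.0) - 4*f(-h) + f(-2*h)) / h**4

def d4_mixed(j, k, h=0.02):
    """Composed second-difference stencils for d^4 M / (dt_j^2 dt_k^2) at t = 0."""
    def f(x, y):
        t = [0.0] * r
        t[j], t[k] = x, y
        return M(t)
    def d2j(y):  # second partial in t_j at t_j = 0, with t_k = y
        return (f(h, y) - 2*f(0.0, y) + f(-h, y)) / h**2
    return (d2j(h) - 2*d2j(0.0) + d2j(-h)) / h**2

# E[(chi^2)^2] by direct enumeration (r = 2, so nu_2 = n - nu_1)
lhs = 0.0
for nu1 in range(n + 1):
    prob = comb(n, nu1) * p[0]**nu1 * p[1]**(n - nu1)
    chi2 = (nu1 - n*p[0])**2 / (n*p[0]) + ((n - nu1) - n*p[1])**2 / (n*p[1])
    lhs += prob * chi2**2

# Same quantity from the fourth partials of M
rhs = sum(d4_pure(j) for j in range(r)) + sum(
    d4_mixed(j, k) for j in range(r) for k in range(r) if j != k)
```

The stencils have $O(h^2)$ truncation error, so agreement to a few decimal places is all one should expect here.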


The following calculation obtains the variance of the $\chi^2$-statistic by reading off coefficients of the MacLaurin expansion of its moment generating function.

Let \begin{equation*} M(\mathbf{t})=M(t_1,\ldots,t_r)=\mathsf{E}\Big[\exp\Big(\sum_{i=1}^{r}X_it_i\Big)\Big] \end{equation*} be the moment generating function of $X=(X_1,\ldots,X_r)$, and let its MacLaurin expansion be \begin{equation*} M(\mathbf{t})=1+U_1(\mathbf{t})+U_2(\mathbf{t})+U_3(\mathbf{t})+U_4(\mathbf{t})+o(t^4), \tag{1} \end{equation*} where $U_k(\mathbf{t})$ is a homogeneous polynomial of degree $k$ in the $t_i$. Then \begin{align*} \mathsf{E}[X_i^2]=2!\,(U_2)_{i2}, \quad \mathsf{E}[X_i^4]=4!\,(U_4)_{i4},\quad \mathsf{E}[X_i^2X_j^2]=2!\,2!\,(U_4)_{i2j2}, \end{align*} where $(U_4)_{i4}$ and $(U_4)_{i2j2}$ denote the coefficients of $t_i^4$ and $t_i^2t_j^2$, respectively, in $U_4(\mathbf{t})$, and likewise $(U_2)_{i2}$ is the coefficient of $t_i^2$ in $U_2(\mathbf{t})$. Next we compute the expansion (1) explicitly. Since \begin{align*} M(\mathbf{t})&=\mathsf{E}\Big[\exp\Big(\sum_{j=1}^{r}X_jt_j\Big)\Big] =\mathsf{E}\Big[\exp\Big(\sum_{j=1}^{r}\frac{\nu_j-np_j}{\sqrt{np_j}}t_j\Big)\Big]\\ & = \Big(\sum_{j=1}^{r}p_je^{t_j/\sqrt{np_j}}\Big)^n\exp\Big( -\sum_{j=1}^r\sqrt{np_j}t_j\Big), \\ \log M(\mathbf{t})& =n\log\Big[1+ \sum_{j=1}^{r} p_j(e^{t_j/\sqrt{np_j}}-1)\Big] -\sum_{j=1}^{r}\sqrt{np_j}t_j\\ & = n\log\Big[1+ \sum_{j=1}^{r}p_j \sum_{k=1}^{\infty} \frac{1}{k!}\Big(\frac{t_j}{\sqrt{np_j}}\Big)^k \Big] -\sum_{j=1}^{r}\sqrt{np_j}t_j \\ & = \sum_{l=1}^{\infty}\frac{(-1)^{l-1}n}{l}\Big[\sum_{k=1}^{\infty} \sum_{j=1}^{r}\frac{p_j}{k!} \Big(\frac{t_j}{\sqrt{np_j}}\Big)^k \Big]^l -\sum_{j=1}^{r}\sqrt{np_j}t_j\\ & = \sum_{l=1}^{\infty}\frac{(-1)^{l-1}n}{l}\Big[\sum_{k=1}^{\infty} S_k\Big]^l -\sum_{j=1}^{r}\sqrt{np_j}t_j, \tag{2} \end{align*} where \begin{equation*} S_k=\frac1{n^{k/2}k!}\sum_{j=1}^{r}p_j^{1-k/2}t_j^k. \tag{3} \end{equation*} In particular, \begin{gather*} nS_1=\sum_{j=1}^{r}\sqrt{np_j}t_j, \qquad S_2=\frac1{2n}\sum_{j=1}^{r}t_j^2, \tag{4} \\ S_3=\frac1{6n^{3/2}}\sum_{j=1}^{r}\frac{t_j^3}{\sqrt{p_j}},\qquad S_4=\frac1{24n^{2}}\sum_{j=1}^{r}\frac{t_j^4}{p_j}. \tag{5} \end{gather*} Each $S_k$ is a homogeneous polynomial of degree $k$ in the $t_j$.
Since we need the fourth-order moments of the $X_j$, we keep only terms of order at most four in the following expansion of $\log M(\mathbf{t})$: \begin{align*} \log M(\mathbf{t})& = n\sum_{k=2}^{\infty}S_k + \sum_{l=2}^{\infty}\frac{(-1)^{l-1}n}{l}\Big[\sum_{k=1}^{\infty} S_k\Big]^l \\ & = (nS_2+ nS_3+ nS_4) +\Big(-\frac{n}{2} S_1^2 - \frac n2 S_2^2 -nS_1S_2 - nS_1S_3 \Big) \\ & \qquad +\Big(\frac n3 S_1^3 + nS_1^2S_2\Big) - \frac{n}{4}S_1^4 + o(\mathbf{t}^4) \\ & = T_2 + T_3 + T_4 + o(\mathbf{t}^4), \end{align*} where $T_k$ is a homogeneous polynomial of degree $k$ in the $t_j$ and \begin{align*} T_2&=nS_2- \frac{n}{2} S_1^2,\\ T_4&=nS_4-\frac{n}{2}S_2^2+nS_1^2S_2-nS_1S_3-\frac{n}{4}S_1^4 \end{align*} (the third-order term $T_3$ will not be needed). Furthermore, \begin{align*} M(\mathbf{t})&=1+(T_2+ T_3 + T_4)+\frac{1}{2}(T_2+ T_3 + T_4)^2+ o(\mathbf{t}^4)\\ &=1+T_2+R_3+U_4+o(\mathbf{t}^4), \tag{6} \end{align*} where $R_3$ is a homogeneous polynomial of degree $3$ and \begin{align*} U_4&=T_4+\frac{1}{2}T_2^2\\ &=nS_4+\frac{n(n-1)}{2}S_2^2 -nS_1S_3 -\frac{n(n-2)}{2}S_1^2S_2 +\frac{n(n-2)}{8}S_1^4. \tag{7} \end{align*} From (4) and (5) it is now easy to read off the required coefficients, such as $(S_1^4)_{j4}$.
Now $(U_4)_{j4}$, $(U_4)_{i2j2}$ and hence the fourth moments can be calculated as follows: \begin{align*} \sum_{j=1}^{r}&\mathsf{E}X_j^4=4!\sum_{j=1}^{r}(U_4)_{j4}\\ &=24\sum_{j=1}^r\Big[(nS_4)_{j4}+\frac{n(n-1)}{2}(S_2^2)_{j4} - n(S_1S_3)_{j4} \\ &\qquad +\Big(n-\frac{n^2}{2}\Big)(S_1^2S_2)_{j4} +\Big(\frac{n^2}{8}-\frac{n}{4}\Big)(S_1^4)_{j4}\Big] \\ &=24\sum_{j=1}^{r}\Big[\frac1{24n}\frac{1}{p_j}+\Big(\frac{1}{8}-\frac{1}{8n}\Big) -\frac{1}{6n}\\ &\qquad + \Big(-\frac{1}{4}+\frac{1}{2n}\Big)p_j +\Big(\frac{1}{8}-\frac{1}{4n}\Big)p_j^2\Big]\\ &=\frac{1}{n}\sum_{j=1}^{r}\frac{1}{p_j}+\Big(3r-\frac{7}{n}r\Big) +\Big(-6+\frac{12}{n}\Big)+\Big(3-\frac{6}{n}\Big)\sum_{j=1}^{r}p_j^2, \end{align*} \begin{align*} \sum_{1\le i\ne j\le r}&\mathsf{E}[X_i^2X_j^2]=2!2!\sum_{1\le i\ne j\le r}(U_4)_{i2j2}\\ &=4\sum_{1\le i\ne j\le r}\Big[(nS_4)_{i2j2}+\frac{n(n-1)}{2}(S_2^2)_{i2j2} - n(S_1S_3)_{i2j2} \\ &\qquad +\Big(n-\frac{n^2}{2}\Big)(S_1^2S_2)_{i2j2} +\Big(\frac{n^2}{8}-\frac{n}{4}\Big)(S_1^4)_{i2j2}\Big]\\ &= \sum_{1\le i\ne j\le r} \Big[0+\Big(1-\frac{1}{n}\Big)+0 \\ &\qquad +\Big(-1+\frac{2}{n}\Big)(p_i+p_j) + \Big(3-\frac{6}{n}\Big)p_ip_j \Big]\\ &=(r^2-3r+2)-\frac{r^2-5r+4}{n}+\Big(3-\frac{6}{n}\Big)\Big(1-\sum_{j=1}^{r}p_j^2\Big). \end{align*} Finally, using $\mathsf{E}[\chi^2]=r-1$, \begin{align*} \mathsf{E}[(\chi^2)^2]&=\sum_{i=1}^r\mathsf{E}[X_i^4]+\sum_{1\le i\ne j\le r} \mathsf{E}[X_i^2X_j^2]\\ &=(r^2-1)+\frac{1}{n}\Big(\sum_{j=1}^r\frac{1}{p_j}-r^2-2r+2\Big),\\ \operatorname{Var}[\chi^2] &=\mathsf{E}[(\chi^2)^2]-(\mathsf{E}[\chi^2])^2\\ &=2(r-1)+\frac{1}{n}\Big(\sum_{j=1}^r\frac{1}{p_j}-r^2-2r+2\Big), \end{align*} which is Cramér's (30.1.1).
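The coefficient extraction above can be double-checked with exact rational arithmetic: substituting $u_j = t_j/\sqrt{np_j}$ makes every $S_k = \frac{1}{k!}\sum_j p_j u_j^k$ rational, and a $t$-coefficient is recovered from the corresponding $u$-coefficient by dividing by the appropriate power of $np_j$. A sketch for the arbitrary test values $r=3$, $n=4$, $p=(1/5,3/10,1/2)$:

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

R, N = 3, 4
P = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]

def pmul(a, b, maxdeg=4):
    """Product of truncated polynomials (dicts: exponent tuple -> coefficient)."""
    out = {}
    for ea, ca in a.items():
        for eb, cb in b.items():
            e = tuple(x + y for x, y in zip(ea, eb))
            if sum(e) <= maxdeg:
                out[e] = out.get(e, Fraction(0)) + ca * cb
    return out

def padd(a, b):
    out = dict(a)
    for e, v in b.items():
        out[e] = out.get(e, Fraction(0)) + v
    return out

def scale(a, c):
    return {e: c * v for e, v in a.items()}

def S(k):
    # S_k of (3), written in the variables u_j = t_j / sqrt(n p_j):
    # S_k = (1/k!) sum_j p_j u_j^k, so all coefficients are rational
    return {tuple(k if i == j else 0 for i in range(R)): P[j] / factorial(k)
            for j in range(R)}

# Assemble U_4 exactly as in (7)
S1, S2, S3, S4 = S(1), S(2), S(3), S(4)
S1sq = pmul(S1, S1)
U4 = scale(S4, N)
U4 = padd(U4, scale(pmul(S2, S2), Fraction(N * (N - 1), 2)))
U4 = padd(U4, scale(pmul(S1, S3), -N))
U4 = padd(U4, scale(pmul(S1sq, S2), -Fraction(N * (N - 2), 2)))
U4 = padd(U4, scale(pmul(S1sq, S1sq), Fraction(N * (N - 2), 8)))

# Fourth moments directly from the multinomial distribution, for comparison
m4 = [Fraction(0)] * R
m22 = {}
for nu in product(range(N + 1), repeat=R):
    if sum(nu) != N:
        continue
    prob, rem = Fraction(1), N
    for nu_j, p_j in zip(nu, P):
        prob *= comb(rem, nu_j) * p_j**nu_j
        rem -= nu_j
    X2 = [(nu[j] - N * P[j])**2 / (N * P[j]) for j in range(R)]  # X_j^2
    for j in range(R):
        m4[j] += prob * X2[j]**2
    for i in range(R):
        for j in range(R):
            if i != j:
                m22[i, j] = m22.get((i, j), Fraction(0)) + prob * X2[i] * X2[j]
```

Reading off a $u_j^4$ coefficient and dividing by $(np_j)^2$ (or a $u_i^2u_j^2$ coefficient divided by $(np_i)(np_j)$) reproduces $\mathsf{E}[X_j^4] = 4!\,(U_4)_{j4}$ and $\mathsf{E}[X_i^2X_j^2] = 2!\,2!\,(U_4)_{i2j2}$ exactly.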