Calculation of covariances $cov(x_i^{2},x_j)$ and $cov(x_i^{2},x_j^{2})$ for multinomial distribution

We know that $cov(x_i,x_j)=-n \ p_i \ p_j$. It can be proven as follows:

We know, $Var(x_i+x_j)=cov((x_i+x_j),(x_i+x_j))$

Now, $cov((x_i+x_j),(x_i+x_j))=cov(x_i,x_i)+2 \ cov(x_i,x_j) + cov(x_j,x_j)=Var(x_i)+Var(x_j)+2 \ cov(x_i,x_j)$

Since $x_i+x_j$ is binomial with parameters $n$ and $p_i+p_j$, we have $Var(x_i+x_j)=n(p_i+p_j)(1-p_i-p_j)$, and likewise $Var(x_i)=np_i(1-p_i)$ and $Var(x_j)=np_j(1-p_j)$.

Hence, $cov(x_i,x_j)=(\frac{1}{2})[Var(x_i+x_j)-Var(x_i)-Var(x_j)]=(\frac{1}{2})[n(p_i+p_j)(1-p_i-p_j)-np_i(1-p_i)-np_j(1-p_j)]=(\frac{1}{2})[-2 \ n \ p_i p_j]=-n p_i p_j$

Hence, $cov(x_i,x_j)=-n \ p_i \ p_j$
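The identity can also be checked numerically. The following is a minimal sketch, not part of the original derivation: it sums over every outcome of a trinomial (category $i$, category $j$, and everything else lumped together) and compares the exact covariance with $-n \ p_i \ p_j$. The function names and the values of `n`, `pi`, `pj` are illustrative assumptions.

```python
from math import comb

def trinomial_outcomes(n, pi, pj):
    """Yield (x_i, x_j, probability) for every outcome of n trials.

    Hypothetical helper: all categories other than i and j are lumped
    into a single "rest" category with probability 1 - pi - pj.
    """
    pr = 1.0 - pi - pj
    for xi in range(n + 1):
        for xj in range(n - xi + 1):
            xr = n - xi - xj
            coef = comb(n, xi) * comb(n - xi, xj)  # multinomial coefficient
            yield xi, xj, coef * pi**xi * pj**xj * pr**xr

def exact_covariance(n, pi, pj):
    """cov(x_i, x_j) = E(x_i x_j) - E(x_i) E(x_j), computed by direct summation."""
    e_i = e_j = e_ij = 0.0
    for xi, xj, prob in trinomial_outcomes(n, pi, pj):
        e_i += prob * xi
        e_j += prob * xj
        e_ij += prob * xi * xj
    return e_ij - e_i * e_j

# Illustrative values, not taken from the post.
n, pi, pj = 8, 0.2, 0.3
cov_exact = exact_covariance(n, pi, pj)
cov_formula = -n * pi * pj
print(cov_exact, cov_formula)
```

Both printed numbers should agree up to floating-point error.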

Now I am interested in calculating $cov(x_i^{2},x_j)$ and $cov(x_i^{2},x_j^{2})$. But here I can't use the same method, because $x_i^{2}+x_j$ and $x_i^{2}+x_j^{2}$ are not binomially distributed, so there is no ready-made formula for $Var(x_i^{2}+x_j)$ or $Var(x_i^{2}+x_j^{2})$.

Reply to @kimchi_lover:

The moment generating function for the general case $Multinomial(n,k,(p_1,p_2,\dots,p_k))$ is,

$M_X(t_1,t_2,\dots,t_k)=E(e^{t_1x_1+t_2x_2+\dots+t_kx_k})=\sum_{x\in S} \binom{n}{x_1 \ x_2 \ \dots \ x_k} \left[p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}\right]e^{t_1x_1+t_2x_2+\dots+t_kx_k}=\sum_{x\in S} \binom{n}{x_1 \ x_2 \ \dots \ x_k}\prod_{i=1}^{k} (p_ie^{t_i})^{x_i}=\left(\sum_{i=1}^{k} \ p_ie^{t_i}\right)^{n}$

where $S$ is the set of count vectors $(x_1,\dots,x_k)$ with $x_1+x_2+\dots+x_k=n$, and the last step is the multinomial theorem.
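As a quick sanity check of the closed form, here is a small sketch for $k=3$ that evaluates the defining sum directly and compares it with $(\sum_i p_ie^{t_i})^n$. The function names and the values of `n`, `p`, `t` are assumptions chosen for illustration only.

```python
from math import comb, exp

def mgf_by_sum(n, p, t):
    """E[exp(t . x)] for a trinomial, summed over all count vectors with x1 + x2 + x3 = n."""
    total = 0.0
    for x1 in range(n + 1):
        for x2 in range(n - x1 + 1):
            x3 = n - x1 - x2
            coef = comb(n, x1) * comb(n - x1, x2)  # multinomial coefficient
            prob = coef * p[0]**x1 * p[1]**x2 * p[2]**x3
            total += prob * exp(t[0] * x1 + t[1] * x2 + t[2] * x3)
    return total

def mgf_closed_form(n, p, t):
    """(sum_k p_k * exp(t_k)) ** n."""
    return sum(pk * exp(tk) for pk, tk in zip(p, t)) ** n

# Illustrative values, not taken from the post.
n = 6
p = (0.2, 0.3, 0.5)
t = (0.1, -0.4, 0.25)
mgf_sum = mgf_by_sum(n, p, t)
mgf_formula = mgf_closed_form(n, p, t)
print(mgf_sum, mgf_formula)
```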

Now in the trinomial case (categories: $i$, $j$, and all remaining outcomes lumped together),

$M_X(t_i,t_j,t_{rest})=(p_ie^{t_i}+p_je^{t_j}+(1-p_i-p_j)e^{t_{rest}})^n$

Now I think the formulas for $E(x_i^2x_j)$ and $E(x_i^2x_j^2)$ are,

$E(x_i^{2}x_j)=\left [ \frac{\partial^3 M_X(t_i,t_j,t_{rest})}{\partial t_i^{2}\partial t_j} \right ]_{t_i=0,t_j=0,t_{rest}=0}$ and $E(x_i^{2}x_j^{2})=\left [ \frac{\partial^4 M_X(t_i,t_j,t_{rest})}{\partial t_i^{2}\partial t_j^{2}} \right ]_{t_i=0,t_j=0,t_{rest}=0}$
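As a sketch of the differentiation (the shorthand $S$ is introduced here only for brevity), write $S=p_ie^{t_i}+p_je^{t_j}+(1-p_i-p_j)e^{t_{rest}}$, so that $M_X=S^n$. Then,

$\frac{\partial M_X}{\partial t_j}=n \ p_je^{t_j}S^{n-1}$

$\frac{\partial^2 M_X}{\partial t_i\partial t_j}=n(n-1) \ p_ip_je^{t_i+t_j}S^{n-2}$

$\frac{\partial^3 M_X}{\partial t_i^{2}\partial t_j}=n(n-1) \ p_ip_je^{t_i+t_j}\left[S^{n-2}+(n-2) \ p_ie^{t_i}S^{n-3}\right]$

Setting $t_i=t_j=t_{rest}=0$ gives $S=1$, so every exponential and every power of $S$ becomes 1 and only a polynomial in $n$, $p_i$, $p_j$ remains. The fourth-order derivative follows the same pattern with one more application of the product rule.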

Then, I get,

$E(x_i^{2}x_j)=n(n-1)(n-2)p_i^{2}p_j+n(n-1)p_ip_j$

$E(x_i^{2}x_j^{2})=n(n-1)(n-2)(n-3)p_i^{2}p_j^{2}+n(n-1)(n-2)p_i^{2}p_j+n(n-1)(n-2)p_ip_j^{2}+n(n-1)p_ip_j$
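From these moments the two covariances follow. This is a sketch that assumes the standard binomial second moment $E(x_i^{2})=Var(x_i)+E(x_i)^{2}=n(n-1)p_i^{2}+np_i$ (and likewise for $x_j$):

$cov(x_i^{2},x_j)=E(x_i^{2}x_j)-E(x_i^{2})E(x_j)=n(n-1)(n-2)p_i^{2}p_j+n(n-1)p_ip_j-[n(n-1)p_i^{2}+np_i] \ np_j=-np_ip_j[1+2(n-1)p_i]$

$cov(x_i^{2},x_j^{2})=E(x_i^{2}x_j^{2})-E(x_i^{2})E(x_j^{2})=-np_ip_j[1+2(n-1)(p_i+p_j)+2(n-1)(2n-3)p_ip_j]$

For $n=1$ both reduce to $-p_ip_j$, which agrees with $cov(x_i,x_j)$, since $x^{2}=x$ when $x\in\{0,1\}$.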

As a check on the above process I also calculated $E(x_ix_j)=n(n-1)p_ip_j$ from the MGF. Since $cov(x_i,x_j)=E(x_ix_j)-E(x_i)E(x_j)$ and $cov(x_i,x_j)=-np_ip_j$ is a well-known result, substituting $E(x_i)=np_i$ and $E(x_j)=np_j$ gives $E(x_ix_j)=n^{2}p_ip_j-np_ip_j=n(n-1)p_ip_j$, which is the same expression as above.
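The higher moments can be checked the same way without any simulation. The sketch below sums $f(x_i,x_j)$ against the exact trinomial probabilities and compares the result with the closed forms for $E(x_ix_j)$, $E(x_i^{2}x_j)$ and $E(x_i^{2}x_j^{2})$. The helper names and the test values are assumptions for illustration.

```python
from math import comb

def trinomial_expectation(n, pi, pj, f):
    """Exact E[f(x_i, x_j)], summing over every (x_i, x_j) with x_i + x_j <= n."""
    pr = 1.0 - pi - pj  # probability of the lumped "rest" category
    total = 0.0
    for xi in range(n + 1):
        for xj in range(n - xi + 1):
            coef = comb(n, xi) * comb(n - xi, xj)  # multinomial coefficient
            prob = coef * pi**xi * pj**xj * pr**(n - xi - xj)
            total += prob * f(xi, xj)
    return total

def closed_forms(n, pi, pj):
    """Closed forms for E(x_i x_j), E(x_i^2 x_j), E(x_i^2 x_j^2) as stated in the post."""
    f2 = n * (n - 1)
    f3 = f2 * (n - 2)
    f4 = f3 * (n - 3)
    e11 = f2 * pi * pj
    e21 = f3 * pi**2 * pj + e11
    e22 = f4 * pi**2 * pj**2 + f3 * pi**2 * pj + f3 * pi * pj**2 + e11
    return e11, e21, e22

# Illustrative values, not taken from the post.
n, pi, pj = 7, 0.25, 0.35
exact = (
    trinomial_expectation(n, pi, pj, lambda a, b: a * b),
    trinomial_expectation(n, pi, pj, lambda a, b: a**2 * b),
    trinomial_expectation(n, pi, pj, lambda a, b: a**2 * b**2),
)
formula = closed_forms(n, pi, pj)
for e, c in zip(exact, formula):
    print(e, c)
```

Each printed pair should match up to floating-point error, which tests the formulas independently of the MGF differentiation.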

Finally I used my simulation data to compare with theoretical results.

Theoretical results,

$E(x_ix_j)=16.8091$, $E(x_i^{2}x_j)=96.6854$ and $E(x_i^{2}x_j^{2})=314.7745$

Experimental results,

$E(x_ix_j)=16.8116$, $E(x_i^{2}x_j)=96.6947$ and $E(x_i^{2}x_j^{2})=314.8296$

Here, $n=10$, $p_i=0.593994150290162$, $p_j=0.314427655266167$. The small mismatch between the theoretical and experimental results is most likely sampling error from the finite number of simulated draws, possibly together with rounding.
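A comparison of this kind can be reproduced with a short script. This is only a sketch under stated assumptions: the number of draws, the seed, and the function names are hypothetical, and it is not the simulation used for the numbers above. It uses the same $n$, $p_i$, $p_j$ as the post.

```python
import random

def theoretical_moments(n, pi, pj):
    """Closed forms for E(x_i x_j), E(x_i^2 x_j), E(x_i^2 x_j^2) as stated in the post."""
    f2 = n * (n - 1)
    f3 = f2 * (n - 2)
    f4 = f3 * (n - 3)
    e11 = f2 * pi * pj
    e21 = f3 * pi**2 * pj + e11
    e22 = f4 * pi**2 * pj**2 + f3 * pi**2 * pj + f3 * pi * pj**2 + e11
    return e11, e21, e22

def simulated_moments(n, pi, pj, draws, seed=0):
    """Monte Carlo estimates of the same three moments from `draws` multinomial samples."""
    rng = random.Random(seed)
    s11 = s21 = s22 = 0.0
    for _draw in range(draws):
        xi = xj = 0
        for _trial in range(n):
            u = rng.random()
            if u < pi:
                xi += 1          # trial landed in category i
            elif u < pi + pj:
                xj += 1          # trial landed in category j
            # otherwise the trial landed in the lumped "rest" category
        s11 += xi * xj
        s21 += xi**2 * xj
        s22 += xi**2 * xj**2
    return s11 / draws, s21 / draws, s22 / draws

n, pi, pj = 10, 0.593994150290162, 0.314427655266167
theory = theoretical_moments(n, pi, pj)
sim = simulated_moments(n, pi, pj, draws=100_000)  # draw count is an assumption
for name, th, s in zip(("E(xi xj)", "E(xi^2 xj)", "E(xi^2 xj^2)"), theory, sim):
    print(f"{name}: theory={th:.4f} simulated={s:.4f}")
```

The gap between the two columns should shrink roughly like $1/\sqrt{draws}$ as the number of draws grows, which is the behavior expected of sampling error.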

I would like to know your comments, @kimchi_lover.