Suppose we have two discrete random variables $(X,Y)$, where $X$ takes values $x_i, i=1,2,...,k$ and $Y$ takes values $y_j,j=1,2,...,l$.
Suppose we have a total of $n$ samples of $(X,Y)$, and let the random variable $Z_{ij}$ be the number of samples with $X=x_i,Y=y_j$.
My question is how to compute $\mathrm{Var}(Z_{ij})$ and $\mathrm{Cov}(Z_{i_1,j},Z_{i_2,j})$.
I can think of three solutions, but they give contradictory answers, which is my main source of confusion.
Solution-1
The random vector $\mathbf{Z}=[Z_{ij}]_{i=1,2,...,k; j=1,2,...,l}$ follows a multinomial distribution with parameters $(n,\{p(x_i,y_j)\}_{i=1,2,...,k; j=1,2,...,l})$.
So, we have $\mathrm{E}(Z_{ij})=np(x_i,y_j)$ and $\mathrm{Var}(Z_{ij})=np(x_i,y_j)(1-p(x_i,y_j))$.
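As a rough numerical check of the multinomial formulas in Solution-1, here is a minimal simulation sketch. It assumes the $n$ samples are i.i.d.; the joint table `P`, the sample size `n`, and the cell indices are made-up values for illustration only. It also compares the empirical covariance of two cells in the same column with the standard multinomial value $-n\,p(x_{i_1},y_j)\,p(x_{i_2},y_j)$.

```python
import numpy as np

# Hypothetical joint probabilities p(x_i, y_j), k=2 rows and l=3 columns (sums to 1).
P = np.array([[0.10, 0.20, 0.30],
              [0.15, 0.05, 0.20]])
n = 50            # samples per experiment (assumed i.i.d.)
reps = 200_000    # number of simulated experiments

rng = np.random.default_rng(0)
# Each experiment draws all k*l cell counts jointly from one multinomial.
Z = rng.multinomial(n, P.ravel(), size=reps).reshape(reps, *P.shape)

i, j = 0, 1
emp_mean = Z[:, i, j].mean()
emp_var = Z[:, i, j].var()
theory_mean = n * P[i, j]
theory_var = n * P[i, j] * (1 - P[i, j])

# Covariance between two cells in the same column j (rows i1 != i2).
i1, i2 = 0, 1
emp_cov = np.cov(Z[:, i1, j], Z[:, i2, j])[0, 1]
theory_cov = -n * P[i1, j] * P[i2, j]

print("mean:", emp_mean, "vs", theory_mean)
print("var: ", emp_var, "vs", theory_var)
print("cov: ", emp_cov, "vs", theory_cov)
```

Here nothing is conditioned on: the row and column totals are free to vary from one simulated experiment to the next.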
Solution-2
The random vector $\mathbf{Z}=[Z_{ij}]_{\text{fixed} \, i; j=1,2,...,l}$ follows a multinomial distribution with parameters $(n_{i:},\{p(y_j|x_i)\}_{\text{fixed} \, i; j=1,2,...,l})$, where $n_{i:}$ is the number of samples with $X=x_i$.
So, we have $\mathrm{E}(Z_{ij})=n_{i:}p(y_j|x_i)$ and $\mathrm{Var}(Z_{ij})=n_{i:}p(y_j|x_i)(1-p(y_j|x_i))$.
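To see what the formulas in Solution-2 describe, the sketch below simulates the same kind of experiment and keeps only the runs in which the row total $n_{i:}$ equals a chosen value `m`. It assumes i.i.d. samples; `P`, `n`, and `m` are hypothetical values. The last lines use the law of total variance to relate the conditional variance to the unconditional one, treating $n_{i:}$ as Binomial$(n, p(x_i))$.

```python
import numpy as np

# Hypothetical joint probabilities p(x_i, y_j).
P = np.array([[0.10, 0.20, 0.30],
              [0.15, 0.05, 0.20]])
n, reps = 50, 400_000
rng = np.random.default_rng(1)
Z = rng.multinomial(n, P.ravel(), size=reps).reshape(reps, *P.shape)

i, j = 0, 1
p_row = P[i, :].sum()                    # p(x_i)
p_cond = P[i, j] / p_row                 # p(y_j | x_i)
row_total = Z[:, i, :].sum(axis=1)       # n_{i:}, random across experiments

m = 30                                   # condition on the event n_{i:} = m
sel = row_total == m
cond_mean = Z[sel, i, j].mean()
cond_var = Z[sel, i, j].var()
print("conditional mean:", cond_mean, "vs", m * p_cond)
print("conditional var: ", cond_var, "vs", m * p_cond * (1 - p_cond))

# Law of total variance: Var(Z_ij) = E[Var(Z_ij | n_i:)] + Var(E[Z_ij | n_i:]).
within = n * p_row * p_cond * (1 - p_cond)       # E[n_i: * p * (1 - p)]
between = p_cond**2 * n * p_row * (1 - p_row)    # Var(n_i: * p)
uncond_var = Z[:, i, j].var()
print("unconditional var:", uncond_var, "vs", within + between)
```

In this sketch the Solution-2 expressions match the moments of $Z_{ij}$ given $n_{i:}=m$, while the unconditional variance is larger because $n_{i:}$ itself varies between experiments.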
Solution-3
The random vector $\mathbf{Z}=[Z_{ij}]_{i=1,2,...,k; \text{fixed} \, j}$ follows a multinomial distribution with parameters $(n_{:j},\{p(x_i|y_j)\}_{i=1,2,...,k; \text{fixed} \, j})$, where $n_{:j}$ is the number of samples with $Y=y_j$.
So, we have $\mathrm{E}(Z_{ij})=n_{:j}p(x_i|y_j)$ and $\mathrm{Var}(Z_{ij})=n_{:j}p(x_i|y_j)(1-p(x_i|y_j))$.
If Solutions 1, 2, and 3 are all correct, then we have $p(x_i,y_j)=p(x_i|y_j)=p(y_j|x_i)$, which is clearly wrong.
Can anyone point out which part of my solutions is wrong? What is the correct answer?
P.S. My objective is to get the MLE and its covariance matrix for the parameter vector $[p(y_j|x_1),...,p(y_j|x_k)]$ for some fixed $j$.
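For concreteness, this is the kind of computation I have in mind, written as a minimal sketch rather than a settled answer. The table `counts` is made up, and the covariance is a plug-in estimate under the assumption that the row totals $n_{i:}$ are treated as fixed (so the rows are independent and the matrix is diagonal). Whether that conditioning is appropriate is part of what I am asking.

```python
import numpy as np

# Hypothetical observed k x l table of counts Z_ij (rows: x_i, columns: y_j).
counts = np.array([[12, 30, 18],
                   [25, 10, 15],
                   [ 8, 22, 20]])
j = 1                                    # fixed column index

row_totals = counts.sum(axis=1)          # n_{i:}; must all be > 0
p_hat = counts[:, j] / row_totals        # estimate of p(y_j | x_i), i = 1..k

# Plug-in covariance, ASSUMING row totals are held fixed: each entry of
# p_hat is then a binomial proportion and different rows are independent.
cov_hat = np.diag(p_hat * (1 - p_hat) / row_totals)

print(p_hat)
print(cov_hat)
```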