I'm trying to compute the expected value and variance of a mixed distribution and comparing it to a simulated average and sample variance. The problem is that the calculation and simulation doesn't match each other. I'm wondering if there is something wrong with my calculation or if the simulation is way off.
The mixed distribution is as follows
$X\mid Y=y\sim \text{Binomial}(n, y),\qquad\qquad$
$Y\sim \text{Beta}(\alpha, \beta)$
and I want to use the following formulas
$E(X)=E_Y(E_X(X\mid Y)),\qquad\qquad \text{Var}(X)=E_Y(\text{Var}_X(X\mid Y))+\text{Var}_Y(E_X(X\mid Y))$.
This is how I have calculated the expected value and the variance:
$E(X)=E_Y(E_X(X\mid Y))=E_Y(nY)=nE_Y(Y)=\underline{\underline{n\cdot\displaystyle\frac{\alpha}{\alpha+\beta}}}$
$\text{Var}(X)=E_Y(\text{Var}_X(X\mid Y))+\text{Var}_Y(E_X(X\mid Y))$
$=E_Y(nY(1-Y))+\text{Var}_Y(nY)$
$=n\left(E_Y(Y(1-Y))+\text{Var}_Y(Y)\right)$
$=n\left(\displaystyle\frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}+\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\right)=\underline{\underline{n\cdot\displaystyle\frac{\alpha\beta}{(\alpha+\beta)^2}}}$
For the simulation I used the following parameters
$\qquad m=\text{# observations}=10\,000$
$\qquad n=\text{# trails/observation} = 200$
$\qquad\alpha=\beta=2$
which gave me $100.31$ as the average # of success, and $2050.72$ as the sample variance.
Using the same parameters, $E(X)=100$ and $\text{Var}(X)=50$.
The expected number seem to be accurate, but which variance is correct?
Your simulation looks reasonable up to the usual noise associated with simulations. My version in R would be
The error in your calculations is saying $\text{Var}_Y(nY) = n\text{Var}_Y(Y)$ when you should have $\text{Var}_Y(nY) = n^2\text{Var}_Y(Y)$. That would make your result be $$\text{Var}_X(X)=\frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)}$$ which is $2040$ here, much closer to your and my simulations and (as dmh says) the variance of the beta-binomial distribution