Discrete random variable conditioned on continuous random variable

40 Views Asked by At

I'm trying to compute the expected value and variance of a mixed distribution and comparing it to a simulated average and sample variance. The problem is that the calculation and simulation doesn't match each other. I'm wondering if there is something wrong with my calculation or if the simulation is way off.

The mixed distribution is as follows
$X\mid Y=y\sim \text{Binomial}(n, y),\qquad\qquad$ $Y\sim \text{Beta}(\alpha, \beta)$
and I want to use the following formulas
$E(X)=E_Y(E_X(X\mid Y)),\qquad\qquad \text{Var}(X)=E_Y(\text{Var}_X(X\mid Y))+\text{Var}_Y(E_X(X\mid Y))$.

This is how I have calculated the expected value and the variance:
$E(X)=E_Y(E_X(X\mid Y))=E_Y(nY)=nE_Y(Y)=\underline{\underline{n\cdot\displaystyle\frac{\alpha}{\alpha+\beta}}}$
$\text{Var}(X)=E_Y(\text{Var}_X(X\mid Y))+\text{Var}_Y(E_X(X\mid Y))$
$=E_Y(nY(1-Y))+\text{Var}_Y(nY)$
$=n\left(E_Y(Y(1-Y))+\text{Var}_Y(Y)\right)$
$=n\left(\displaystyle\frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}+\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\right)=\underline{\underline{n\cdot\displaystyle\frac{\alpha\beta}{(\alpha+\beta)^2}}}$

For the simulation I used the following parameters
$\qquad m=\text{# observations}=10\,000$
$\qquad n=\text{# trails/observation} = 200$
$\qquad\alpha=\beta=2$
which gave me $100.31$ as the average # of success, and $2050.72$ as the sample variance.

Using the same parameters, $E(X)=100$ and $\text{Var}(X)=50$.
The expected number seem to be accurate, but which variance is correct?

2

There are 2 best solutions below

1
On BEST ANSWER

Your simulation looks reasonable up to the usual noise associated with simulations. My version in R would be

set.seed(2022)
alpha <- 2
beta <- 2
n <- 200
cases <- 10000
Y <- rbeta(cases, alpha, beta)
X <- rbinom(cases, n, Y)
mean(X)
# 99.9413
var(X)  
# 2021.529

The error in your calculations is saying $\text{Var}_Y(nY) = n\text{Var}_Y(Y)$ when you should have $\text{Var}_Y(nY) = n^2\text{Var}_Y(Y)$. That would make your result be $$\text{Var}_X(X)=\frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)}$$ which is $2040$ here, much closer to your and my simulations and (as dmh says) the variance of the beta-binomial distribution

0
On

You are missing a factor of $\frac{\alpha + \beta + n}{\alpha + \beta + 1}$ in your expression for the variance. See https://en.wikipedia.org/wiki/Beta-binomial_distribution

you might also want to increase the size of $m$.