Expected proportion of successes coming from a distinguished group


Let $X$ be a binomial random variable with parameters $N_1, p_1$ and let $Y$ be a binomial random variable with parameters $N_2, p_2$. Assume $X$ and $Y$ are independent. Is there a nice way to calculate $\mathbb{E} \left[ \frac{X}{X+Y} \right]$? (For this to be defined, let's just say that $0/0 = 1$.) Alternatively, if we assume $N_1$ and $N_2$ are large, is there a nice way to estimate it by approximating $X$ and $Y$ with the corresponding normal distributions?
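With the convention $0/0 = 1$ the ratio always lies in $[0,1]$, so the expectation exists, and independence reduces it to a finite double sum (exact, though not a closed form). Terms with $i = 0 < j$ vanish, and the single $(0,0)$ term contributes $P(X=0)\,P(Y=0)$:

$$
\mathbb{E}\left[\frac{X}{X+Y}\right] = (1-p_1)^{N_1}(1-p_2)^{N_2} + \sum_{i=1}^{N_1}\sum_{j=0}^{N_2} \frac{i}{i+j}\binom{N_1}{i}p_1^{i}(1-p_1)^{N_1-i}\binom{N_2}{j}p_2^{j}(1-p_2)^{N_2-j}.
$$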

The interpretation/motivation for this question: suppose you have a pool of $N_1 + N_2$ people. Say $N_1$ of them are "special" and the other $N_2$ of them are "normal". Each special person has a probability $p_1$ of completing a task, and each normal person has probability $p_2$ of completing the task. Assume all the people are independent of each other. Then if $X$ is the number of special people who complete the task and $Y$ is the number of normal people who complete the task, I am looking for the expected proportion of successes that came from special people.

EDIT: It's natural to expect that this quantity should be pretty close to $\frac{\mathbb{E}[X]}{\mathbb{E}[X+Y]}$. It would be super cool to see an estimate of the difference between the two as $N_1, N_2 \to \infty$.
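One heuristic way to estimate that difference (a sketch, not a rigorous bound) is a second-order Taylor (delta-method) expansion of $g(x,y) = \frac{x}{x+y}$ about $(\mu_1, \mu_2) = (N_1p_1, N_2p_2)$, using $\operatorname{Var}X = N_1p_1(1-p_1)$, $\operatorname{Var}Y = N_2p_2(1-p_2)$, $\operatorname{Cov}(X,Y) = 0$, and the second derivatives $g_{xx} = -\frac{2y}{(x+y)^3}$, $g_{yy} = \frac{2x}{(x+y)^3}$:

$$
\begin{aligned}
\mathbb{E}\left[\frac{X}{X+Y}\right] &\approx g(\mu_1,\mu_2) + \tfrac12 g_{xx}(\mu_1,\mu_2)\operatorname{Var}X + \tfrac12 g_{yy}(\mu_1,\mu_2)\operatorname{Var}Y \\
&= \frac{\mu_1}{\mu_1+\mu_2} + \frac{\mu_1\,N_2p_2(1-p_2) - \mu_2\,N_1p_1(1-p_1)}{(\mu_1+\mu_2)^3} \\
&= \frac{\mathbb{E}[X]}{\mathbb{E}[X+Y]} + \frac{N_1N_2\,p_1p_2\,(p_1-p_2)}{(N_1p_1+N_2p_2)^3}.
\end{aligned}
$$

If $N_1$ and $N_2$ grow proportionally, the correction term is of order $1/N$, and its sign is the sign of $p_1 - p_2$. For $N_1 = 16, p_1 = 0.4, N_2 = 25, p_2 = 0.3$ it is $4.8/13.9^3 \approx 0.0018$, giving about $0.4604 + 0.0018 = 0.4622$, in line with the simulation in the answer.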


I don't know if you consider simulation to be 'nice', but it can easily give an approximate solution for a particular practical application where all four parameters are known.

However, a rigorous analytic treatment has to deal with the tiny but positive probability (even for large $N_1, N_2$) that the denominator is $0$: without a convention such as $0/0 = 1$, the expectation is undefined. Perhaps the approach in @angryavian's comment is more realistic.
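For the parameters used in the example below, that event is very rare. By independence, $P(X+Y=0) = (1-p_1)^{N_1}(1-p_2)^{N_2}$, which a quick base-R check confirms (a sketch; `p0` is just an illustrative name):

```r
# By independence, P(X + Y = 0) = P(X = 0) * P(Y = 0) = (1 - p1)^n1 * (1 - p2)^n2
p0 = dbinom(0, 16, .4) * dbinom(0, 25, .3)
p0          # about 3.8e-8
p0 * 10^6   # expected number of 0/0 draws in 10^6 simulated pairs, about 0.04
```

So a simulation of this size will usually contain no $0/0$ draws at all; the convention matters for the expectation to be defined, but barely moves the number here.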

For example, let $X \sim \mathsf{Binom}(16, 0.4)$ and $Y \sim \mathsf{Binom}(25, 0.3)$ be independent. A simulation with $10^6$ draws gives $E\left(\frac{X}{X+Y}\right) \approx 0.4622 \pm 0.0002$.

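```r
set.seed(2020)
m = 10^6                              # number of simulated pairs
x = rbinom(m, 16, .4)                 # X ~ Binom(16, 0.4)
y = rbinom(m, 25, .3)                 # Y ~ Binom(25, 0.3)
r = ifelse(x + y == 0, 1, x/(x+y))    # X/(X+Y), with the convention 0/0 = 1
mean(r)
## [1] 0.4622126
2*sd(r)/sqrt(m)                       # 95% margin of simulation error
## [1] 0.0002267313
```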
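Because $X$ and $Y$ have finite support, the expectation can also be computed exactly for given parameters by summing $\frac{i}{i+j}P(X=i)P(Y=j)$ over all pairs $(i,j)$. The sketch below does that with base R's `dbinom` and `outer`, alongside the second-order (delta-method) heuristic $\frac{N_1p_1}{N_1p_1+N_2p_2} + \frac{N_1N_2p_1p_2(p_1-p_2)}{(N_1p_1+N_2p_2)^3}$ for large $N_1, N_2$. The function names `exact_ratio_mean` and `approx_ratio_mean` are hypothetical, and the approximation is a Taylor-expansion heuristic rather than a proven bound.

```r
# Exact E[X/(X+Y)] for independent X ~ Binom(n1, p1), Y ~ Binom(n2, p2),
# summing over the joint pmf and using the convention 0/0 = 1.
exact_ratio_mean = function(n1, p1, n2, p2) {
  px = dbinom(0:n1, n1, p1)                              # P(X = i), i = 0..n1
  py = dbinom(0:n2, n2, p2)                              # P(Y = j), j = 0..n2
  ratio = outer(0:n1, 0:n2, function(i, j) i / (i + j))  # i/(i+j); NaN at i = j = 0
  ratio[1, 1] = 1                                        # convention 0/0 = 1
  sum(outer(px, py) * ratio)                             # sum of P(X=i) P(Y=j) i/(i+j)
}

# Hypothetical helper: second-order (delta-method) heuristic,
# E[X]/E[X+Y] + n1*n2*p1*p2*(p1 - p2) / (n1*p1 + n2*p2)^3
approx_ratio_mean = function(n1, p1, n2, p2) {
  mu1 = n1 * p1                                          # E[X]
  mu2 = n2 * p2                                          # E[Y]
  mu1 / (mu1 + mu2) + n1 * n2 * p1 * p2 * (p1 - p2) / (mu1 + mu2)^3
}

exact_ratio_mean(16, .4, 25, .3)
approx_ratio_mean(16, .4, 25, .3)
```

Both values can be compared with the simulated $0.4622 \pm 0.0002$. The exact sum has $(N_1+1)(N_2+1)$ terms, so it stays cheap for moderate $N_1, N_2$; the approximation is what remains useful when the parameters are large.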