Two binomial distributions with different number of trials


I have two binomial distributions, each with a different number of trials. How can I model the distribution of the variable $X$ in $$X=Y+Z$$ where $Y$ is Binomial($n$, $P_Y$) and $Z$ is Binomial($m$, $P_Z$)?


I've read threads on binomials and Poisson binomials, but none of them address what happens when the number of trials differs ($n$ and $m$).

I'd like to get a percentile rank or something similar for the resulting distribution. Calculating the variance is simple: $$\operatorname{Var}(X) = \operatorname{Var}(Y) + \operatorname{Var}(Z) = n P_Y (1-P_Y) + m P_Z (1-P_Z)$$

Likewise the standard deviation is just the square root of that. But I'm stuck on finding any sort of percentile rank for a given result, or finding where the $N$th percentile of the result is. The Poisson binomial formulas don't work because they require the same $n$ for each binomial.

$P_Y$ and $P_Z$ differ by about an order of magnitude, and $n$ and $m$ differ by a couple orders of magnitude. So it's hard to just weight each piece and hope it comes out in the wash. I tried approximating with a straight Poisson distribution too since the mean of $X$ is simple to find, but that doesn't seem to do any better. I think the wide variation between $P_Y$ and $P_Z$ may be the culprit.

I can't separate out the distribution for $n$ and $m$ because all I see is the combined result $X$. Each event involves one trial of $Y$ and multiple trials of $Z$. In concrete terms, I have separate events with values like $n = 1, P_Y = .25$ and $m = 40, P_Z = .02$. That gives one result, a combined number of successes. Then that's repeated as another event, yielding another combined result.

Taking a number of these events together, I'll end up with something like $n = 10$ and $m = 400$ yielding 15 total successes where the mean is 10.5. I'm trying to figure out how that total ranks within the distribution of possible outcomes. Any ideas appreciated.


There is 1 best solution below


By convolution, using the independence of $Y$ and $Z$ in the third line,

\begin{align}
P(X=x) &= P(Y+Z = x) \\
&= \sum_{z=\max(0,x-n)}^{\min(m,x)} P(Y+Z=x \mid Z=z)\,P(Z=z) \\
&= \sum_{z=\max(0,x-n)}^{\min(m,x)} P(Y=x-z)\,P(Z=z) \\
&= \sum_{z=\max(0,x-n)}^{\min(m,x)} \binom{n}{x-z}P_Y^{x-z}(1-P_Y)^{n-(x-z)}\binom{m}{z}P_Z^{z}(1-P_Z)^{m-z}
\end{align}
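The convolution sum can be evaluated numerically to get the percentile rank the question asks about. The following is a minimal sketch, assuming $Y$ and $Z$ are independent and using the question's example values ($n = 10$, $P_Y = 0.25$, $m = 400$, $P_Z = 0.02$, observed total 15). The function names (`binom_pmf`, `sum_pmf`, `sum_cdf`, `percentile_point`) are made up for illustration and are not from any library.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def sum_pmf(x, n, p_y, m, p_z):
    """P(Y + Z = x) by direct convolution (assumes Y and Z are independent)."""
    lo, hi = max(0, x - n), min(m, x)
    return sum(binom_pmf(x - z, n, p_y) * binom_pmf(z, m, p_z)
               for z in range(lo, hi + 1))

def sum_cdf(x, n, p_y, m, p_z):
    """P(Y + Z <= x), i.e. the percentile rank of x as a fraction."""
    return sum(sum_pmf(k, n, p_y, m, p_z) for k in range(0, x + 1))

def percentile_point(q, n, p_y, m, p_z):
    """Smallest x with P(Y + Z <= x) >= q, for 0 < q <= 1."""
    total = 0.0
    for x in range(n + m + 1):
        total += sum_pmf(x, n, p_y, m, p_z)
        if total >= q:
            return x
    return n + m  # reached only if rounding keeps the total just below q

# Example values taken from the question
n, p_y, m, p_z = 10, 0.25, 400, 0.02
pmf = [sum_pmf(x, n, p_y, m, p_z) for x in range(n + m + 1)]
mean = sum(x * p for x, p in enumerate(pmf))
var = sum(x * x * p for x, p in enumerate(pmf)) - mean**2
rank = sum_cdf(15, n, p_y, m, p_z)

print(f"total mass = {sum(pmf):.6f}, mean = {mean:.4f}, variance = {var:.4f}")
print(f"P(X <= 15) = {rank:.4f}")
print(f"95th percentile = {percentile_point(0.95, n, p_y, m, p_z)}")
```

The mean and variance computed from the resulting pmf should match $nP_Y + mP_Z$ and $nP_Y(1-P_Y) + mP_Z(1-P_Z)$, which is a quick sanity check on the sum. For much larger $n$ and $m$ it would likely be better to work with logarithms of the terms to avoid underflow.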

Assuming $Y$ and $Z$ are independent, the characteristic function of $X$ is

$$ \varphi_X(t) = \left( P_Y e^{it}+1-P_Y \right)^{n}\left( P_Z e^{it}+1-P_Z \right)^{m}$$
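The characteristic function gives a second route to the same probabilities. Because $X$ only takes the values $0, 1, \dots, n+m$, sampling the characteristic function at $t_k = 2\pi k/(n+m+1)$ and applying an inverse discrete Fourier transform recovers $P(X=x)$. The sketch below does this with a plain $O(N^2)$ sum from the standard library, again assuming independence. `char_fn` and `pmf_from_char_fn` are hypothetical names chosen for this example.

```python
import cmath
from math import pi

def char_fn(t, n, p_y, m, p_z):
    """Characteristic function of X = Y + Z (assumes Y and Z are independent)."""
    e = cmath.exp(1j * t)
    return (p_y * e + 1 - p_y)**n * (p_z * e + 1 - p_z)**m

def pmf_from_char_fn(n, p_y, m, p_z):
    """Recover P(X = x), x = 0..n+m, by an inverse DFT of the characteristic function."""
    size = n + m + 1
    # Sample the characteristic function on the grid t_k = 2*pi*k/size
    phi = [char_fn(2 * pi * k / size, n, p_y, m, p_z) for k in range(size)]
    pmf = []
    for x in range(size):
        s = sum(phi[k] * cmath.exp(-2j * pi * k * x / size) for k in range(size))
        # The imaginary part is rounding noise; clip tiny negative values to zero
        pmf.append(max(s.real / size, 0.0))
    return pmf

# Example values taken from the question
pmf_cf = pmf_from_char_fn(10, 0.25, 400, 0.02)
mean_cf = sum(x * p for x, p in enumerate(pmf_cf))
print(f"total mass = {sum(pmf_cf):.6f}, mean = {mean_cf:.4f}")
print(f"P(X <= 15) = {sum(pmf_cf[:16]):.4f}")
```

For large $n + m$, a fast Fourier transform routine would presumably replace the double loop; the direct sum is used here only to keep the example dependency-free.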