What is the distribution of 2 consecutive Binomial trials?

822 Views Asked by At

I encountered the following situation.

We have $n$ objects. We mark each object independently with probability $p_1$ and form a new subset of marked objects - $A_1$. From this set, again we mark each object independently with probability $p_2$ to form a new subset $A_2\subseteq A_1$. What is the distribution of the size of $A_2$?

It seems to me that it is supposed to be $Bin(n, p_1p_2)$, since it seems the same as assigning each object a random vector in $\left\{ 0,1\right\} ^{2}$ where the first element is 1 with probability $p_1$ and the second is 1 with probability $p_2$ and then choosing those objects assigned with $(1,1)$. However, trying to prove it I get the following equation which I can't prove: $$\left(\begin{array}{c} n\\ a \end{array}\right)\left(p_{1}p_{2}\right)^{a}\left(1-p_{1}p_{2}\right)^{n-a}=\sum\limits _{i=a}^{n}\left(\begin{array}{c} n\\ i \end{array}\right)p_{1}^{i}\left(1-p_{1}\right)^{n-i}\left(\begin{array}{c} i\\ a \end{array}\right)p_{2}^{a}\left(1-p_{2}\right)^{i-a}$$

I got both sides of the equation as the probability of getting $|A_2|=a$.

1

There are 1 best solutions below

1
On BEST ANSWER

You have a hierarchical model: $$X_1 \sim \operatorname{Binomial}(n,p_1)$$ counts the number of successes in the first pass, and $$X_2 \mid X_1 \sim \operatorname{Binomial}(X_1, p_2)$$ counts the number of successes on the second pass, conditioned on $X_1$. Thus the unconditional distribution of $X_2$ is given by the sum $$\Pr[X_2 = x] = \sum_{k=0}^n \Pr[X_2 = x \mid X_1 = k]\Pr[X_1 = k] = \sum_{k=0}^n \binom{k}{x} p_2^x (1-p_2)^{k-x} \binom{n}{k} p_1^k (1-p_1)^{n-k}.$$ Of course, the lower index of this sum actually begins at $k = x$, since the second pass cannot result in $x$ successes if fewer than $x$ successes were obtained in the first pass. Thus we are motivated to shift the index of summation accordingly; i.e., let $m = k-x$, hence $$\begin{align*}\Pr[X_2 = x] &= \sum_{m=0}^{n-x} \binom{m+x}{x} \binom{n}{m+x} p_2^x (1-p_2)^m p_1^{m+x} (1-p_1)^{n-m-x} \\ &= (p_1 p_2)^x \sum_{m=0}^{n-x} \frac{n!}{x! m!(n-x-m)!} (p_1 (1-p_2))^m (1-p_1)^{n-x-m} \\ &= (p_1 p_2)^x \binom{n}{x} \sum_{m=0}^{n-x} \binom{n-x}{m} (p_1(1-p_2))^m (1-p_1)^{n-x-m} \\ &= \binom{n}{x} (p_1 p_2)^x (p_1(1-p_2) + (1-p_1))^{n-x} \\ &= \binom{n}{x} (p_1 p_2)^x (1 - p_1 p_2)^{n-x}, \end{align*}$$ which is the PMF of a $\operatorname{Binomial}(n, p_1 p_2)$ random variable. Note the use of the binomial theorem in the penultimate step: $$(a+b)^n = \sum_{k=0}^n \binom{n}{k} a^k b^{n-k}.$$ Note also the rather interesting but simple identity $$\binom{m+x}{x} \binom{n}{m+x} = \binom{n}{x} \binom{n-x}{m}.$$