Mixing binomial distributions


This is a question about a mixture of binomial distributions. I know how to get the $\mu$ and $\sigma^2$ of the mixture, but I am not sure how to use them to get the probability of a specific value.

Question:
The problem states that $X_1=B(2,0.52), X_2=B(3,0.41), X_3=B(4,0.38)$ are binomial distributions covering 43%, 36%, and 21% of users, respectively. Find the probability that the number of occurrences is more than 2. The question doesn't state whether the variables are independent of each other, so I will assume this is a mixture distribution question.

Binomial distribution: https://en.wikipedia.org/wiki/Binomial_distribution

Reference for mixture distribution: https://en.wikipedia.org/wiki/Mixture_distribution

My solution:
$X_1 = B(2, 0.52), E(X_1)=1.04, E(X_1^2)=1.5808, Var(X_1)=0.4992$

$X_2 = B(3, 0.41), E(X_2)=1.23, E(X_2^2)=2.2386, Var(X_2)=0.7257$

$X_3 = B(4, 0.38), E(X_3)=1.52, E(X_3^2)=3.2528, Var(X_3)=0.9424$

I treat $S$ as a mixture of $X_1, X_2, X_3$, so that, for example:

$P(S=0) = 0.43P(X_1=0) + 0.36P(X_2=0) + 0.21P(X_3=0)$

$E(S) = 0.43E(X_1) + 0.36E(X_2) + 0.21E(X_3)=1.2092$

$E(S^2)=0.43E(X_1^2) + 0.36E(X_2^2) + 0.21E(X_3^2)=2.168728$

$Var(S)=E(S^2)-(E(S))^2=0.706563$
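The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the names `components`, `binom_mean`, and `binom_second_moment` are illustrative, and it treats $S$ as a mixture (not a weighted sum), as assumed in the question.

```python
# Components as (weight, n, p); values taken from the question.
components = [(0.43, 2, 0.52), (0.36, 3, 0.41), (0.21, 4, 0.38)]

def binom_mean(n, p):
    # E(X) for X ~ B(n, p)
    return n * p

def binom_second_moment(n, p):
    # E(X^2) = Var(X) + E(X)^2 for X ~ B(n, p)
    return n * p * (1 - p) + (n * p) ** 2

# For a mixture, raw moments are the weighted averages of the component moments.
mix_mean = sum(w * binom_mean(n, p) for w, n, p in components)
mix_second = sum(w * binom_second_moment(n, p) for w, n, p in components)
mix_var = mix_second - mix_mean ** 2

print(mix_mean, mix_second, mix_var)
```

Note that only the raw moments are mixed linearly; the variance is then recovered from $E(S^2)-(E(S))^2$ rather than by averaging the component variances.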

Stuck:
How do I get $P(S \le 2)$? Should I compute $P\left(\frac{S-1.2092}{\sqrt{0.706563}} \le \frac{2-1.2092}{\sqrt{0.706563}}\right)$? Is using the normal distribution to get the probability correct?

Or should I use $P(S \le 2) = 1-P(S>2)$, where $P(S>2)=0.43P(X_1>2)+0.36P(X_2>2)+0.21P(X_3>2)$?


There are 2 best solutions below

Best answer:

You are correct that the distributions of $X_1, X_2, X_3$ can be mixed without regard to independence. Let $Y$ be the mixture of the three $X_i$ with proportions .43, .36, and .21. The mixture random variable can take only the values $0, 1, 2, 3, 4.$ In particular, $P(Y = 0) = .43P(X_1 = 0) + .36P(X_2 = 0) + .21P(X_3 = 0) \approx 0.204,$ where the values $P(X_i = 0)$ are determined by the respective binomial distributions.
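The same weighted sum gives every value of the mixture's probability function, not just $P(Y = 0)$. Here is a minimal sketch using only the Python standard library; `binom_pmf`, `mixture_pmf`, and `pmf_Y` are illustrative names, not anything from the original problem.

```python
from math import comb

# Components as (weight, n, p); values taken from the question.
components = [(0.43, 2, 0.52), (0.36, 3, 0.41), (0.21, 4, 0.38)]

def binom_pmf(k, n, p):
    # P(X = k) for X ~ B(n, p); zero outside the support 0..n
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def mixture_pmf(k):
    # P(Y = k) is the weighted sum of the component probabilities
    return sum(w * binom_pmf(k, n, p) for w, n, p in components)

pmf_Y = {k: mixture_pmf(k) for k in range(5)}
for k, prob in pmf_Y.items():
    print(f"P(Y = {k}) = {prob:.4f}")
```

Components with $n < k$ simply contribute zero, which is why $P(Y = 4)$ comes only from $X_3$.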

By contrast, your random variable $S = .43X_1 + .36X_2 + .21X_3$ is not a mixture random variable. It is a weighted average of the three $X_i.$ While $E(S)$ can be found as $E(S) = .43E(X_1) + .36E(X_2) + .21E(X_3),$ you cannot find $Var(S)$ without knowing something about the joint distribution of the $X_i$. (It would be easiest if the $X_i$ are independent.)
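To make the independence case concrete: if the $X_i$ are assumed independent (an assumption the problem does not state), the covariance terms vanish and the variance of the weighted average follows from the binomial variances $n_i p_i (1-p_i)$:

$$Var(S) = .43^2\,Var(X_1) + .36^2\,Var(X_2) + .21^2\,Var(X_3) = (0.1849)(0.4992) + (0.1296)(0.7257) + (0.0441)(0.9424) \approx 0.2279,$$

so that $SD(S) \approx 0.477$ under that assumption. Without independence, the terms $2 w_i w_j\,Cov(X_i, X_j)$ would have to be added.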


At this point, I don't know whether your real problem refers to the mixture distribution of $Y$ or to the weighted average distribution of $S.$ Below are some approximate results for each distribution based on simulation. One point of this is to show you how very different the distributions of $Y$ and $S$ are. Another is to provide you with rough answers against which you can compare your own once you decide which problem you are working on.

Mixture. Below is a density histogram that shows the approximate distribution of $Y,$ which takes only five values, has $E(Y) \approx 1.18$ and $SD(Y) \approx 0.83.$ The height of each histogram bar suggests the value of one of the probabilities $P(Y = y),$ for $y = 0, 1, 2, 3, 4.$

[Figure: density histogram of the simulated distribution of $Y$]

Weighted average. By contrast, below is a density histogram of the approximate distribution of $S.$ In my simulation, this random variable took about 60 distinct values. Each histogram bar represents several possible values of the random variable $S$. Also, $E(S) = E(Y) \approx 1.18.$ However, assuming independence of the $X_i,$ we have $SD(S) \approx 0.48 \ne SD(Y).$ The random variable $S$ is only very roughly normal; the density curve of $\mathsf{Norm}(E(S), SD(S))$ is superimposed on the histogram. [I would not want to rely on the normal distribution to get accurate values for the distribution of $S.$]

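[Figure: density histogram of the simulated distribution of $S$, with the density curve of $\mathsf{Norm}(E(S), SD(S))$ superimposed]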
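A simulation along these lines can be sketched as follows. This is not the code used for the figures; it is an illustrative standard-library version in which the seed, the sample size `N`, and all function names are arbitrary choices, and the $X_i$ are assumed independent when forming $S$.

```python
import random
from statistics import fmean, pstdev

random.seed(2024)  # arbitrary seed, only for reproducibility

WEIGHTS = [0.43, 0.36, 0.21]
PARAMS = [(2, 0.52), (3, 0.41), (4, 0.38)]  # (n, p) for X_1, X_2, X_3

def draw_binomial(n, p):
    # One B(n, p) draw as the number of successes in n Bernoulli(p) trials
    return sum(random.random() < p for _ in range(n))

def draw_mixture():
    # Mixture Y: pick ONE component with the mixing weights, then draw from it
    n, p = random.choices(PARAMS, weights=WEIGHTS)[0]
    return draw_binomial(n, p)

def draw_weighted_average():
    # Weighted average S: draw ALL three and combine them.
    # ASSUMPTION: X_1, X_2, X_3 are independent.
    return sum(w * draw_binomial(n, p) for w, (n, p) in zip(WEIGHTS, PARAMS))

N = 50_000
y = [draw_mixture() for _ in range(N)]
s = [draw_weighted_average() for _ in range(N)]

distinct_s = len({round(v, 10) for v in s})  # round to absorb float noise
print("Y: mean", fmean(y), "sd", pstdev(y), "values", sorted(set(y)))
print("S: mean", fmean(s), "sd", pstdev(s), "distinct values", distinct_s)
```

The key design difference is in the two draw functions: the mixture selects a single component per draw, while the weighted average uses all three components in every draw, which is why $S$ can take up to $3 \cdot 4 \cdot 5 = 60$ distinct values and has a much smaller spread.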

Second answer:

As others have stated, your problem is not very clearly expressed, and my answer might not answer the problem you thought you wanted to ask.

Assuming you are interested in a random variable $S$ whose distribution is the $(.43, .36, .21)$ mixture of the three indicated binomial distributions, with $$P(S\in A)= .43P(X_1\in A) + .36P(X_2\in A) + .21P(X_3\in A)$$ for any set $A$, you can work out the exact value of $P(S\le2)$ by working out $P(X_i\le2)$ for each $i$ (the case $i=1$ is trivial, and the others are not difficult), multiplying the probabilities by the coefficients, and adding.
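Carried out explicitly, and still under the assumption that the mixture interpretation is the intended one, the computation uses $P(X_1\le2)=1$, $P(X_2\le2)=1-.41^3$, and $P(X_3\le2)=1-4(.38)^3(.62)-.38^4$:

$$P(S\le2) = .43(1) + .36(0.931079) + .21(0.843066) \approx 0.9422,$$

so the probability of more than 2 occurrences is $P(S>2) = 1 - P(S\le2) \approx 0.0578.$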

You also have two methods of approximating $P(S\le2),$ namely, mix the normal approximations, or use a single normal approximation as in your problem statement. It is not clear to me which would be more accurate, nor whether either is cheaper than figuring out your answer exactly.
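One way to compare the two approximations with the exact value is to compute all three numerically. The sketch below is illustrative only: the function names are made up, `norm_cdf` is built from `math.erf`, and the cutoff is taken at 2 as in the question (using 2.5 instead would be a continuity correction, which is a separate choice).

```python
from math import comb, erf, sqrt

# Components as (weight, n, p); values taken from the question.
components = [(0.43, 2, 0.52), (0.36, 3, 0.41), (0.21, 4, 0.38)]

def norm_cdf(x, mu, sd):
    # CDF of a normal distribution with mean mu and standard deviation sd
    return 0.5 * (1 + erf((x - mu) / (sd * sqrt(2))))

def binom_cdf(k, n, p):
    # P(X <= k) for X ~ B(n, p)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(min(k, n) + 1))

cutoff = 2.0  # no continuity correction; 2.5 would apply one

# Exact mixture probability P(S <= 2)
exact = sum(w * binom_cdf(2, n, p) for w, n, p in components)

# (a) Mix the normal approximations of the individual components
mixed_normal = sum(
    w * norm_cdf(cutoff, n * p, sqrt(n * p * (1 - p))) for w, n, p in components
)

# (b) One normal distribution matched to the mixture's mean and variance
mu = sum(w * n * p for w, n, p in components)
second = sum(w * (n * p * (1 - p) + (n * p) ** 2) for w, n, p in components)
single_normal = norm_cdf(cutoff, mu, sqrt(second - mu**2))

print(f"exact         = {exact:.4f}")
print(f"mixed normals = {mixed_normal:.4f}")
print(f"single normal = {single_normal:.4f}")
```

With $n$ as small as 2, 3, and 4, neither normal approximation should be expected to be very close, and the exact sum involves only a handful of terms.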