Probability of drawing at least 4 white marbles out of 2 bags when taking only 3 marbles from each bag.


I'm trying to find out whether there's a formula that handles this problem in the general case. Since I don't know, I'm going to give some specifics to explain the situation more precisely. The concrete numbers of white and black marbles I'm about to give aren't really important. The more important numbers (if any matter for the general case) are the total number of marbles drawn, the number of marbles drawn from each bag, and the "at least" number needed for success.

Given:

Bag 1 contains 5 white marbles and 6 black marbles

Bag 2 contains 9 white marbles and 42 black marbles

Then perform the following...

3 marbles are chosen from bag-1 and 3 marbles are chosen from bag-2. What is the probability that at least 4 of the total 6 drawn marbles are white?

I believe the long-form answer is the probability of...

at least 3 white marbles from bag-1 AND at least 1 white marble from bag-2
OR
at least 2 white marbles from bag-1 AND at least 2 white marbles from bag-2
OR
at least 1 white marble from bag-1 AND at least 3 white marbles from bag-2

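Assuming the marbles are drawn without replacement (the sampling model isn't the point of the question, so let's just assume it), the number of white marbles drawn from each bag follows a hypergeometric distribution, and each "at least" probability comes from the cumulative hypergeometric distribution, so we'd get...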
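A minimal Python sketch of those per-bag quantities, assuming sampling without replacement. `hypergeom_pmf` and `hypergeom_tail` are hypothetical helper names (not from any library); they are built on `math.comb` and return exact `Fraction`s so no rounding creeps in.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k: int, total: int, white: int, drawn: int) -> Fraction:
    """Hypothetical helper: P(X = k), the chance of exactly k white marbles when
    `drawn` marbles are taken without replacement from a bag of `total` marbles,
    `white` of which are white."""
    if k < 0 or k > drawn:
        return Fraction(0)
    # math.comb(n, k) returns 0 when k > n, so impossible counts vanish automatically.
    return Fraction(comb(white, k) * comb(total - white, drawn - k), comb(total, drawn))


def hypergeom_tail(k: int, total: int, white: int, drawn: int) -> Fraction:
    """Hypothetical helper: P(X >= k), the upper tail of the same distribution."""
    return sum((hypergeom_pmf(j, total, white, drawn)
                for j in range(max(k, 0), drawn + 1)), Fraction(0))


# Bag 1: 5 white + 6 black; bag 2: 9 white + 42 black; 3 marbles from each.
bag1 = [hypergeom_pmf(k, 11, 5, 3) for k in range(4)]
bag2 = [hypergeom_pmf(k, 51, 9, 3) for k in range(4)]
print(bag1)
print(bag2)
```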

  (hypergeometric(X >= 3) * hypergeometric(X >= 1)) 
+ (hypergeometric(X >= 2) * hypergeometric(X >= 2)) 
+ (hypergeometric(X >= 1) * hypergeometric(X >= 3))

The overall question is: is there a shorter formula that encompasses all of that? And what if I wanted the probability of drawing at least 3 white marbles in total instead of 4? Manually adjusting which hypergeometric terms to use for each specific case seems tedious when the cases are large and varied.

Best answer:

Be careful how you count. Consider the possibility that you get $2$ white marbles from bag $1$ and $3$ white marbles from bag $2.$ That's one of the outcomes counted by "at least $2$ white marbles from bag $1$ and at least $2$ white marbles from bag $2$." It's also one of the outcomes counted by "at least $1$ white marble from bag $1$ and at least $3$ white marbles from bag $2$."

So your formula would count that event twice, and you would get a result greater than the true answer.
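To see the double-counting numerically, here is a rough Python check for the bags in the question. It evaluates the three-term formula from the question and, separately, the exact probability obtained by adding up each pair $(k_1, k_2)$ with $k_1 + k_2 \geq 4$ exactly once. The helpers `pmf` and `tail` are hypothetical names, and the two bags are assumed to be drawn independently.

```python
from fractions import Fraction
from math import comb


def pmf(k, total, white, drawn):
    # Hypothetical helper: hypergeometric P(X = k), drawing without replacement.
    if k < 0 or k > drawn:
        return Fraction(0)
    return Fraction(comb(white, k) * comb(total - white, drawn - k), comb(total, drawn))


def tail(k, total, white, drawn):
    # Hypothetical helper: P(X >= k).
    return sum((pmf(j, total, white, drawn) for j in range(max(k, 0), drawn + 1)), Fraction(0))


B1 = (11, 5, 3)  # bag 1: 11 marbles, 5 white, draw 3
B2 = (51, 9, 3)  # bag 2: 51 marbles, 9 white, draw 3

# The formula from the question: three products of ">=" tails (overlapping events).
question_formula = (tail(3, *B1) * tail(1, *B2)
                    + tail(2, *B1) * tail(2, *B2)
                    + tail(1, *B1) * tail(3, *B2))

# Exact answer (assuming independent bags): every (k1, k2) with k1 + k2 >= 4, counted once.
exact = sum((pmf(k1, *B1) * pmf(k2, *B2)
             for k1 in range(4) for k2 in range(4) if k1 + k2 >= 4), Fraction(0))

print(float(question_formula), float(exact))
```

The first value printed is the larger one; the surplus comes from outcomes such as $(2, 3)$, $(3, 2)$ and $(3, 3)$ that fall under more than one of the three terms.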

In general, suppose you have independent random variables $X_1$ and $X_2$ that take non-negative integer values (like the numbers of white marbles drawn from two separate bags); suppose you know how to compute $P(X_1 = k)$ and $P(X_2 = k)$ (or equivalently, $P(X_1 \geq k)$ and $P(X_2 \geq k)$) for any $k$; and suppose you want to calculate $P(X_1 + X_2 \geq n)$. Then one formula is $$ P(X_1 + X_2 \geq n) = \sum_{k=0}^r P(X_1 = k)\, P(X_2 \geq n - k), $$ where $r$ is the number of marbles you draw from the first bag (the largest value $X_1$ can take), and $P(X_2 \geq m) = 1$ whenever $m \leq 0$.
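A direct transcription of that sum into Python, as a sketch assuming independent bags and hypergeometric draws. `prob_sum_at_least` and the `(total, white, drawn)` tuple layout are hypothetical choices for illustration; `tail` returns 1 for thresholds $\leq 0$, which is what the formula needs when $n - k$ is not positive.

```python
from fractions import Fraction
from math import comb


def pmf(k, total, white, drawn):
    """Hypothetical helper: hypergeometric P(X = k) for one bag."""
    if k < 0 or k > drawn:
        return Fraction(0)
    return Fraction(comb(white, k) * comb(total - white, drawn - k), comb(total, drawn))


def tail(m, total, white, drawn):
    """Hypothetical helper: P(X >= m); equals 1 when m <= 0."""
    return sum((pmf(j, total, white, drawn) for j in range(max(m, 0), drawn + 1)), Fraction(0))


def prob_sum_at_least(n, bag1, bag2):
    """P(X1 + X2 >= n) = sum_{k=0}^{r} P(X1 = k) * P(X2 >= n - k).
    Assumes the bags are independent; r = number of marbles drawn from bag 1."""
    r = bag1[2]
    return sum((pmf(k, *bag1) * tail(n - k, *bag2) for k in range(r + 1)), Fraction(0))


bag1 = (11, 5, 3)  # (total marbles, white marbles, marbles drawn)
bag2 = (51, 9, 3)
for n in (3, 4):
    p = prob_sum_at_least(n, bag1, bag2)
    print(n, p, float(p))
```

Changing the threshold from $4$ to $3$ is just a different argument; no terms have to be picked out by hand.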

This has two features that make it both more compact and more correct than what you wrote:

  • It uses summation notation. This lets you summarize a lot of terms succinctly.
  • It uses "$=$" for one of the events instead of "$\geq$" for both. The events $X_1 = k$ for different values of $k$ are mutually exclusive, so each outcome is counted exactly once; that's how it avoids double-counting the probabilities.

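In your problem, $n = 4$ and $r = 3$, so the sum has the four terms $k = 0, 1, 2, 3$ (summing up to $k = n$ would add a fifth term for $k = 4$, but it vanishes because $P(X_1 = 4) = 0$). The $k = 0$ term also vanishes, because $P(X_2 \geq 4) = 0$ when only $3$ marbles are drawn from bag $2$, so only three terms need computing. You actually end up computing more terms if $n = 3$: then the $k = 0$ term $P(X_1 = 0)\,P(X_2 \geq 3)$ is nonzero, and the $k = 3$ term simplifies to $P(X_1 = 3)$ because $P(X_2 \geq 0) = 1$.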
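For the bags in the question, the nonzero terms can be worked out by hand as a check (a sketch of the arithmetic, assuming independent draws without replacement). Using $\binom{11}{3} = 165$ and $\binom{51}{3} = 20825$, the hypergeometric counts give $P(X_1 = 1), P(X_1 = 2), P(X_1 = 3) = \frac{75}{165}, \frac{60}{165}, \frac{10}{165}$ and $P(X_2 \geq 3), P(X_2 \geq 2), P(X_2 \geq 1) = \frac{84}{20825}, \frac{1596}{20825}, \frac{9345}{20825}$, so

$$
\begin{aligned}
P(X_1 + X_2 \geq 4) &= P(X_1 = 1)P(X_2 \geq 3) + P(X_1 = 2)P(X_2 \geq 2) + P(X_1 = 3)P(X_2 \geq 1) \\
&= \frac{75 \cdot 84 + 60 \cdot 1596 + 10 \cdot 9345}{165 \cdot 20825} = \frac{195510}{3436125} = \frac{266}{4675} \approx 0.0569.
\end{aligned}
$$

The same recipe with $n = 3$ has four nonzero terms and gives $\frac{890330}{3436125} = \frac{3634}{14025} \approx 0.2591$.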

Note that writing the probability on one line using summation notation does not help you much when it comes time to actually compute the probability numerically. You will still be doing a lot of calculations.
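One way to cut down the repeated work, sketched here as an alternative technique rather than as part of the answer above, is to compute the whole distribution of $X_1 + X_2$ once by discrete convolution, $P(X_1 + X_2 = s) = \sum_k P(X_1 = k)\,P(X_2 = s - k)$, and then read off $P(X_1 + X_2 \geq n)$ for every $n$ as a tail sum. The helper names `pmf_list` and `convolve` are hypothetical, and the bags are assumed independent.

```python
from fractions import Fraction
from math import comb


def pmf_list(total, white, drawn):
    """Hypothetical helper: hypergeometric [P(X=0), ..., P(X=drawn)] for one bag."""
    denom = comb(total, drawn)
    return [Fraction(comb(white, k) * comb(total - white, drawn - k), denom)
            for k in range(drawn + 1)]


def convolve(p, q):
    """PMF of X1 + X2 for independent X1 ~ p and X2 ~ q (discrete convolution)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


# Bag 1: 11 marbles, 5 white; bag 2: 51 marbles, 9 white; 3 drawn from each.
total_pmf = convolve(pmf_list(11, 5, 3), pmf_list(51, 9, 3))

# P(X1 + X2 >= n) for every threshold n = 0, ..., 6, each a sum over the upper tail.
tails = [sum(total_pmf[n:], Fraction(0)) for n in range(len(total_pmf))]
for n, p in enumerate(tails):
    print(f"P(at least {n} white) = {float(p):.4f}")
```

The convolution costs one pass over all $(k_1, k_2)$ pairs, after which any "at least" threshold is a single lookup.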