How to calculate this probability? Correct & expensive vs. approximate & easy?

  • X is a random variable that takes its value from the set of natural numbers [1, 80].
  • We also have two sets, Set 1 and Set 2, each with 11 distinct elements and with no elements in common. Their elements also come from the set of natural numbers [1, 80], e.g., Set 1 = {01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 80} and Set 2 = {11, 12, 22, 32, 42, 52, 62, 72, 77, 78, 79}.
  • X is assigned a value twenty times without replacement. The twenty obtained values constitute one result.

Find P(A∩B) given

  • A: Obtaining exactly zero values from Set 1.
  • B: Obtaining exactly three values from Set 2.
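Before any exact calculation, the experiment can be simulated to get a rough reference value. The following is only a sketch: it uses the example sets given above, and the function name `estimate`, the number of trials, and the seed are arbitrary choices made for illustration.

```python
import random

# Example sets from the problem statement
SET_1 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 80}
SET_2 = {11, 12, 22, 32, 42, 52, 62, 72, 77, 78, 79}


def estimate(trials=100_000, seed=0):
    """Monte Carlo estimate of P(A and B) for 20 draws without replacement."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    population = range(1, 81)      # natural numbers 1..80
    hits = 0
    for _ in range(trials):
        result = set(rng.sample(population, 20))   # one "result" of 20 values
        # A: nothing from Set 1, B: exactly three values from Set 2
        if not (result & SET_1) and len(result & SET_2) == 3:
            hits += 1
    return hits / trials


print(estimate())
```

The estimate is noisy (the event is rare), so it can only confirm the order of magnitude of an exact answer, not its last digits.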

I see two approaches:

1st approach: With hypergeometric distribution

P(A∩B) = P(A) * P(B|A)

For P(A):

Successes of sample = a = 0; Sample size = 20; Successes of lot = 11; Lot size = 80.

P(A) = 0.03270

For P(B|A):

Successes of sample = 3; Sample size = 20 - a = 20; Successes of lot = 11; Lot size = 80 - (11 - a) = 80 - 11 = 69.

P(B|A) = 0.28189

Then, P(A∩B) = P(A) * P(B|A) = 0.03270 * 0.28189 ≈ 0.00922
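The arithmetic of the 1st approach can be reproduced with exact binomial coefficients. This is a minimal sketch, assuming the standard hypergeometric formula C(K, k) * C(N - K, n - k) / C(N, n); the helper `hypergeom_pmf` is a hypothetical name, not a library function. It also evaluates the same quantity in a single step (the multivariate hypergeometric form) so the two numbers can be compared.

```python
from math import comb

N, K1, K2, n = 80, 11, 11, 20   # lot size, |Set 1|, |Set 2|, sample size


def hypergeom_pmf(k, lot, successes, sample):
    """P(exactly k successes) in `sample` draws without replacement."""
    return comb(successes, k) * comb(lot - successes, sample - k) / comb(lot, sample)


p_a = hypergeom_pmf(0, N, K1, n)                # zero values from Set 1
p_b_given_a = hypergeom_pmf(3, N - K1, K2, n)   # three from Set 2, lot reduced to 69
p_chain = p_a * p_b_given_a

# One-step form: choose 0 from Set 1, 3 from Set 2, 17 from the other 58
p_joint = comb(K1, 0) * comb(K2, 3) * comb(N - K1 - K2, n - 3) / comb(N, n)

print(f"P(A) = {p_a:.5f}, P(B|A) = {p_b_given_a:.5f}")
print(f"chain rule = {p_chain:.5f}, one step = {p_joint:.5f}")
```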

2nd approach: With probability tree

Since I wasn't sure about the previous method, I built a probability tree for a simpler experiment with only three assignments of X. Each node has three branches: one for obtaining an element from Set 1, one for obtaining an element from Set 2, and one for obtaining an element outside both sets. I calculated all the conditional probabilities, then the joint probability of each path. Finally, since the order of the results doesn't matter, I summed the paths into groups (P(0 Set 1 ∩ 0 Set 2), P(0 Set 1 ∩ 1 Set 2), P(0 Set 1 ∩ 2 Set 2), P(0 Set 1 ∩ 3 Set 2), P(1 Set 1 ∩ 0 Set 2), P(1 Set 1 ∩ 1 Set 2)...). After doing all that, I found that my results didn't match those obtained from the 1st method for the same simplified experiment: the difference is slight but significant, and it cannot be attributed to precision issues. That's why I conclude the two methods are not equivalent.
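For the three-assignment experiment the tree is small enough to enumerate by machine, which gives a way to cross-check a hand-built tree. The sketch below assumes the tree is built exactly as described (three branches per node, conditional probabilities updated after each draw) and uses exact fractions so that rounding cannot explain any difference. The names `tree_distribution` and `closed_form` are made up for this example.

```python
from fractions import Fraction
from itertools import product
from math import comb

N, K1, K2 = 80, 11, 11   # lot size, |Set 1|, |Set 2|
DRAWS = 3                # simplified experiment


def tree_distribution(draws):
    """Walk every path of the tree; return {(Set 1 count, Set 2 count): probability}."""
    totals = {}
    for path in product("12O", repeat=draws):   # '1' = Set 1, '2' = Set 2, 'O' = other
        left = {"1": K1, "2": K2, "O": N - K1 - K2}
        remaining, prob = N, Fraction(1)
        for branch in path:
            prob *= Fraction(left[branch], remaining)   # conditional probability
            left[branch] -= 1
            remaining -= 1
        key = (path.count("1"), path.count("2"))        # order is ignored
        totals[key] = totals.get(key, Fraction(0)) + prob
    return totals


def closed_form(i, j, draws):
    """Counting formula for i values from Set 1 and j from Set 2."""
    ways = comb(K1, i) * comb(K2, j) * comb(N - K1 - K2, draws - i - j)
    return Fraction(ways, comb(N, draws))


tree = tree_distribution(DRAWS)
for (i, j), p in sorted(tree.items()):
    print(i, j, float(p), float(closed_form(i, j, DRAWS)))
```

Comparing the two printed columns group by group can help locate where a hand calculation diverges, for example in how the paths were grouped or in the parameters used for the simplified hypergeometric step.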

My question is: since we "cannot" apply the 2nd method due to its complexity (there are at least 3^20 values to calculate, although I know some optimizations are possible because there are patterns in the conditional probabilities), which method should we apply?
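One possible optimization of the tree, sketched here under the assumption that only the counts matter and not the order: after each assignment, merge all paths that have the same number of values from Set 1 and from Set 2. That leaves at most 12 × 12 states per level instead of 3^20 leaves. The function name `collapsed_tree` is hypothetical.

```python
from fractions import Fraction

N, K1, K2, DRAWS = 80, 11, 11, 20   # lot size, |Set 1|, |Set 2|, assignments


def collapsed_tree(draws):
    """Probability tree with paths merged by (Set 1 count, Set 2 count)."""
    states = {(0, 0): Fraction(1)}
    for t in range(draws):
        remaining = N - t                      # values still available
        nxt = {}
        for (i, j), p in states.items():
            others_drawn = t - i - j
            moves = (
                ((i + 1, j), K1 - i),                          # branch: Set 1
                ((i, j + 1), K2 - j),                          # branch: Set 2
                ((i, j), (N - K1 - K2) - others_drawn),        # branch: outside both
            )
            for key, count in moves:
                if count > 0:
                    nxt[key] = nxt.get(key, Fraction(0)) + p * Fraction(count, remaining)
        states = nxt
    return states


dist = collapsed_tree(DRAWS)
print(float(dist[(0, 3)]))   # zero values from Set 1, three from Set 2
```

Because exact fractions are used, the result can be compared digit by digit with the value from the 1st approach.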

Thank you very much in advance.