How to distinguish between the Binomial and Hypergeometric Distributions when solving problems?

I'm listing 3 questions:

  1. Suppose that a batch of 100 items contains 6 that are defective and 94 that are not defective. If X is the number of defective items in a randomly drawn sample of 10 items from the batch, find (a) P{X = 0} and (b) P{X > 2}.

  2. Suppose that a class of 50 students has taken a test. Forty-one students passed the test while the remaining 9 failed. Find the probability that, in a group of 10 students selected at random, (a) none have failed the test and (b) at least 3 students have failed the test. (Give your answer correct to 4 decimal places.)

  3. In a box of 50 light bulbs, 10% are defective. Find the probability that 5 defective bulbs are selected when 10 bulbs are drawn from the 50.

The first two problems are solved everywhere using the binomial distribution, and the last one is solved using the hypergeometric distribution, but I cannot find any difference between them. In all cases we are taking a sample from a population and calculating answers, so shouldn't the first two questions also use the hypergeometric distribution?

Best answer:

The binomial model is used if the sampling occurs with replacement, or if the size of the population is so large that sampling without replacement is effectively equivalent to sampling with replacement. Otherwise, a hypergeometric model is more appropriate for counting the number of successes in a random sample.

With this in mind, it is my opinion that all three questions should employ the hypergeometric distribution.
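To illustrate the "population so large" condition, here is a minimal sketch that keeps the defect rate at 6% and the sample size at 10, and lets the population size grow. The helper name `hypergeom_p0` and the particular population sizes are my own choices for illustration; only the 6% rate and the sample of 10 come from the first question.

```python
from math import comb

def hypergeom_p0(N, K, n):
    """P(X = 0) when drawing n items without replacement from N items, K of them defective."""
    return comb(N - K, n) / comb(N, n)

n, p = 10, 0.06               # sample size and defect rate from the first question
binom_p0 = (1 - p) ** n       # binomial P(X = 0), i.e. sampling with replacement

gaps = {}
for N in (100, 1_000, 10_000, 100_000):   # illustrative population sizes (assumed)
    K = round(p * N)                       # number of defective items at a 6% rate
    hyper_p0 = hypergeom_p0(N, K, n)
    gaps[N] = abs(hyper_p0 - binom_p0)
    print(f"N = {N:>6}: hypergeometric = {hyper_p0:.6f}, "
          f"binomial = {binom_p0:.6f}, gap = {gaps[N]:.6f}")
```

The gap shrinks as the population grows, which is the sense in which the binomial model becomes an acceptable stand-in. At a population of 100 the gap is still visible in the second decimal place.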

For the first question, the probability that no items are defective in the sample is $$\Pr[X = 0] = \frac{\binom{6}{0} \binom{94}{10}}{\binom{100}{10}} \approx 0.522305.$$ If we use a binomial model, it would be calculated as $$\Pr[X = 0] = \binom{10}{0} (0.06)^0 (0.94)^{10} \approx 0.538615.$$ They are not equivalent.

Moreover, when calculating $\Pr[X > 2]$, you must restrict $X \le 6$ under either model, because there are only $6$ defective items in the population. So for instance, if using a binomial model, you cannot write $$\Pr[X > 2] \overset{?}{=} 1 - \Pr[X \le 2] = 1 - \sum_{x=0}^2 \binom{10}{x} (0.06)^x (0.94)^{10-x},$$ because this complement also counts the impossible outcomes $7 \le X \le 10$, to which the binomial model assigns positive probability. With the hypergeometric model, by contrast, the restriction is already taken into account, so this expression is correct: $$\Pr[X > 2] = 1 - \Pr[X \le 2] = 1 - \sum_{x=0}^2 \frac{\binom{6}{x} \binom{94}{10-x}}{\binom{100}{10}}.$$

That said, even if you use the binomial model to calculate $\Pr[X > 2] = \sum_{x=3}^6 \Pr[X = x]$, this still overestimates the actual probability, because it fails to correct for sampling without replacement.
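As a numerical check on the first question, the following sketch evaluates both models using only the Python standard library. The function names `hypergeom_pmf` and `binom_pmf` are hypothetical helpers written for this example, not part of any library.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x successes in n draws without replacement from N items, K of them successes."""
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0                      # outcome is impossible for this population
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """P(X = x): x successes in n independent trials with success probability p."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

N, K, n = 100, 6, 10                    # batch size, defective items, sample size
p = K / N                               # defect rate used by the binomial model

hyper_p0 = hypergeom_pmf(0, N, K, n)
binom_p0 = binom_pmf(0, n, p)

# P(X > 2) as the complement of P(X <= 2) under each model
hyper_tail = 1 - sum(hypergeom_pmf(x, N, K, n) for x in range(3))
binom_tail = 1 - sum(binom_pmf(x, n, p) for x in range(3))

# Binomial tail restricted to the outcomes that can actually occur (3 <= X <= 6)
binom_tail_restricted = sum(binom_pmf(x, n, p) for x in range(3, 7))

print(f"P(X = 0): hypergeometric {hyper_p0:.6f}, binomial {binom_p0:.6f}")
print(f"P(X > 2): hypergeometric {hyper_tail:.6f}, binomial {binom_tail:.6f}")
print(f"Binomial sum over x = 3..6: {binom_tail_restricted:.6f}")
```

Both binomial versions of the tail come out larger than the hypergeometric one, in line with the overestimation described above.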

The same considerations apply to the other two problems. All of them should be calculated using a hypergeometric model because in none of the cases is the sampling performed with replacement, and in all cases the population is too small to be treated as effectively infinite. Particularly egregious is the assumption in the first two questions that a binomial model should include outcomes that cannot possibly occur; i.e., $X > 6$ in the first question when there are only $6$ defective items in the population, and $X > 9$ in the second question when only $9$ students failed. A binomial model can be a suitable approximation for a hypergeometric model, but the goodness of such an approximation depends on the parameters.
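For the second and third questions, the same hypergeometric calculation can be sketched as follows. I am assuming that "10% defective in a box of 50" means exactly 5 defective bulbs; `hypergeom_pmf` is again a hypothetical helper written for this example.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x successes in n draws without replacement from N items, K of them successes."""
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0                      # outcome is impossible for this population
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Second question: 50 students, 9 failed, 10 selected
p2_none = hypergeom_pmf(0, 50, 9, 10)
p2_at_least_3 = 1 - sum(hypergeom_pmf(x, 50, 9, 10) for x in range(3))

# Third question: 50 bulbs, 10 selected
# Assumption: 10% of 50 is read as exactly 5 defective bulbs
p3_five = hypergeom_pmf(5, 50, 5, 10)

# Binomial value for the third question, for comparison only
binom_p3 = comb(10, 5) * 0.1**5 * 0.9**5

print(f"Q2 (a) none failed:       {p2_none:.4f}")
print(f"Q2 (b) at least 3 failed: {p2_at_least_3:.4f}")
print(f"Q3 five defective:        {p3_five:.6f} (binomial would give {binom_p3:.6f})")
```

In the third question the two models differ by roughly an order of magnitude, because drawing all 5 defective bulbs exhausts the supply, which a binomial model cannot account for.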