I have just been introduced to the geometric distribution, and have made up five questions based on the following setup: you are collecting cats from baskets, and each basket independently contains a cat with probability $0.1$.
1) What is the probability that the first $25$ baskets that you check have no cats?
My logic: The probability that the first basket is empty is $0.9$, that the first two are empty is ${0.9}^2$, and that all of the first $25$ are empty is ${0.9}^{25} \approx 0.0718$.
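A quick numerical sanity check of this in Python (assuming the baskets are independent):

```python
# Probability that each of the first 25 independent baskets is empty.
p_cat = 0.1
p_empty_25 = (1 - p_cat) ** 25
print(p_empty_25)  # ~0.0718
```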
2) What is the probability that the $25^{th}$ basket is the first to contain a cat?
My logic: This is the geometric distribution, whose pmf is $(1-p)^{n-1} \cdot p$ when the first success occurs on trial $n$. With $n = 25$, $p = 0.1$: $(1-0.1)^{24} \cdot 0.1 \approx 7.98 \times 10^{-3}$.
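The same computation as a small Python check (here $n$ counts the total baskets checked, so basket $n$ being the first with a cat means $n-1$ empty baskets followed by one with a cat):

```python
# Geometric pmf: P(first cat is in basket n) = (1-p)^(n-1) * p
def geom_pmf(n, p):
    return (1 - p) ** (n - 1) * p

print(geom_pmf(25, 0.1))  # ~0.00798
```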
3) What is the expected number of baskets opened until a cat is found?
My logic: A geometric distribution formula works again: $E(X) = \frac{1-p}{p} = \frac{1-0.1}{0.1} = 9$. This counts the expected number of failures before a success, so on average I open $9$ empty baskets before finding a cat.
4) What is the variance of the number of baskets opened until a cat is found?
My logic: Once again a geometric distribution formula should hold: $\operatorname{Var}(X) = \frac{1-p}{p^2} = \frac{1-0.1}{0.1^2} = 90$.
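To convince myself about questions 3 and 4, I ran a quick Monte Carlo simulation (a sketch in Python, counting empty baskets before the first cat):

```python
import random

random.seed(0)
p = 0.1
trials = 200_000

def empty_before_first_cat():
    # Count baskets opened until one contains a cat (success prob p).
    n = 0
    while random.random() >= p:  # basket is empty with probability 1-p
        n += 1
    return n

samples = [empty_before_first_cat() for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, var)  # close to (1-p)/p = 9 and (1-p)/p^2 = 90
```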
5) Now suppose that you need to find three cats (to fulfil your happiness). What are the expected value and variance of the number of baskets that need to be checked? Assume the cats are distinct and that each basket, independently, contains one of the three needed cats with probability $0.1$.
My logic: The expected value should just be three times greater, so $27$? I have no idea what to do on the variance side of things, though.
Does everything seem correct? How do I go about question 5? Any help is greatly appreciated!
When the number of cats you need to find is greater than $1$, the probability distribution for the number of baskets you must check is what is called the negative binomial distribution. It comes in a number of different forms, but the one applicable in your case is the following. Let $X$ be the random total number of baskets checked until we observe the $r^{\rm th}$ cat (in your case, $r = 3$); note that $X$ counts the baskets with cats as well. Then the probability that $X = k$ is given by $$\Pr[X = k] = \binom{k-1}{r-1}p^r (1-p)^{k-r}, \quad k = r, r+1, r+2, \ldots,$$ where $p$ is the probability that an individual basket contains a cat (here, $p = 0.1$).

To see why, fix some $k \ge r$ and suppose $X = k$. The final ($k^{\rm th}$) basket must contain a cat; otherwise, we would not have stopped checking there. Among the first $k-1$ baskets, there are $\binom{k-1}{r-1}$ distinct ways to place the remaining $r-1$ cats. Each of the $r$ baskets containing a cat is observed with probability $p$, and each of the $k-r$ empty baskets with probability $1-p$; multiplying these factors together gives the formula above.
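As a sanity check, here is a minimal Python sketch of this pmf, verifying numerically that the probabilities (nearly) sum to $1$ over a long enough range of $k$:

```python
from math import comb

def negbin_pmf(k, r, p):
    # P(X = k): the r-th cat is found in basket k, for k = r, r+1, ...
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

# Truncated sum; the tail beyond k = 500 is negligible for r = 3, p = 0.1.
total = sum(negbin_pmf(k, 3, 0.1) for k in range(3, 500))
print(total)  # ~1.0
```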
The expected value and variance of this distribution are given by $${\rm E}[X] = \frac{r}{p}, \quad {\rm Var}[X] = \frac{r(1-p)}{p^2}.$$ One method of proof involves some algebra, together with the assumption (which is intuitively true but does require formal proof) that the probabilities sum to one: $$\sum_{k=r}^\infty \Pr[X = k] = \sum_{k=r}^\infty \binom{k-1}{r-1} p^r (1-p)^{k-r} = 1.$$

We then have $$\begin{align*} {\rm E}[X] &= \sum_{k=r}^\infty k \Pr[X = k] \\ &= \sum_{k=r}^\infty k \frac{(k-1)!}{(k-r)!(r-1)!} p^r (1-p)^{k-r} \\ &= \sum_{k=r}^\infty \frac{r}{p} \cdot \frac{k!}{(k-r)! r!} p^{r+1} (1-p)^{k-r} \\ &= \frac{r}{p} \sum_{k=r}^\infty \binom{(k+1)-1}{(k+1)-(r+1)} p^{r+1} (1-p)^{(k+1)-(r+1)} \\ &= \frac{r}{p} \sum_{k=r+1}^\infty \binom{k-1}{k-(r+1)} p^{r+1} (1-p)^{k-(r+1)} \\ &= \frac{r}{p}, \end{align*}$$ where the last equality holds because the final sum runs over a negative binomial distribution with parameter $r+1$ in place of $r$, and therefore still equals $1$.

The same method shows that $${\rm E}[X(X+1)] = \sum_{k=r}^\infty k(k+1) \Pr[X = k] = \frac{r(r+1)}{p^2},$$ and the variance is obtained from $${\rm Var}[X] = {\rm E}[X^2] - {\rm E}[X]^2 = {\rm E}[X(X+1)] - {\rm E}[X] - {\rm E}[X]^2.$$

Of course, this is more algebraically involved than the other answer, but I just wanted to provide some additional background and mention the properties of this distribution, which we can see is a useful generalization of the geometric distribution.
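The closed forms above can be checked numerically against the pmf (a quick Python sketch, truncating the infinite sums at a point where the tail is negligible):

```python
from math import comb

r, p = 3, 0.1

def negbin_pmf(k):
    # P(X = k) for the negative binomial counting total baskets checked.
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

ks = range(r, 1000)  # tail beyond k = 1000 is negligible here
mean = sum(k * negbin_pmf(k) for k in ks)               # r/p = 30
second = sum(k * (k + 1) * negbin_pmf(k) for k in ks)   # r(r+1)/p^2 = 1200
var = second - mean - mean**2                           # r(1-p)/p^2 = 270
print(mean, var)
```

So for your question 5, on average $30$ baskets must be checked, with variance $270$.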