Number of draws before you see all candies?

168 Views Asked by At

Recently, I thought of the following question: Suppose there are 5 candies in a bag - you choose two candies, and then put these two candies back in the bag (assume each candy has an equal probability of being selected). On average, how many times do you need to choose candies before you are guaranteed to have seen every candy at least once?

In a way, this problem kind of reminds me of the "Coupon Collector Problem" (https://en.wikipedia.org/wiki/Coupon_collector%27s_problem), but I am not sure how to solve this problem using the Coupon Collector framework. I thought of framing this problem as a Markov Chain:

  • State 2 : You have observed 2 unique candies
  • State 3: You have observed 3 unique candies
  • State 4: You have observed 4 unique candies
  • State 5: You have observed 5 unique candies (Absorbing State)

It took me a long time, but I think I was able to create a Transition Matrix for this problem :

 A = matrix(
   c(0.1, 0.6, 0.3, 0, 0,0.3, 0.6, 0.1, 0,0, 0.6, 0.4, 0,0,0, 1), # the data elements
   nrow=4,              # number of rows
   ncol=4,              # number of columns
  byrow = TRUE) 



     [,1] [,2] [,3] [,4]
[1,]  0.1  0.6  0.3  0.0
[2,]  0.0  0.3  0.6  0.1
[3,]  0.0  0.0  0.6  0.4
[4,]  0.0  0.0  0.0  1.0

From here, I suppose I could use the Theory of Markov Chains and find out the expected number of transitions until you reach the Absorbing State - but it was quite difficult to correctly calculate the transition probabilities. I imagine that once the number of states (i.e. "candies") increase, it will become very difficult to calculate all these transition probabilities.

I was hoping for an easier way which would directly allow you to calculate the expected number of draws needed to observe "M" candies (at least once) with "N" draws and each draw of size "K" (e.g. M = 5, K = 2, N = ?) - provided you are given the probability of selecting any given candy (e.g. suppose the candies did not have equal probabilities of being selected).

Can someone please suggest another way of solving this problem?

Thanks!

  • "Food" for Thought: Suppose there were "M" candies" and you draw "K" candies "N" number of times. Suppose this time, you don't know the true value of "M" and you only have information on "K" and "N" - is there a way to estimate "M" based on the data you collect from "K" and "N"?
2

There are 2 best solutions below

1
On BEST ANSWER

The transition probabilities come from the Hypergeometric distribution. You have population size $M$, you draw $K$ of these without replacement. When you've seen $i$ candies (that is you're in state $i$), the number of successes (unseen candies) in population is $M-i$ and the probability of observing $k$ new candies is

$$\frac{{M-i\choose k}{i\choose K-k}}{M \choose K}$$

For $M=5$ you get (I include the states 0 and 1)

$$ \displaystyle \frac{1}{\binom{5}{K}} \left(\begin{array}{rrrrrr} \binom{0}{K} & 5 \, \binom{0}{K - 1} & 10 \, \binom{0}{K - 2} & 10 \, \binom{0}{K - 3} & 5 \, \binom{0}{K - 4} & \binom{0}{K - 5} \\ 0 & \binom{1}{K} & 4 \, \binom{1}{K - 1} & 6 \, \binom{1}{K - 2} & 4 \, \binom{1}{K - 3} & \binom{1}{K - 4} \\ 0 & 0 & \binom{2}{K} & 3 \, \binom{2}{K - 1} & 3 \, \binom{2}{K - 2} & \binom{2}{K - 3} \\ 0 & 0 & 0 & \binom{3}{K} & 2 \, \binom{3}{K - 1} & \binom{3}{K - 2} \\ 0 & 0 & 0 & 0 & \binom{4}{K} & \binom{4}{K - 1} \\ 0 & 0 & 0 & 0 & 0 & \binom{5}{K} \end{array}\right) $$

which agrees with your result if you plug in $K=2$:

$$ \displaystyle \left(\begin{array}{rrrrrr} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & \frac{2}{5} & \frac{3}{5} & 0 & 0 \\ 0 & 0 & \frac{1}{10} & \frac{3}{5} & \frac{3}{10} & 0 \\ 0 & 0 & 0 & \frac{3}{10} & \frac{3}{5} & \frac{1}{10} \\ 0 & 0 & 0 & 0 & \frac{3}{5} & \frac{2}{5} \\ 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right) $$

Here's a Sage-code I used (I keep $K$ as variable, but the formula for the expected value of number of transitions before being absorbed becomes cumbersome, so there I plug in a value for $K$):

#M candies, draw K each time
def getTransitionMatrix(M):
    R.<K> = PolynomialRing(QQ, 1)
    a = [[0]*(M+1) for _ in range(M+1)]
    for i in range(M+1):
        for k in range(M-i+1):
            a[i][i+k] = binomial(M-i, k)*binomial(i, K-k)
    #show(1/binomial(M, K), matrix(a))
    return 1/binomial(M, K) * matrix(a)


def calcE(M, KVal):
    P = getTransitionMatrix(M).subs(K=KVal)
    Q = P[0:M, 0:M] #last one is the absorbing state
    fundMat = (matrix.identity(M)-Q)^(-1)
    #show(fundMat)
    return sum(fundMat[0])

    
print (calcE(5, 2))

#P = getTransitionMatrix(5)
#show(P.subs(K=2))
1
On

This response assumes that the problem to be solved is the expected number of draws before each of the $(5)$ candies is seen at least once.

Alternative approach, that would be forced on me, if I had to attack the problem. This is because I am totally ignorant of Markov chains.

I will capitalize on the fact that $5$ is such a small number, and that the number of candies viewed each time is only one more than $(1)$. Therefore, it is not that onerous to re-invent the wheel and employ the same ideas that (I would speculate) provide the foundation of Markov Chains.

However, I must admit that I am probably destroying the educational value that the problem composer intended, by this answer. On the other hand, you may still be able to use the following analysis as a guide to walking down the actual path intended by the problem composer.


For $~r \in \{1,2,3\}$, let $E(r)$ denote the expected number of additional draws needed, under the assumption that there are still $r$ candies unseen.

Let $T$ denote the expected total number of draws.

After $1$ draw, you are guaranteed that exactly $3$ of the $5$ candies are still unseen. Therefore, the desired overall computation is:

$$T = 1 + E(3). $$

I will work backwards, first computing $E(1)$, then computing $E(2)$, and then computing $E(3)$.


Assume that $(4)$ of the candies have been seen, and that $(1)$ candy has been unseen.

On the next draw, the probability that the unseen candy will be seen is $~\displaystyle \frac{2}{5}$.

Therefore,

$$E(1) = \left\{1 \times \frac{2}{5}\right\} + \left\{\left[1 + E(1)\right] \times \frac{3}{5}\right\} = 1 + \left[E(1) \times \frac{3}{5}\right] \implies $$

$$\frac{2}{5} \times E(1) = 1 \implies E(1) = \frac{5}{2}.$$


If $(2)$ candies are unseen, then, the probabilities are:

  • Next draw sees both unseen candies : $~\displaystyle \frac{\binom{2}{2}}{\binom{5}{2}} = \frac{1}{10}.$

  • Next draw sees neither unseen candies : $~\displaystyle \frac{\binom{3}{2}}{\binom{5}{2}} = \frac{3}{10}.$

  • Next draw sees exactly $(1)$ of the unseen candies : $~\displaystyle 1 - \left[\frac{1}{10} + \frac{3}{10}\right] = \frac{3}{5}.$

Therefore,

$$E(2) = \left\{1 \times \frac{1}{10}\right\} + \left\{\left[1 + E(1)\right] \times \frac{3}{5}\right\} + \left\{\left[1 + E(2)\right] \times \frac{3}{10}\right\}$$

$$= 1 + \left\{\frac{5}{2} \times \frac{3}{5}\right\} + \left\{E(2) \times \frac{3}{10}\right\}\implies $$

$$\frac{7}{10} \times E(2) = 1 + \frac{3}{2} = \frac{5}{2} \implies $$

$$E(2) = \frac{10}{7} \times \frac{5}{2} = \frac{25}{7}.$$


The computation for $E(3)$ will parallel the computation for $E(2).$

If $(3)$ candies are unseen, then, the probabilities are:

  • Next draw sees two of the unseen candies : $~\displaystyle \frac{\binom{3}{2}}{\binom{5}{2}} = \frac{3}{10}.$

  • Next draw sees none of the unseen candies : $~\displaystyle \frac{\binom{2}{2}}{\binom{5}{2}} = \frac{1}{10}.$

  • Next draw sees exactly $(1)$ of the unseen candies : $~\displaystyle 1 - \left[\frac{1}{10} + \frac{3}{10}\right] = \frac{3}{5}.$

Therefore,

$$E(3) = \left\{\left[1 + E(1)\right] \times \frac{3}{10}\right\} + \left\{\left[1 + E(2)\right] \times \frac{3}{5}\right\} + \left\{\left[1 + E(3)\right] \times \frac{1}{10}\right\}$$

$$= 1 + \left\{\frac{5}{2} \times \frac{3}{10}\right\} + \left\{\frac{25}{7} \times \frac{3}{5}\right\} + \left\{E(3) \times \frac{1}{10}\right\}\implies $$

$$\frac{9}{10} \times E(3) = 1 + \frac{3}{4} + \frac{15}{7} = \frac{109}{28} \implies $$

$$E(3) = \frac{10}{9} \times \frac{109}{28} = \frac{545}{126}.$$

Therefore,

$$T = 1 + E(3) = 1 + \frac{545}{126} = \frac{671}{126}.$$