Calculating the probability of one approach outperforming another approach

70 Views Asked by At

I'm trying to calculate a probability for some outcome, but am running into some issues. I did take a math class that involved probability calculations in high school but it's been some time and I can't bring myself to remember it.

The situation

I have 26 options. Each option can be chosen multiple times. I have two methods for collecting all unique options at least once.

Method 1 takes 6 minutes to choose an option, but does not guarantee a unique result (i.e: it can choose the same option three times in a row)

Method 2 takes 15 minutes to choose an option, but it does guarantee that the chosen option has not been chosen before.

It was quite easy to figure out that method 2 will take 390 minutes to complete, having chosen all options once. Method 1 will take 156 minutes if it happens to choose the correct option each time, but I've calculated that the probability of this would be $6.55\cdot10^{-11}$. Not a very high probability of course.

I want to calculate the probability that method 1 will outperform method 2. So, what is the probability that it completes in less than 390 minutes (i.e: 65 attempts)?

The probability of picking a new option changes depending on how many unique choices were already made, and I have no clue how to calculate this. I figure that we should start with the original probability of exactly 26 unique choices. However, I'm not sure how to continue with the remaining 39 choices.

The Question

How can I calculate the probability of method 1 completing before method 2? Perhaps it would also be interesting to know what distribution of runtimes there are (I guess this will be some normal distribution). For example, What is the maximum runtime for 50% of the runs of method 1?

Thanks for your time. I hope that I managed to explain the question in enough detail.

1

There are 1 best solutions below

1
On BEST ANSWER

As Henry says in the comments, this is a version of the coupon collector's problem, which is a bit more difficult than the sort of problem I imagine you'd run into in high school. The rough solution to the problem tells you that if you want to collect $26$ options then you need to choose about $26 \ln 26 \approx 84.7$ times on average. So if each choice takes $6$ minutes we'll need about

$$(26 \ln 26) \times 6 \approx 508$$

minutes. This quick and dirty estimate strongly suggests but does not quite prove that Method 2 is very likely to be faster. The argument that gives this estimate actually says more generally that the expected number of choices needed to get the first option is $\frac{26}{26}$, the second option is $\frac{26}{25}$, and generally the $k^{th}$ option is $\frac{26}{26 - k + 1}$; each option keeps taking longer to wait for because we need to keep getting luckier, and in particular the expected number of choices needed to get the final option we're missing is $26$. Since $84.7 - 26$ is less than $65$ we actually get a funny and sort of sad conclusion: after $65$ attempts we're likely to have gotten almost every option and still be waiting on the very last one! The last one is the worst.


To prove that Method 1 is usually worse we can argue as follows. The probability that we get all $26$ options after $65$ choices can be calculated exactly using the principle of inclusion-exclusion, as follows: first we start with $1$, then we subtract $26$ times the probability that we miss any one particular option which is $\left( 1 - \frac{1}{26} \right)^{65}$, then because we've double-counted cases where we miss more than one option we add back ${26 \choose 2}$ times the probability that we miss any two particular options which is $\left( 1 - \frac{2}{26} \right)^{65}$, and it keeps going like this:

$$\mathbb{P} = \sum_{k=0}^{26} (-1)^k {26 \choose k} \left( 1 - \frac{k}{26} \right)^{65}.$$

Calculating this sum exactly is annoying but doable with computer / calculator assistance. To save time we can use the Bonferroni inequalities to get us an upper bound; this tells us that if we truncate the sum above to a positive term then we get an upper bound on the true probability and if we truncate to a negative term then we get a lower bound, so for example we get that

$$\mathbb{P} \le 1 - 26 \left( \frac{25}{26} \right)^{65} + {26 \choose 2} \left( \frac{24}{26} \right)^{65} \approx 0.756 \dots$$

which is not a very good bound yet (we're looking for a bound less than $0.5$); taking more terms gives

$$\mathbb{P} \le 1 - 26 \left( \frac{25}{26} \right)^{65} + {26 \choose 2} \left( \frac{24}{26} \right)^{65} - {26 \choose 3} \left( \frac{23}{26} \right)^{65} + {26 \choose 4} \left( \frac{22}{26} \right)^{65} \approx 0.145 \dots $$

so now we know that there's less than a $14.5\%$ chance that Method 1 is faster (or ties). The true probability can be calculated using e.g. WolframAlpha to be $\boxed{ 0.0912 \dots }$ so in fact there's about a $9.12 \%$ chance that Method 1 is faster (or ties).


Also, you didn't ask this but bonus calculation: if we happen to be able to switch between Method 1 and Method 2 then the optimal strategy, in the sense of the strategy that minimizes the expected time needed, is to use Method 1 until the expected amount of time needed to get the next option via Method 2 is better, then switch to Method 2. Using the above numbers we can calculate that this happens when $\frac{26}{26 - k + 1} \times 6 = 15$, which gives $k = 16.6$; so we use Method 1 to collect $16$ options then Method 2 to collect the remaining $10$. The expected time this takes is

$$6 \left( \sum_{k=1}^{16} \frac{26}{26 - k + 1} \right) + 15 \times 10 \approx \boxed{ 294 }$$

minutes.