If I roll, say, 20 dice, what is the probability that at least 5 of them will be the same?
Specifically, I am not asking for the probability of one particular outcome, e.g. rolling exactly 5 sixes out of 20 dice. For that I believe I could use the binomial distribution and arrive at ~12.9%.
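For reference, the binomial figure can be checked with a few lines of Python. This is only a minimal sketch; the function name `binom_pmf` is my own and not from any library.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 5 sixes among 20 fair dice
p_exactly_5_sixes = binom_pmf(5, 20, 1 / 6)
print(f"{p_exactly_5_sixes:.4f}")
```

Note that this only covers one chosen face, which is why it cannot answer the "any face" question directly.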
I have made a Monte Carlo simulation using Python, where I rolled 20 dice a million times. In each iteration (1 iteration = rolling 20 dice), I counted how often each face appeared and kept the largest count, ignoring which face it belonged to. I then tallied how often each value of that largest count occurred across all iterations and turned the tallies into cumulative probabilities. From my simulation, I arrived at ~92.8% probability that at least 5 of 20 rolls are the same.
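A simulation along the lines described above could look roughly like this. It is a minimal sketch rather than my exact script; the function name, the default trial count, and the fixed seed are assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_max_count(n_dice=20, k=5, trials=100_000, sides=6, seed=0):
    """Estimate P(some face appears at least k times among n_dice rolls)."""
    rng = random.Random(seed)  # fixed seed (assumption) so runs are repeatable
    hits = 0
    for _ in range(trials):
        rolls = [rng.randint(1, sides) for _ in range(n_dice)]
        # Largest number of occurrences of any single face in this iteration
        if max(Counter(rolls).values()) >= k:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(simulate_max_count())
```

Changing `n_dice` and `k` gives estimates for other cases, such as at least 10 the same out of 30, although rare events need many more trials to estimate well.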
I would love to see how this could be calculated using a specific formula, similar to the binomial distribution, so that I could reproduce it and be able to calculate e.g. probability of having at least 10 the same out of 30 etc.
Many thanks in advance for your advice!
Such problems can be solved with generating functions, but it's best to have a computer algebra system around to do the heavy work. Readers interested in learning about generating functions can find many resources in the answers to this question: How can I learn about generating functions?
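Consider the complementary problem: what is the probability that a die is rolled $20$ times and no face appears more than $4$ times? The exponential generating function for the probability that a die is rolled $n$ times and no face appears more than $4$ times is
$$f(x) = \left(1 + \frac{x}{6} + \frac{1}{2!} \left( \frac{x}{6} \right)^2 + \frac{1}{3!} \left( \frac{x}{6} \right)^3 + \frac{1}{4!} \left( \frac{x}{6} \right)^4 \right)^6$$
Each of the six factors corresponds to one face, and the term of degree $k$ in a factor accounts for that face appearing exactly $k$ times. When the product is expanded, the coefficient of $x^n$ is the sum of $\frac{1}{k_1! \cdots k_6!} \left( \frac{1}{6} \right)^n$ over all $(k_1, \dots, k_6)$ with $k_1 + \cdots + k_6 = n$ and every $k_i \le 4$, so multiplying by $n!$ turns each term into a multinomial probability.
The probability we want is therefore $20!\; [x^{20}]f(x)$, where $[x^{20}]f(x)$ is the coefficient of $x^{20}$ when $f(x)$ is expanded. This is where a computer algebra system is handy. (I used Mathematica.) The result is
$$ 20!\; [x^{20}]f(x) = \frac{151355579375}{2115832430592} \approx 0.0715348$$
So the answer to the original problem, the probability that at least one face appears $5$ or more times, is approximately $1 - 0.0715348 = 0.928465$, in agreement with the simulated ~92.8%.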
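If you don't have a computer algebra system at hand, the same coefficient extraction can be sketched in plain Python with exact rational arithmetic. This is not the Mathematica code I used, just an illustrative equivalent; the function name `prob_some_face_at_least` and its parameters are my own.

```python
from fractions import Fraction
from math import factorial

def prob_some_face_at_least(n_dice, k, sides=6):
    """P(some face appears at least k times among n_dice fair-die rolls)."""
    # One factor of f(x): sum over j < k of (x/sides)^j / j!
    base = [Fraction(1, factorial(j) * sides**j) for j in range(k)]
    # Multiply `sides` copies together, dropping terms above x^n_dice
    poly = [Fraction(1)]
    for _ in range(sides):
        size = min(len(poly) + len(base) - 1, n_dice + 1)
        new = [Fraction(0)] * size
        for i, a in enumerate(poly):
            for j, b in enumerate(base):
                if i + j < size:
                    new[i + j] += a * b
        poly = new
    # Coefficient of x^n_dice (zero if the polynomial's degree is too small)
    coeff = poly[n_dice] if n_dice < len(poly) else Fraction(0)
    # n! * [x^n] f(x) is the probability that NO face reaches k occurrences
    return 1 - factorial(n_dice) * coeff

if __name__ == "__main__":
    p = prob_some_face_at_least(20, 5)
    print(p, float(p))
```

The same function handles the other case asked about: `prob_some_face_at_least(30, 10)` gives the probability of at least 10 the same out of 30.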