probability of rolling repeated numbers

109 Views Asked by At

if you roll an x-sided die, n-number of times, what are the chances of getting r-number of repeated values?

if I roll a twenty-sided die, eighty times, what are the chances that I'll roll a six twice and a sixteen twice? ... but NOT specifically six and sixteen; just any value/s repeated, two or more times?

from, say, eight rolls of a twenty-sided die, I get this: {10,12,8,10,2,5,3,5}

what were the chances of getting those two tens, and two fives, in that many rolls?

and the ACTUAL question I have is: what is the calculation to figure out how many 'repeated values' you should expect in n rolls of an x-sided die?

so far, if I do this:

1 - r * (product from i=0,(n-1) of (x-i)/x)

given the example of rolling two repeats - that is, two fives and two tens - from eight throws of a twenty-sided die, like this {10,12,8,10,2,5,3,5} I would put this in wolfram

1 - 2 * ( product from i=0,7 of (20-i)/20 )

https://www.wolframalpha.com/input?i=1+-+2+*+%28+product+from+i%3D0%2C7+of+%2820-i%29%2F20+%29

and get 'basically 60%'

but how do I figure out the chances of any value repeating two or more times, specifically t-number of times?

1

There are 1 best solutions below

0
On

Use combinations and permutations

One option is to follow the patterns in this blog post (not mine). An excerpt:

Triple and two pair (3x,2x,2x). Ten ways to select the height of the triple, the dice for which can be chosen C(7,3) ways. There are C(9,2) ways to select the heights of the pairs. The first pair can be selected C(4,2) ways, the second C(2,2). The remaining three die can come up 7 different ways without matching the other sets (note that this is P(7,1)). 10 × C(7,3) × C(9,2) × C(4,2) × C(2,2). For more than seven dice, there would dice left over that could not match the value of any of the sets or each other, so permutations of the seven remaining possible values would be used: 10 × C(n,3) × C(9,2) × C(n−3,2) × C(n−5,2) × P(7,n−7).

(Note that this example uses d10s rather than d20s, but the general principles are there.) It should be possible to derive an algorithm to determine these expressions; however, I have not done so. Rather, I prefer to...

Use a more general-purpose algorithm

I've created a more general-purpose algorithm as part of my Icepool Python library that can solve many dice pool problems; while a bespoke solution as above is probably more efficient, mine is good enough for pool sizes that you're likely to physically roll. You can read a crude explanation of the algorithm here. In short, as long as we can express the pool evaluation as a transition function over (outcome, count) pairs without too large a state space, it's possible to compute the solution reasonably efficiently.

import icepool

class AllMatchingSets(icepool.EvalPool):
    def next_state(self, state, outcome, count):
        if state is None:
            state = ()
        # If at least a pair, append the size of the matching set.
        if count >= 2:
            state += (count,)
        # Prioritize larger sets.
        return tuple(sorted(state, reverse=True))

all_matching_sets = AllMatchingSets()

# Evaluate on 10d20.
print(all_matching_sets.eval(icepool.Pool(icepool.d20, 10)))

You can try it in your browser using this JupyterLite notebook. Just change the d10s to d20s.

Here's an example output for 8d20:

Denominator: 25600000000

Outcome Weight Probability
() 5079110400 19.840275%
(2,) 10939622400 42.732900%
(2, 2) 5860512000 22.892625%
(2, 2, 2) 781401600 3.052350%
(2, 2, 2, 2) 12209400 0.047693%
(3,) 1562803200 6.104700%
(3, 2) 1041868800 4.069800%
(3, 2, 2) 97675200 0.381544%
(3, 3) 32558400 0.127181%
(3, 3, 2) 1915200 0.007481%
(4,) 130233600 0.508725%
(4, 2) 48837600 0.190772%
(4, 2, 2) 1436400 0.005611%
(4, 3) 1915200 0.007481%
(4, 4) 13300 0.000052%
(5,) 6511680 0.025436%
(5, 2) 1149120 0.004489%
(5, 3) 21280 0.000083%
(6,) 191520 0.000748%
(6, 2) 10640 0.000042%
(7,) 3040 0.000012%
(8,) 20 0.000000%

Improving efficiency for more specific queries

If you have a more specific query than enumerating all possible sets of matching sets, you can reduce the state space and improve efficiency by only retaining enough information to compute the answer. For example, if you just want to know the number of pairs:

class NumPairs(icepool.EvalPool):
    def next_state(self, state, outcome, count):
        return (state or 0) + (count // 2)

num_pairs = NumPairs()

# Evaluate on 80d20.
print(num_pairs.eval(icepool.Pool(icepool.d20, 80)))
Outcome Probability
30 0.000190%
31 0.036113%
32 0.921689%
33 7.379965%
34 24.005873%
35 35.239404%
36 24.047867%
37 7.405807%
38 0.926534%
39 0.036366%
40 0.000192%

(This counts e.g. a quadruple as two pairs---if you want unique pairs just replace // with >=.)