Distribution of the size of the largest set(s) of faces in roll of multiple dice?

224 Views Asked by At

Over the holidays I played a dice game with the family (a gift one got - don't recall the name) that I'll generalize:

Each player has a set of $D$ identical dice with face values $1$ to $F$. The players all roll all their dice, and whatever face shows up the most in their hand, they put all dice with that face aside and roll again. Any dice in the new roll matching the face value of those put aside is added to the aside pile, and the process is repeated until all of some player's dice have been put aside.

For example, say my first roll (using 10 regular dice here) was $\{1,2,1,1,5,6,3,2,4,3\}$: I'd put the three one's aside. If my next roll was $\{5,4,6,3,2,5,1\}$, I'd add the single one to the aside pile, and continue.

First player to get all dice aside wins.

To the question:

On the flight home, I thought about the distribution of $S$, the size of the largest set of same faces in the first roll. In the above example, it was $3$. Had the roll been say $\{1,2,3,1,4,1,5,5,5,4\}$ it would also be $3$ (I can pick either the three ones or the three fives).

For sizes $S>D/2$, it's simply the probability of the binomial trial times $F$, e.g. for $8$ regular six-sided dice the probability of the maximum set size in the first roll being say $5$ is just $6 *C(8, 5) (1/6)^5 (1 - 1/6)^3 = 875/34992$

I'm stuck on the cases for $S\le D/2$ - I think this will require inclusion-exclusion, but I've had no success so far.

Edit: I was able to get what I'm after using ideas from the paper "The exact distribution of the maximum, minimum and the range of Multinomial/Dirichlet and Multivariate Hypergeometric frequencies" by Corrado.

Using the ideas within, I am able to get exact PMF for 100's of dice of 20+ faces quite quickly.

2

There are 2 best solutions below

2
On

Simulation. I don't have anything to contribute to a combinatorial discussion that leads to an analytic solution. But here is a simulation in R statistical software that counts values of $S$ in repeated experiments, each with $D$ independent rolls of a die with $F$ faces. I begin with a demonstration of the simulation method, and then give results of many experiments to simulate the distribution of $S.$ Knowing approximately the correct results may be useful to assess the accuracy of an analytic solution.

Demonstration of concept. The R function sample can be used to simulate rolling a die, and the function rle (for 'run length encoding') can be used as the basis of a method to find $S$ for each experiment. Here are results of experiment yielding various values of $S.$

f=6; d=10;  die=1:f
roll = sample(die, d, repl=T)
srt.roll = sort(roll); srt.roll
## 1 1 2 2 3 3 4 4 5 6
len = rle(srt.roll)$lengths;  len
## 2 2 2 2 1 1       # 'run lengths' of sorted rolls
s = max(len); s
## 2                 # s = 2 (There two 1's, two 2's, etc.)

...
## 1 1 2 2 2 3 4 5 5 6
len = rle(srt.roll)$lengths;  len
## 2 3 1 1 2 1
s = max(len); s
## 3                 # There were three 2's

...
## 2 4 4 4 4 4 5 5 6 6
len = rle(srt.roll)$lengths;  len
## 1 5 2 2
s = max(len); s
## 5                # there were five 4's

I hope this is enough to show that I am counting $S$ as you intended.

Simulation for ten fair six-sided dice. With a million experiments of ten rolls each, the simulated probabilities should be accurate to about three decimal places. In the run shown, possible values of $S$ are $2, 3, \dots, 10.$ Three is by far the most likely value of $S.$ Ten (getting the same face on all ten rolls) is extremely unlikely [$(1/6)^9$], and did not happen to occur during this simulation run.

m = 10^6;  s = numeric(m);  f = 6;  d = 10;  die = 1:f
for (i in 1:m) {
  srt.roll = sort(sample(die, d, repl=T))
  s[i] = max(rle(srt.roll)$lengths)  }
table(s)/m
#  s
##        2        3        4        5        6        7        8        9 
## 0.067348 0.529659 0.310519 0.077897 0.012943 0.001536 0.000097 0.000001 

enter image description here

Note: If you are familiar with R, you can try out additional runs with different parameter values, to compare with any analytical results you get. With m = 10^6 the program takes a couple of minutes to run, mainly because of the sorting; you might find m = 10^5 gives enough accuracy. If you're not familiar with R, you could easily get it for free from www.r-project.org (Windows, Mac, Linux supported), and just type the program into the Session window.

1
On

This is just an outline for an analytic solution. If we call $S$ as the random variable that count the maximum length of the different groups in a throw of dice then

$$\Pr[S=s]=\frac1{D^n}\sum_{\|\alpha\|_\infty =s}\binom{n}{\alpha}K_\alpha$$

where $n$ is the number of dice thrown, $D$ is the number of faces of each dice, and $\alpha\in\Bbb N_{\ge 0}^D$ and $\|\alpha\|_1=n$ (it is a multi-index).

The main problem seems to be the coefficients $K_\alpha$, that count the variations of each setup $\alpha$, this depends on $n$, $D$ and of course $\alpha$ itself, the big problem is count the multiplicities inside each $\alpha$.

All of this make me thing that the best approach is a simulation, with estimated error and variance, as the previous answer did.