How to determine the average number of dice showing the most common value?

291 Views Asked by At

Suppose the situation that tossing $n$ dice simultaneously. Each die has $k$ faces. How can I determine the average number of dice showing the most common value among them? For example, $n=10$ and $k=3$, the result of tossing is 1-3-2-1-1-1-2-1-2-3, respectively. The most common value is "1" with 5 dice. Trying again, the result is 3-2-3-2-1-3-3-3-3-3. The most common value is "3" with 7 dice. Try in this way 1,000 times and average the number of dice showing the most common value. That is $\frac{{5 + 7 + \cdots }}{{1,000}}$. Do there already exist any function (with parameter $n$, $k$) to determine the average?

Note that the number of dice showing "i" is not independent of the number of dice showing "j" with $i\ne j$ that is because the total number of dice is fixed. I cannot solve this problem because the independent assumption cannot be used. Thank you.

2

There are 2 best solutions below

1
On BEST ANSWER

This problem was interesting so I tried my hand at it and to my surprise I succeeded! Well, if you count creating a recursive function (in the form of a Python program) which quickly calculates the desired result - 2 seconds on my machine for n=50,k=20. This could easily be optimised further. Maybe it will help someone come up with a "purer" solution.

Anyway, here is my reasoning:

Let us take an outcome of the toss, and relabel the dice so that the most frequent value is $1$, the next most frequent is $2$ and so on up to $k$. For instance, the toss 1-3-2-3-3 will become either 3-1-2-1-1 or 2-1-3-1-1.

As you can see, if there are "ties" then there are multiple ways to perform this relabeling. For instance, if the outcome is 5-5-1-1-1-3-3-3-2-2-2-4-4, then 1, 2 and 3 occur 3 times and 4 and 5 occur twice, so there are $3!\times 2!$ ways of doing the relabeling. Given a relabelling $r$ we will denote the number of equivalent relabelings by $c(r)$, which is given by the product of the factorials of the number of times each frequency of values is repeated.

Also, given a relabelling with $l\leq k$ labels, there are $\frac{k!}{(k-l)!}$ outcomes that can be relabeled this way. (e.g. if only 2 out 4 labels are used, there are 4 options for the first label and 2 for the 2nd.)

Obviously the number of ones in a relabeling is equal to the number of times the most common value appears in an outcome which can be relabeled in this way. Any counting we do on relabelings will over-count real outcomes by a factor of $c(r)$ and under-count them by a factor of $\frac{k!}{(k-l)!}$.

So $p_m$, the probability that the most frequent value occurs $m$ times, is given by

$$ p_m = \sum_{r \, \in \, R_m} p(r)\frac{1}{c(r)} \frac{k!}{(k-l(r))!}, $$

where $R_m$ is the set of relabelings containing exactly $m$ ones, and $p(r)$ is the probability of getting the outcome $r$ in a toss. $p(r)$ is in fact $\frac{1}{k^n}$.

Note that the quantity you are after is the expected value of the largest frequency of dice values, and by definition of expected values, this is simply:

$$ \sum_{1 \leq m \leq n} p_m. $$

So a quick simplification:

$$ p_m = \frac{1}{k^n}\sum_{r \, \in \, R_m} \frac{1}{c(r)}\frac{k!}{(k-l(r))!}. $$

Now, let us sort the relabelings, e.g. 3-2-3-1-1-1-1-2 becomes 1-1-1-1-2-2-3-3. Note that this does not affect $c(r)$ or $l(r)$.

So how many relabelings correspond to a sorted relabeling? In the above example it's $\frac{8!}{4! \times 2! \times 2!}$. In general let us call this quantity $s(r)$ which is given by the product of the factorials of the number of times each label is repeated.

Finally we have

$$ p_m = \frac{1}{k^n}\sum_{r \, \in \, S_m} \frac{s(r)}{c(r)}\frac{k!}{(k-l(r))!}, $$

where $S_m$ is the set of sorted relabelings containing exactly $m$ ones.

For smallish $n$, this is possible to work out by hand, because it's really a matter of saying "how many 1s? how many 2s?" etc. However, let's get the computer to do it.

Here is a Python script I have tested against a numerical simulation and the results appear correct.

#given n dice and k labels, find the "average" count of
#the most commonly occuring value
def avg(n, k):
    return sum(prob(n, k, i) * i for i in range(1, n+1))

#given n dice and k labels, find the probability that
#the most common value occurs c times
def prob(n, k, c):
    return 1./(k**n) * fact(n) * 1./fact(c) * calc(n - c, k - 1, c, 1) * k
    #                            1./fact(c) contributes to s(r)
    #                                                k contributes to c(r)

#given n remaining dice and l remaining labels, all of which must occur
#with frequency <= prev_freq, and an existing runlength
#this function tries to give the next label to every allowable number of dice
#and sums the results
def calc(n, l, prev_freq, runlength):
    #all dice accounted for
    if n == 0:
        return 1.0
    #not enough labels to give remaining dice
    if n > l * prev_freq:
        return 0.0

    term_sum = 0
    if prev_freq <= n:
        freq = prev_freq
        #same freq as previous label
        term_sum = (1./fact(freq) * calc(n - freq, l - 1, freq, runlength + 1)
                                                             / (runlength + 1))
        #           1./fact(freq) contributes to s(r)
        #           1./(runlength + 1) contributes to c(r)

    for freq in range(1, min(prev_freq, n + 1)):
        #smaller freq than previous label
        term_sum += 1./fact(freq) * calc(n - freq, l - 1, freq, 1)

    return term_sum * l
    #                 l contributes to k! / (k - l(r))!

#factorial
def fact(n):
    prod = 1
    for i in range(2, n+1):
        prod *= i
0
On

This is not yet an answer but merely a reformulation. There are $n$ dices, each having $k$ faces. The distribution of the counts $C = (C_k, \ldots C_k)$ of the faces is $\mathrm{Multinomial}(N, [1/K,\ldots, 1/K])$. The most common value is given by $M = \max_i C_i$ which is a random variable. What you seek seems to be $\mathbb{E}[M]$. Since I do not yet know the answer, I will just throw out some possibly related links