Dice probabilities and stars/bars

161 Views Asked by At

I'm trying to apply stars and bars to answer a dice question: if $n$ players throw $k$-sided dice, what is the probability that exactly $i$ players throw a unique number (where $0 \leq i \leq n$)?

For example, if $4$ players throw $4$-sided dice, and three throw a $3$, and one throws a $4$, then $1$ player threw a unique number. If two throw a $1$ and two throw a $3$, then $0$ players threw a unique number.

This is easy to enumerate and count for small $n$ and $k$: for $4$ players throwing $4$ sided dice, then we have $4^4$ (or $k^n$ total outcomes), with the following probabilities: $$ P_0 = \frac{40}{256}, \, P_1 = \frac{48}{256},\, P_2 = \frac{144}{256},\, P_3 = \frac{0}{256},\, P_4 = \frac{24}{256},$$ where $P_i$ means the probability of exactly $i$ players throwing a unique number.

I was thinking this could be calculated with a stars-and-bars formulation, where we partition $n$ players into $k$ equally likely bins (each bin is a side of the dice). So a partition $[0,0,0,4]$ means all $4$ players threw a $4$, and $[1,1,1,1]$ means each player threw a different number (one player threw a $1$, the next threw a $2$, etc).

There would be $n \choose k$ or ${4+4-1} \choose {4-1}$ = $7 \choose 3$ = $35$ different partitions of $4$ players into $4$ dice-side bins. Of those partitions, $10$ represent the $P_0$ case: of the form $[0,0,0,4]$ or $[0,2,0,2]$, $12$ represent $P_1$, and look like $[0,0,1,3]$, $12$ represent $P_2$ and look like variations of $[0,1,1,2]$, there are zero for $P_3$, and $1$ representing $P_4$ looking like $[1,1,1,1]$.

So I think the analogous question here is: of the $35$ partitions, how many contain exactly $i$ $1$'s? I don't see a way of generalizing this into a general expression of $n$ and $k$.

4

There are 4 best solutions below

1
On BEST ANSWER

This can be done using inclusion–exclusion.

Choose $i$ players who roll unique numbers, leaving $k-i$ numbers and $n-i$ players. Impose $n-i$ conditions that these remaining players shouldn’t roll unique numbers. There are $\binom{n-i}j$ ways to choose $j$ particular conditions to be violated, $j!\binom{k-i}j$ ways to choose unique numbers for them and $(k-i-j)^{n-i-j}$ ways to choose numbers for the remaining players (where $0^0=1$). Thus, by inclusion–exclusion the number of outcomes in which exactly the $i$ chosen players roll unique numbers is

$$ \sum_{j=0}^{n-i}(-1)^j\binom{n-i}jj!\binom{k-i}j(k-i-j)^{n-i-j}\;. $$

There are $\binom ni$ ways to choose the $i$ players who roll a unique number and $i!\binom ki$ ways to choose the numbers they roll; and the total number of outcomes is $k^n$. Thus, the probability that exactly $i$ players roll unique numbers is

\begin{eqnarray*} P_i &=& \binom nii!\binom kik^{-n}\sum_{j=0}^{n-i}(-1)^j\binom{n-i}jj!\binom{k-i}j(k-i-j)^{n-i-j} \\ &=& \frac{i!}{k^n}\sum_{j=0}^{n-i}(-1)^jj!\binom n{i,j}\binom k{i,j}(k-i-j)^{n-i-j}\;. \end{eqnarray*}

For your example with $n=k=4$, this yields

$$ P_i=\frac94\cdot\frac1{i!}\sum_{j=0}^{4-i}(-1)^j\frac{(4-i-j)^{4-i-j}}{j!(4-i-j)!^2} $$

and thus

\begin{eqnarray*} P_0 &=&\frac94\cdot\frac1{0!}\left(\frac{4^4}{0!\cdot4!^2}-\frac{3^3}{1!\cdot3!^2}+\frac{2^2}{2!\cdot2!^2}-\frac{1^1}{3!\cdot1!^2}+\frac{0^0}{4!\cdot0!^2}\right) \\ &=& \frac94\left(\frac{256}{576}-\frac{27}{36}+\frac48-\frac16+\frac1{24}\right) \\ &=& \frac5{32}\;, \\ P_1 &=&\frac94\cdot\frac1{1!}\left(\frac{3^3}{0!\cdot3!^2}-\frac{2^2}{1!\cdot2!^2}+\frac{1^1}{2!\cdot1!^2}-\frac{0^0}{3!\cdot0!^2}\right) \\ &=& \frac94\left(\frac{27}{36}-\frac44+\frac12-\frac16\right) \\ &=& \frac3{16}\;, \\ P_2 &=&\frac94\cdot\frac1{2!}\left(\frac{2^2}{0!\cdot2!^2}-\frac{1^1}{1!\cdot1!^2}+\frac{0^0}{2!\cdot0!^2}\right) \\ &=& \frac98\left(\frac44-\frac11+\frac12\right) \\ &=& \frac9{16}\;, \\ P_3 &=&\frac94\cdot\frac1{3!}\left(\frac{1^1}{0!\cdot1!^2}-\frac{0^0}{1!\cdot0!^2}\right) \\ &=& \frac9{24}\left(\frac11-\frac11\right) \\ &=& 0\;, \\ P_4 &=&\frac94\cdot\frac1{4!}\left(\frac{0^0}{0!\cdot0!^2}\right) \\ &=& \frac9{96}\left(\frac11\right) \\ &=& \frac3{32}\;, \end{eqnarray*}

in agreement with your results.

Here’s Java code that calculates the probabilities for general $n$ and $k$. The results for $n=10$ and $k=20$ are in agreement with the ones given in HighDiceRoller’s answer.

2
On

The different partitions of 4 players into 4 dice-side bins are not equally likely to have occurred. There is little to no point in having commented on the fact you could partition them in this way unless you were to go through the incredibly tedious process of looking at each of those partitions individually and looking at how many outcomes (where each die is individually colored or the like) there are in the far more useful equiprobable sample space which is used in the traditional approach that they correspond to.

Stars and bars will almost always be the wrong approach when talking about sample spaces for any probability question. Of those times it is the right approach, it will be made explicitly clear that things are not being randomly chosen in a usual way.

In your example for instance, it is clear that all events in the probability space will have probabilities of the form $\frac{x}{4^4}$ with $x$ an integer or in simplified form as $\frac{x}{2^k}$ with $x$ an odd integer or zero and $k$ a non-negative integer. By simple divisibility arguments one can see that these can not equal a probability of the form $\frac{x}{35}$ with the exception of the trivial probabilities of $0$ or $1$.

1
On

My Icepool Python package can do this:

from icepool import d
n = 10
k = 20
output(d(k).pool(n).keep_counts_eq(1).count())

You can run this in your browser here.

Example output:

Die with denominator 10240000000000

Outcome Quantity Probability
0 2903873000 0.028358%
1 25689128600 0.250870%
2 173508091200 1.694415%
3 364147099200 3.556124%
4 1377611020800 13.453233%
5 991598630400 9.683580%
6 3281886720000 32.049675%
7 609493248000 5.952082%
8 2742719616000 26.784371%
10 670442572800 6.547291%

What this means:

  • d(k).pool(n): A pool of $n$ dice, each with $k$ sides.
  • keep_counts_eq(1): Take the multiset produced by rolling the pool. Keep all elements that were rolled exactly once and drop the rest.
  • count(): Count the total number of remaining elements.

Under the hood this uses dynamic programming using the decomposition of a multinomial as a product of binomials. This achieves reasonable efficiency even if not necessarily the best for any specific problem. Perhaps someone else will come up with a more bespoke solution.

0
On

One problem with trying to analyse partitions or compositions is that they would not be equally likely. Another would be actually listing them at least when $n$ and $k$ are large.

In a linked question I commented in effect that an easier question is the expected number of people with a unique birthday, which is $n \left(1-\frac{1}{k}\right)^{n-1}$. Linearity of expectation makes this easy.

I also commented that brute force could involve processing something like $k^n$ possibilities but I suspected recursion could reduce this to something of the order of $(n+1)^2(k+1)$. Each additional person will either leave both the number of unique values and number of total values seen unchanged (by getting an existing non-unique value), or increase both by $1$ (by getting a previously unseen value), or reduce the number of unique values by $1$ and leave the total number seen unchanged (by getting a previously unique value). So

$$\begin{array}{}p(i, t, n, k)& = & &\tfrac{t-i}{k}p(i, t, n-1, k) \\& &+& \tfrac{k-(t-1)}{k}p(i-1, t-1, n-1, k)\\& &+ &\tfrac{i+1}{k}p(i+1, t, n-1, k)\end{array}$$

starting with $p(0, 0, 0, k) =1$.

The answer to your question is then $P_i=\sum\limits_t p(i, t, n, k)$ for your desired $n$ and $k$.

As an illustration using R:

uniquedistribution <- function(cases, possibilities){
  probmatrix <- matrix(0,nrow=possibilities + 1, ncol=possibilities + 1)
  probmatrix[1,1] <- 1 
  newunique <- 
    outer(0:possibilities, 0:possibilities, function(x,y){possibilities+1-y}) 
  newdupe <- 
    outer(0:possibilities, 0:possibilities, function(x,y){x+1})
  olddupe <- 
    outer(0:possibilities, 0:possibilities, function(x,y){y-x})
  for (n in 1:cases){
    probmatrix <- (
       probmatrix * olddupe + 
       rbind(rep(0, possibilities + 1), 
         cbind(rep(0, possibilities + 1), probmatrix[,-(possibilities + 1)]
           )[-(possibilities + 1), ]) * newunique +
       rbind(probmatrix[-1,], rep(0, possibilities + 1)) * newdupe
       ) / possibilities  
    }
  distrib <- rowSums(probmatrix)
  names(distrib) <- 0:possibilities 
  if (cases < possibilities){
    distrib <- distrib[1:(cases+1)]
    }
  return(distrib)
  }

and this confirms the distributions for your $n=4,k=4$ and HighDiceRoller's $n=10,k=20$ and the birthday problem's $n=23,k=365$ as well as my assertion about the expected number of unique values seen:

uniquedistribution(4, 4)
#       0       1       2       3       4 
# 0.15625 0.18750 0.56250 0.00000 0.09375 
c(sum(uniquedistribution(4,4) * (0:min(4,4))), 4*(1-1/4)^(4-1)) 
# 1.6875 1.6875
   
uniquedistribution(10, 20)
#            0            1            2            3            4 
# 0.0002835813 0.0025087040 0.0169441495 0.0355612402 0.1345323263 
#            5            6            7            8            9 
# 0.0968358038 0.3204967500 0.0595208250 0.2678437125 0.0000000000 
#           10 
# 0.0654729075 
c(sum(uniquedistribution(10, 20) * (0:min(20,10))), 10*(1-1/20)^(10-1)) 
# 6.302494 6.302494
 
uniquedistribution(23, 365)
#            0            1            2            3            4 
# 2.000781e-19 2.370763e-17 6.877819e-16 2.843732e-14 3.554791e-13 
#            5            6            7            8            9 
# 9.380826e-12 6.545402e-11 1.337417e-09 5.657931e-09 9.967415e-08 
#           10           11           12           13           14 
# 2.611084e-07 4.286356e-06 6.849860e-06 1.121294e-04 1.042552e-04 
#           15           16           17           18           19 
# 1.825184e-03 9.018574e-04 1.841557e-02 4.073906e-03 1.110355e-01 
#           20           21           22           23 
# 7.395218e-03 3.634222e-01 0.000000e+00 4.927028e-01   
c(sum(uniquedistribution(23, 365) * (0:min(365,23))), 23*(1-1/365)^(23-1)) 
# 21.65286 21.65286