Probability n Dice Results are subset of m Dice Results

229 Views Asked by At

Let D(n,m) be the probability that a multiset of n dice results will be a submultiset of m dice results.

Multiset indicates repeats are counted: If n=3, {3,3,4} is a submultiset of {3,3,4,5,6} but not of {3,4,5,6}.

The following exact formulas for n=1,2,3 agree with simulation but I would like some way to calculate D(n,m) for n>3 more accurately than simulation.

Ideas? Possibly using generating functions- I'm not sure?

D(1,m)= 1-(5/6)^m

D(2,m)= (1/6)(1-(5/6)^m-m(5/6)^(m-1)(1/6)) +(5/6)(1-2*(5/6)^m+(4/6)^m)

D(3,m)= (1/6)^2*(1-(5/6)^m-m*(5/6)^(m-1)(1/6)-COMBIN(m,2)(5/6)^(m-2)(1/6)^2) +(15/36)(1-2*(5/6)^m-m*(5/6)^(m-1)(1/6)+(5/6)^m((4/5)^m+m*(4/5)^(m-1)(1/5))) +(5/6)(4/6)(1-3(5/6)^m+3*(4/6)^m-(3/6)^m)

1

There are 1 best solutions below

5
On

Too long for a comment:

Perhaps you could extend your question to $D(n,m,s)$ being the probability that the multiset of results from rolling $n$ $s$-sided dice is a submultiset of the results from rolling $m$ other $s$-sided dice.

If $\displaystyle Q(a,d,s)={d \choose a}\frac{(s-1)^{d-a}}{s^d}$ is the probability that $d$ $s$-sided dice show a pre-specified value $a$ times, then I think you have $$\displaystyle D(n,m,s) = \sum_{a=0}^n \sum_{b=a}^m Q(a,n,s) \,Q(b,m,s) \,D(n-a,m-b,s-1)$$ with $D(n,m,s)=0$ when $n \gt m$ and $D(n,m,1)=1$ when $0 \le n \le m$.

This recursion seems to give the following probabilities for $6$-sided dice:

  0         1          2          3          4           5           6
0 1 1.0000000 1.00000000 1.00000000 1.00000000 1.000000000 1.000000000
1 0 0.1666667 0.30555556 0.42129630 0.51774691 0.598122428 0.665102023
2 0 0.0000000 0.05092593 0.12808642 0.21617798 0.305984225 0.392200360
3 0 0.0000000 0.00000000 0.02134774 0.06539352 0.126639660 0.198346527
4 0 0.0000000 0.00000000 0.00000000 0.01089892 0.038119427 0.081405115
5 0 0.0000000 0.00000000 0.00000000 0.00000000 0.006353238 0.024388941
6 0 0.0000000 0.00000000 0.00000000 0.00000000 0.000000000 0.004064824

and the row starting "2 0 0.0000000 0.05092593 0.12808642" has the same values as given by your $D(2,m)$ formula, which is encouraging.

For what it is worth, here is some R code for the recurrence:

Q <- function(a,d,s){ choose(d,a)*(s-1)^(d-a)/s^d }  

maxn     <- 10
maxm     <- 19 # should be at least maxn
maxsides <-  6
D <- matrix(rep(0,(maxm+1)*(maxn+1)), ncol=maxm+1)
rownames(D) <- 0:maxn
colnames(D) <- 0:maxm
for (n in 0:maxn){ for (m in n:maxm){ D[n+1,m+1] <- 1 }}
print(paste("Dice with", 1,"side, probabilities:"))
print(D)
for (sides in 2:maxsides){
    oldD <- D
    D <- matrix(rep(0,(maxm+1)*(maxn+1)), ncol=maxm+1)
    rownames(D) <- 0:maxn
    colnames(D) <- 0:maxm
    for (n in 0:maxn){ for (m in n:maxm){
        summing <- 0 
        for (a in 0:n){ summing <- summing + 
            Q(a,n,sides) * sum(Q(a:m,m,sides)* oldD[n−a+1,m−(a:m)+1])}
        D[n+1,m+1] <- summing }}
    print(paste("Dice with", sides,"sides, probabilities:"))
    print(D)
    }