Probability of average of N samples of an unfair dice bigger than 5.

79 Views Asked by At

Imagine we have a dice:

Side probability
1 0.1
2 0.1
3 0.1
4 0.1
5 0.2
6 0.4

We will throw the dice N times (ie. 7 times) And we want to calculate the probability that the average of the 7 faces of the dice is bigger than 5.

I tried mapping all outcomes, but it gets too nasty too fast because the number of possible outcomes grows exponentially $$6^N$$

So I decided to try a Binomial approach, in which I iterate first for the 1s and calculate the probabilities of getting either 0,1,2,..., or 7 ones. If I get a 7 I am done because there's no other 'free slot'. But if I get 6 ones I still have to fill it, so I have to calculate now for the probability of getting either one or zero 2s, and so on until all the 'slots' are full.

Is there a better approach to this? I have the feeling that this should be a trivial problem but I just don't know where to look at!!!

Should I just better move to MonteCarlo and approximate it??

Thank you all!!

2

There are 2 best solutions below

1
On BEST ANSWER

I wrote a Mathematica program which answers your question exactly, and efficiently. I used probability generating functions to do this. Try it online!

First, let me explain the method.

  • The probability that the average of $n$ dice is more than $5$, is equal to the probability that the sum of $n$ dice is more than $5n$.

  • A probability generating function is a polynomial, $P(x)$, which encodes the distribution of a discrete random variable, $T$. For any integer $k$, the probability that $T$ is equal to $k$ is the coefficient $x^k$ in $P(x)$. For example, the pgf of a single die roll is $$ P_\text{die}(x)= 0.1x^1 + 0.1x^2 + 0.1x^3 + 0.1x^4 + 0.2x^5 + 0.4x^6. $$

  • A useful fact about pgf's is that the pgf of the sum of two independent random variable is the product of the individual pgf's. Therefore, the pgf for the total of $n$ rolls of your die is simply $P_\text{die}(x)^n$.

  • Finally, we can use a common generating function trick to extract the probability of rolling at most a certain level. If $Q(x)$ is any polynomial, say $Q(x)=a_0+a_1x+\dots+a_nx^n$, then $Q(x)/(1-x)$ will be a power series, where the coefficient of $x^k$ in $Q(x)/(1-x)$ is the sum of the coefficients $a_0+a_1+\dots+a_k$. Letting $Q(x)=P_\text{die}(x)^n$, then the coefficient of $x^k$ in $P_\text{die}(x)^n/(1-x)$ is $P(T\le k)$, where $T$ is the total of the $n$ dice.

Here is the Mathematica program I wrote, which combines all of the above logic.

NumDice = 10; (* Replace "10" with whatever number of dice you want. *)
AverageToBeat = 5;

TargetTotal = NumDice * AverageToBeat;
ProbGenFunDie[x_]   = 0.1x^1 + 0.1x^2 + 0.1x^3 + 0.1x^4 + 0.2x^5 + 0.4x^6;
ProbGenFunTotal[x_] = ProbGenFunDie[x] ^ NumDice;
ProbAtMost = SeriesCoefficient[ProbGenFunTotal[x] / (1-x), {x, 0, TargetTotal}];

Print["The probability that the average of ", NumDice , 
      " dice is more than ", AverageToBeat , 
      " is ", 1 - ProbAtMost, "."]
0
On

If the dice was fair, we could use a simple counting strategy to count the number of outcomes that add up to a given sum (because we can very easily convert an average target to a sum target), but alas that option is not as helpful here.

So, let's look at the question and the facts. We're asking to know the probability of a dice roll where the average die face is greater than 5. We're using a 6-sided die labelled 1-5. So, five of the six sides can only subtract from our target. Only sixes can increase the die face higher than 5.

Further, we can deduce that the other faces must be high as well - if any dice is lower than five, we need a number of 6s to compensate equal to the difference. That further limits the number of outcomes that will contribute.

Indeed, we can take our die faces and relabel them based on their relation to our target average:

6 (a) 5 (b) 4 (c) 3 (d) 2 (e) 1 (f)
1 0 -1 -2 -3 -4

And we then, fundamentally, want any end result higher than 1, and the upper bound on our result is $n$. Thus, $a > c+2d+3e+4f$.

A straightforward, if computationally exhausting, option would be to treat this as a multinomial distribution. We'd still need a lot of iterators to sum all the possibilities, but let's try and figure out a strategy to do that. The mass function for any given combination of probabilities in our case is:

$$P(X=x)=\frac{n!}{a!b!c!d!e!f!}(0.4)^a(0.2)^b(0.1)^c(0.1)^d(0.1)^e(0.1)^f$$

We know that all possible results with only 6s, or 6s and 5s, will match our target, so let's start with our 6s. if our iterator $i$ starts at 0, we want our count of 6s to start at n, so let's set $a=n-i$ to start, and set $b=i$, so:

$$P(X=x)=\sum_{i=0}^{n-1}\frac{n!}{(n-i)!i!c!d!e!f!}(0.4)^{n-i}(0.2)^i(0.1)^c(0.1)^d(0.1)^e(0.1)^f$$

To reduce the number of unneeded iterations, we can take a greedy approach to allocating the rest of our iterators to result in the below:

$$P(X>x)=\sum_{i=0}^{n-1}\sum_{j=0}^{\left \lfloor \frac{n-1-i}{4} \right \rfloor}\sum_{k=0}^{\left \lfloor \frac{n-1-i-4j}{3} \right \rfloor}\sum_{l=0}^{\left \lfloor \frac{n-1-i-4j-3k}{2} \right \rfloor}\sum_{m=0}^{n-1-i-4j-3k-2l}\frac{n!}{(n-i)!(i-j-k-l-m)!j!k!l!m!}\cdot(0.4)^{n-i}(0.2)^{i-j-k-l-m}(0.1)^j(0.1)^k(0.1)^l(0.1)^m$$

For $N=7$, this will still likely end up with a lot of calculations needing to be done, but it should be significantly less than $6^N$.