Probability that 100 dice rolls sum to 400?

1k Views Asked by At

I would like to find the probability that the results of 100 dice rolls sum to 400. I think to do this exactly would take a lot of work unless there's some trick I am not privy to?

I believe the sum is distributed with a multinomial distribution with 100 trials. I need to figure out the number of ways 100 dice rolls can sum to 400, which seems to be a tall task. Any hints?

The other thought I have is typically for these kind of problems, I can use CLT and approximate the sum using a normal distribution. But typically in those kind of problems, I am trying to find the probability that a variable falls within a range of values. Here I am trying to find that the sum is exactly 400, and it's not clear to me how I can apply CLT and the normal distribution. One thought I did have is I could use a normal distribution with mean 350 and variance 291, and then integrate the area around the curve right around 400, e.g., the bounds / limits of integration would be $400 \pm \epsilon$, but it's not clear to me what $\epsilon$ should be.

5

There are 5 best solutions below

3
On

Your approach using the CLT seems the way to go. We have

$$Y = \sum_{i=1}^{100} X_i$$ where $\mu_X = 7/2$ and $\sigma_X^2=35/12$. Hence $\mu_Y = 350$ and $\sigma_Y^2=3500/12=291.66$, $\sigma_Y=17.08$

Then, assuming the distribution $Y$ approaches a Gaussian with that mean and variance $g(x)$ we can approximate

$$P(Y=400) \approx \int_{400-1/2}^{400+1/2} g(x) dx \approx g(400)$$

This should give an very good approximation.

Search for "continuity correction", for example here or here.

You can even refine it (with quite more work) using Edgeworth expansion. To see how that works (and how the above integral can be justified) you can see this answer.

Let's compute the exact value numerically with Octave/Matlab, and compare with the CLT approximation:

>> p0 = [0 1 1 1 1 1 1]/6;
>> p = [1 1 1 1 1 1]/6;
>> for n = 2:100
>>    p = conv(p0,p);
>> endfor
>> p(400)
ans =  0.00031721
>> s = 3500/12
s =  291.67
>> (1/sqrt(2*pi()*s))*exp(-(400-350)^2/(2*s))
ans =  0.00032152
 

The approximation $3.2152 \cdot 10^{-4}$ differs from the exact value $3.1721 \cdot 10^{-4}$ in less than $1.5 \%$

0
On

You are looking for $$ \eqalign{ & N_b (300,5,100) = \cr & = {\rm No}{\rm .}\,{\rm of}\,{\rm solutions}\,{\rm to}\;\left\{ \matrix{ {\rm 1} \le {\rm integer}\;x_{\,j} \le 6 \hfill \cr x_{\,1} + x_{\,2} + \; \cdots \; + x_{\,100} = 400 \hfill \cr} \right. = \cr & = {\rm No}{\rm .}\,{\rm of}\,{\rm solutions}\,{\rm to}\;\left\{ \matrix{ {\rm 0} \le {\rm integer}\;y_{\,j} \le 5 \hfill \cr y_{\,1} + y_{\,2} + \; \cdots \; + y_{\,100} = 300 \hfill \cr} \right. \cr} $$ whose exact solution reads as $$ N_b (s,r,m)\quad \left| {\;0 \leqslant \text{integers }s,m,r} \right.\quad = \sum\limits_{\left( {0\, \leqslant } \right)\,\,k\,\,\left( { \leqslant \,\frac{s}{r+1}\, \leqslant \,m} \right)} {\left( { - 1} \right)^k \binom{m}{k} \binom { s + m - 1 - k\left( {r + 1} \right) } { s - k\left( {r + 1} \right)}\ } $$ as explained in this related post and various others.

In terms of probability that becomes $$ p_{\,b} (s;r,m) = {{N_{\,b} (s,r,m)} \over {\left( {r + 1} \right)^{\,m} }} $$

Although heavy, the two formulas above can be computed by a good CAS, giving $$ p_{\,b} (300;5,100) = 0.0003172 \ldots $$

If you need instead an asymptotic formula, you can resort on CLT.

To apply that properly, we shall first convert the sum of $m$ discrete uniform variables over $[0,r]$ into the approximating sum of $m$ continuous uniform variables on the support $[-1/2, \, r+1/2]$.
The relevant distribution is the Irwin-Hall distribution.

From this you can pass to $$ \eqalign{ & p_{\,b} (s;r,m) = {{N_{\,b} (s,r,m)} \over {\left( {r + 1} \right)^{\,m} }} \approx {1 \over {\sqrt {2\pi m\sigma ^{\,2} } }} e^{\, - \,{{\left( {s - m\mu } \right)^{\,2} } \over {2m\sigma ^{\,2} }}} \cr & = {{\sqrt {6/\pi } } \over {\sqrt {m\left( {\left( {r + 1} \right)^{\,2} } \right)} }} e^{\, - \,6{{\left( {s - mr/2} \right)^{\,2} } \over {m\left( {\left( {r + 1} \right)^{\,2} } \right)}}} \cr} $$ but you cannot pretend it to be very precise.

0
On

For your approach you should take $\epsilon=0.5$. You are wanting to integrate the error function between $399.5$ and $400.5$ because that is the continuous approximation to your distribution. Another way to use this approach is to use a z-score table, computing the number of standard deviations $399.5$ and $400.5$ equate to.

To get an exact count would not be so hard if spreadsheets could handle $97$ digit numbers. Make columns from $1$ to $100$ for the number of dice and rows from $1$ to $400$ for the sum. Each cell is the sum of the four cells in the prior column from $1$ to $4$ rows above corresponding to rolling $1$ to $4$ on the last die. Write that in the second column and copy right and down and you are done except for overflow. Python gives arbitrary precision integers as soon as you need them, so will do this easily.

2
On

I would definitely go here for generating function and then for complex derivative. Now the number of possible outcomes is listed in the coefficients of

$$ G(x)=(x+x^2+x^3+x^4+x^5+x^6)^ {100} $$

You need the coefficient at $x^{400}$. If you find $400$-th derivative you just need to multiply it by $400!$.

But this is all what Cauchy integral formula is about

$$G^{(n)}(0) = \frac{n!}{2\pi i} \oint_\gamma \frac{G(z)}{z^{n+1}}\, dz$$

which comes to calculating

$$N(100,400)=\frac{1}{2\pi} \int_0^{2\pi} \frac{(e^{ix}+e^{2ix}+e^{3ix}+e^{4ix}+e^{5ix}+e^{6ix})^{100}}{e^{400ix}}\, dx$$

which can be shortened to

$$N(100,400)=\frac{1}{2\pi} \int_0^{2\pi} \frac{(e^{6ix}-1)^{100}}{(e^{ix}-1)^{100} e^{400ix}}\, dx$$

Now this integral can be solved in many ways, starting from using different approximations to going for solving this integral directly.

Dividing the result by $6^{100}$ you have got close to $0.000317214$

0
On

The exact answer to your question is:

  1. Yes you can use Normal distribution
  2. You need to adjust to so called Continuity Correction

All together:

$$\operatorname{Mean}(m) = \frac{7}{2}m$$ $$\operatorname{Variance}(m) = \frac{35}{12}m$$

You use for $P(X=n)$

$$P(n – 0.5 < X < n + 0.5)$$

In your case:

$$m=100, n=400$$