I'm trying to figure out the probability of each sum possibility of a dice roll.
Given n s-sided dice, the probability of summing to p is shown on this stack overflow question. Fairly simple to compute.
However, my problem has a twist. If any of the dice roll a 1, the "sum" is 1.
As an example, for rolling 2 3-sided dice (1, 2, 3), the probability outcomes are as follows:
"Sum" | P (x/9)
------|--------
1 | 5
2 | 0
3 | 0
4 | 1
5 | 2
6 | 1
I also tried this with 2 4-sided dice, but hand writing out 3 4-sided dice or 2 5-sided dice is going to get a little bit dicey. I seem to be noticing a little bit of a pattern, number 1 is pretty trivial to calculate for two dice (P that either dice is 1 - P that both are 1). I don't know how to do this for 3 though. The rest seem to start at 4 and go up from 1 then back down.
Can anybody weigh in and provide a closed-form mathematical expression for each P(n), and explain how you got this?
Thanks!
Suppose we have $k$ dice, each with $n$-sides, and let $S$ be the random variable that counts your modified sum. The total number of dice roll configurations is $n^k$, so to calculate probabilities $\Bbb P(S=s)$ it's enough to calculate the count $C(s)$ of dice roll configurations for which $S=s$. Of course, we'll then have $\Bbb P(S=s) = C(s)/n^k$.
The minimum value greater than $1$ that $S$ can take is $2k$, which happens when all $k$ dice roll $2$. The maximum value of $S$ is of course $nk$. For each $s$ with $2k\leqslant s \leqslant nk$, a combination of dice rolls that add up to $s$ corresponds uniquely to a solution of
$$x_1+x_2+\dots + x_k = s,\tag{1}$$
where $2\leqslant x_i\leqslant n$ is the value of the $i$-th die. We can rewrite this as
$$x_1'+x_2'+\dots + x_k' = s-2k,\tag{1$'$}$$
where $x_i' = x_i - 2$, and hence $0\leqslant x_i' \leqslant n-2$. Using this representation, one can derive a closed form expression with nested sums, but we can do better. We can count solutions of $(1')$, and hence of $(1)$, via inclusion-exclusion in a manner similar to this answer.
Consider equation $(1')$, except we require only that $x_i\geqslant 0$. The number of solutions is given by stars and bars and equals $\binom{s-k-1}{k-1}$. Now, let $A_i$ be the set of solutions to $(1')$ in which $x_i'>n-2$. Then, by inclusion-exclusion:
$$\bigcup_{i=1}^k |A_i| = \sum_{\emptyset \neq J \subset\{1,\dots,k\}} (-1)^{|J|+1}\left|\bigcap_{j\in J}A_j\right|.$$
By symmetry, $\left|\bigcap_{j\in J}A_j\right|$ depends only on $|J|$, and hence we may simplify the calculation above to
$$\bigcup_{i=1}^k |A_i| = \sum_{j=1}^k (-1)^{j+1}\binom kj \alpha_j,$$
where $\alpha_j$ is the size of any intersection $\left|\bigcap_{j\in J}A_j\right|$ with $|J| = j$. In the spirit of the linked answer, we substitute $x_i' \mapsto y_i + n - 1$ for each $i=1,\dots,j$ and consider hence
$$y_1+y_2+\dots+y_j + x_{j+1}'+\dots+x_k' = s - 2k - (n-1)j.$$
By stars and bars, the number of solutions to this is
$$\alpha_j = \binom{s-k-(n-1)j-1}{k-1}.$$
Of course, for many values of $n$ and $j$, this will simply be $0$ as the numerator will be $<k-1$. It follows that
$$\begin{align} C(s) &= \binom{s-k-1}{k-1} - \sum_{j=1}^k (-1)^{j+1}\binom kj \binom{s-k-(n-1)j-1}{k-1} \\&= \sum_{j=0}^k (-1)^{j}\binom kj \binom{s-k-(n-1)j-1}{k-1} \end{align}.$$
The only thing that's left is $C(1)$, but that's the easy part. The total number of dice roll configurations is $n^k$, and and the number of dice roll configurations that have no $1$s is $(n-1)^k$. It follows that
$$C(1) = n^k - {(n-1)}^k$$
Here's some Python code to validate the counts.
The sanity check against $k=2$ and $n=3$ works out and outputs:
Other tests of mine have always had the counts add up to $n^k$, so I'm fairly confident at this point.
They also point in the direction that the counts for $s>1$ have an increasing and then decreasing pattern of repetitions (like, say, the binomial coefficients). Indeed, to each solution of $(1)$ we can associate a unique dual solution of
$$X_1 + X_2 +\dots + X_k = (n+2)k - s$$
via the transformation $X_i = (n+2) - x_i$, so the counts must be the same. This means we can restrict the calculations to the range
$$2k \leqslant s \leqslant \left\lfloor\frac{(n+2)k}2\right\rfloor,$$
and counts for higher values of $s$ can be obtained from these via the dual association.