What is the probability of throwing at least k ones when throwing $n$ six-sided dice and $m$ eight-sided dice?

167 Views Asked by At

Problem history:

I am working on calculating the probability of selecting at least $k$ $x$-cost cards in the creation of a deck in Hearthstone, a trading card game. In a certain game mode called Arena you have to choose $30$ times between three randomly selected cards to construct a deck. Picks $1$, $10$, $20$, and $30$ are special and the other picks are normal. Every card in Hearthstone has a cost; if I'm for example interested in cards with a cost of $2$, then I can calculate the probability that I will get the card offered in a single pick by calculating the ratio of $2$-cost cards versus all the offered cards. Suppose the probability for normal picks is $p_n$ and the probability for special picks is $p_s$.

Problem translation:

We can actually ask the above question at any point in the picking, so there may be less cards than $30$ remaining.

We have the following random variables as an example when we still have all $30$ picks remaining:

$$X\sim\text{Binom}(26, p_n),$$ $$Y\sim\text{Binom}(4, p_s).$$

As another example we can take the case of having completed $10$ picks, so there are $20$ remaining, then:

$$X\sim\text{Binom}(18, p_n),$$ $$Y\sim\text{Binom}(2, p_s),$$

because picks $1$ and $10$ were special picks, there's $20$ picks remaining, and picks $20$ and $30$ are also special.

We want to know the probability of picking at least $k$ $x$-drops in $X$ and $Y$ combined.

New problem statement:

To decouple our original problem statement from Hearthstone, I thought a translation to a die experiment would be useful.

I think the new problem statement is the following:

What is the probability of throwing at least $k$ ones when throwing $n$ fair six-sided dice and $m$ fair eight-sided dice?

Of course I have thought about the solution, but it seems to be an ugly case distinction, and I feel that there should be a better solution.

2

There are 2 best solutions below

1
On

Not sure if there would be a neat closed form expression for this. However, we essentially want to sum up the probabilities of all events in which $k_6 + k_8 \geq k$, where we define $k_6$ as the number of ones rolled from the $n$ six-sided dice, and $k_8$ as the number of ones rolled from the $m$ eight-sided dice. This definition yields:

$$\sum_{k_6 + k_8 \geq k}{\left(\binom{n}{k_6}\cdot\left(\frac{1}{6}\right)^{k_6}\cdot\left(\frac{5}{6}\right)^{n - k_6}\cdot\binom{m}{k_8}\cdot\left(\frac{1}{8}\right)^{k_8}\cdot\left(\frac{7}{8}\right)^{m - k_8}\right)}.$$

Equivalently, we could write:

$$\sum_{k_6 = 0}^{n}{\left(\binom{n}{k_6}\cdot\left(\frac{1}{6}\right)^{k_6}\cdot\left(\frac{5}{6}\right)^{n - k_6}\cdot\sum_{k_8 = k - k_6}^{m}{\binom{m}{k_8}\cdot\left(\frac{1}{8}\right)^{k_8}\cdot\left(\frac{7}{8}\right)^{m - k_8}}\right)}.$$

Will continue to think about more concrete expressions for this. Hope this helps!

0
On

You’re basically trying to find the cumulative distribution function (CDF) of the sum of two binomial random variables with different probabilities. The CDF is usually defined as the probability $\Pr(X\le k)$ of getting a value of at most $k$. Once you have that, you can find the value you’re looking for by computing $1-\Pr(X\le k-1)$.

Let $X\sim\operatorname{Binom}(n,p)$ and $Y\sim\operatorname{Binom}(m,q)$. The probability $\Pr(X+Y=k)$ of rolling exactly $k$ ones involves a convolution of the two distributions, and the CDF is a partial sum of these terms. That is, $$\Pr(X+Y=k)=\sum_{j=0}^k\Pr(X=j)\Pr(Y=k-j)=\sum_{j=0}^k\binom{n}{j}p^j(1-p)^{n-j}\binom{m}{k-j}q^{k-j}(1-q)^{m-k+j}\tag{1}$$ and $$\Pr(X+Y\le k)=\sum_{j=0}^k\Pr(X+Y=j).\tag{2}$$ The CDF of a pure binomial r.v. has a “nice” expression as an integral (for which see below), and there might be a simpler form for the convolution in (1), so there’s at least a slim hope of finding a way to express this CDF that doesn’t involve all of those sums and products, but I think it unlikely.

There are other ways available to compute (2) besides direct calculation of the above expressions. You might, for instance, try generating functions. The probability generating function for a Bernoulli r.v. with probability of success $p$ is $(1-p)+pz$ and dividing an ordinary generating function $g(z)$ by $1-z$ produces one whose coefficients are the partial sums of the coefficients of $g$, so the o.g.f. for the CDF of $X+Y$ is $$g(z)={(1-p+pz)^n(1-q+qz)^m\over1-z}$$ thus $$\Pr(X+Y\le k)=[z^k]g(z)=\frac1{k!}{d^kg\over dz^k}(0),$$ i.e., the CDF is encoded in the coefficients of the Taylor series for $g$ about $z=0$. Actually computing these coefficients doesn’t look much easier than computing (2) to me, though.

Unsurprisingly, the PDF of a binomial r.v. obeys a recurrence that’s similar to the one for binomial coefficients: $\Pr(X_n=k)=p\Pr(X_{n-1}=k-1)+(1-p)\Pr(X_{n-1}=k)$. This suggests building the PDF for $n+m$ dice using a Pascal’s triangle construction, with the first $n$ rows using $p=1/6$ and the last $m$ using $p=1/8$. The CDF is the obtained by summing across the last row. This calculation takes $O((n+m)^2)$ operations and storage proportional to $n+m$. If you want the value for a specific $k$, you don’t need to build the entire table, so the computation becomes $O(k(m+n))$. This can easily be done in a spreadsheet in matter of a few minutes.

Plotting these CDFs for various values of $n$ and $m$ and comparing them to the pure binomial cases with $n=0$ or $m=0$ suggests approximating the actual CDF by interpolating between CDFs of the two pure binomial r.v.s: $$\Pr(X+Y\le k)\approx{n\over n+m}\Pr(X'\le k)+{m\over n+m}\Pr(Y'\le k),$$ where $X'\sim\operatorname{Binom}(n+m,p)$ and $Y'\sim\operatorname{Binom}(n+m,q)$. Comparing plots of this approximation to the actual CDF shows that it is quite good over most of the function’s support. This is a nice result, since many tools and software libraries have built-in support for binomial CDFs. Also, a binomial CDF can be represented in terms of the regularized incomplete beta function: $$\Pr(X\le k)=(n-k)\binom{n}{k}\int_0^{1-p}t^{n-k-1}(1-t)^k\,dt.$$ For the values of $n$ and $k$ that we’re considering, the integrand is a polynomial in $t$ with $k+1$ terms, so this is quite easy to evaluate. It might also be possible to come up with an exact integral for the combined CDF through some convolution of beta functions, but that’s more work than I want to do on this problem.

Examining plots of these CDFs reveals something else that can make approximation simpler. For $p=1/6$ and $q=1/8$, once $k$ gets to about $(n+m)/3$, $\Pr(X+Y\le k)$ is very close to $1$ and changes little as $k$ increases.