Deriving formula for rolling a run of q consecutive identical rolls on a k-sided die within n rolls

133 Views Asked by At

I am trying to write a general formula for the probability of rolling/flipping a $k$-sided object $n$ times and having at least one occurrence of at least $q$ consecutive identical rolls/flips.

This began as an attempt to work out the probability of flipping a coin ten times and having at least one run of three or more identical flips.

In this example case, it is easy to determine the total number of possible outcomes with 10 coin flips -- $2^{10}$. What is not so easy is finding a simple way to count the number of favorable outcomes.

For instance, if you start by counting the number of flips with the first three being heads (HHH???????), there are $2^7$ possibilities. There are also $2^7$ possibilities for TTT???????.

The next few are not too difficult either. For ?HHH?????? there are also $2^7$ possibilities, minus the $2^6$ possibilities you've already counted in the format HHHH??????. The same goes for tails.

The further you go down the line, the trickier it is to subtract the combinations you've already counted, especially when you get to ???HHH????, because then you've already counted possibilities of the type HHHHHH????, ?HHHHH????, ??HHHH????, but also possibilities of type TTTHHH????. The further you go down the line from there, it gets even more complicated keeping track of what you've already counted.

Is there a simpler way to do this than trial and error, and can it be generalized to a formula (as my question asked) for any k-sided object, any number of flips/rolls, and any length of run?

I would like to see how this formula is arrived at, not just what the formula is. NOTE: I am very adept at understanding mathematical concepts, but vocabulary-wise I am not beyond basic self-taught calculus and high school advanced math. Thus I can track with you on a complex explanation as long as you explain (or perhaps link to explanations for) any foreign terms along the way.

3

There are 3 best solutions below

5
On BEST ANSWER

Let me just give you some pointers to get started. I haven't worked this out, so I don't know if you'll find a "nice" formula, but I can tell you that there's a better way than what you've been doing.

Precisely because of the difficulties with inclusion and exclusion that you've encountered, it's easier to count the sequences than do not have a run of $q$ identical outcomes. Call a sequence "good" if no run of $q$ identical outcomes occurs in it. Let $S_j^{(n)}$ be the number of good sequences of length $n$ than end in a maximal run of $j$ identical outcomes. For example, $S_3^{(10)}$ is the number of sequences of length $10$ that end in a run of $3$ identical outcomes, but not $4$ identical outcomes. Define $$S^{(n)}=\sum_{j=1}^{q-1}S_j^{(n)}$$ so that it is $S^{(n)}$ that we want to compute.

We have $$ \begin{align} S_1^{(1)}&=k\\ S_j^{(1)}&=0,\ j>1\\ S_j^{(n)}&=S_{j-1}^{(n-1)}, 1<j<q\\ S_1^{(n)}&=(k-1)S^{(n-1)} \end{align} $$

The first two equations are clear.

For the third equation, note that a good sequence of length $n$ that ends in a run of $j>1$ can only arise from a good sequence of length $n-1$ that ends in a run of $j-1;$ of course, the next outcome must be the same a the one in the run, so it can only arise in one way.

For the fourth equation, a good sequence of length $n$ ending in a run of exactly $1$ arises from a good any sequence of length $n-1$ followed by any of the $k-1$ outcomes that is different from the last outcome of that sequence.

For given values of $n,k,q$ this allows you to work out the value of $S^{(n)}$ quickly, especially if you use a computer. Then you just need to divide by $k^n,$ and subtract from $1.$

I suggest you try working out examples with $q=2$ and small values of $k$ to begin with, and see if you find any patterns.

Good luck.

EDIT

Corrected a typo. There are $k^n$ possible sequences of length $n$ of course, not $q^n$.

8
On

You can get a formula as well. Let's compute the number of sequences with no run of $q$ symbols. Such a sequence will break into blocks of the same symbol of length between $1$ and $q-1$. There are $k$ possibilities for the first block, and $k-1$ for each subsequent block.

Letting $a_n$ be the number of sequences with no runs of $q$, a little generating function magic shows that $$ \sum_{n=0}^\infty a_n x^n=\frac{k(x+x^2+\dots+x^{q-1})}{1-(k-1)(x+x^2+\dots+x^{q-1})}=k\cdot \frac{x-x^q}{1-kx+(k-1)x^q} $$ To solve for $a_n$, let's first ignore the numerator. The $x^n$ coefficient of this can be shown to be $$ \frac1{1-x(k-(k-1)x^{q-1})}=\sum_{r\ge 0}\binom{n-r(q-1)}{r}(1-k)^rk^{n-rq} $$ Here, we use the convention that $\binom{m}\ell=0$ when $m<0$, so the only finite summands are when $r\le \frac{n}{q-1}$. Putting this all together, the final answer is $$ a_n=k\sum_{r\ge 0}\binom{n-1-r(q-1)}{r}(1-k)^rk^{n-1-rq}-\binom{n-q-r(q-1)}{r}(1-k)^rk^{n-(r+1)q}\\ =\boxed{\sum_{r\ge 0}\Bigg[\binom{n-1-r(q-1)}{r}k^{q-1}-\binom{n-q-r(q-1)}{r}\Bigg](1-k)^rk^{n-q(r+1)+1}} $$

0
On

Starting with the complementary probability of not seeing a run of length $q$ we have the generating function

$$-\frac{k}{k-1} + \frac{k}{k-1} \sum_{p\ge 0} (k-1)^p (z+z^2+\cdots+z^{q-1})^p$$

We will be extracting coefficients $[z^n]$ where $n\ge 1$ so the adjustment term in front does not contribute. Continuing we find

$$\frac{k}{k-1} \sum_{p\ge 0} (k-1)^p z^p \left(\frac{1-z^{q-1}}{1-z}\right)^p \\ = \frac{k}{k-1} \frac{1}{1-(k-1)z(1-z^{q-1})/(1-z)} \\ = \frac{k}{k-1} \frac{1-z}{1-z-(k-1)z(1-z^{q-1})} \\ = \frac{k}{k-1} \frac{1-z}{1-kz + (k-1)z^q}.$$

Doing the coefficient extraction from this we get

$$[z^n] \frac{1}{1-kz + (k-1)z^q} = [z^n] \frac{1}{1 - z(k-(k-1)z^{q-1})} \\ = [z^n] \sum_{p\ge 0} z^p (k-(k-1)z^{q-1})^p = \sum_{p=0}^n [z^{n-p}] (k-(k-1)z^{q-1})^p \\ = \sum_{p=0}^n [z^p] (k-(k-1)z^{q-1})^{n-p} = \sum_{p=0}^{\lfloor n/(q-1)\rfloor} [z^{p(q-1)}] (k-(k-1)z^{q-1})^{n-p(q-1)} \\ = \sum_{p=0}^{\lfloor n/(q-1)\rfloor} [z^{p}] (k-(k-1)z)^{n-p(q-1)} \\ = \sum_{p=0}^{\lfloor n/(q-1)\rfloor} {n-p(q-1)\choose p} (-1)^p (k-1)^p k^{n-pq}.$$

We thus obtain for the probability the closed form

$$\bbox[5px,border:2px solid #00A000]{ \begin{gather} P_{k,n,q} = \frac{Q_{k,n,q}-Q_{k,n-1,q}}{k^{n-1}\times (k-1)} \\ \quad\text{where}\quad Q_{k,n,q} = \sum_{p=0}^{\lfloor n/(q-1)\rfloor} {n-p(q-1)\choose p} (-1)^p (k-1)^p k^{n-pq}. \end{gather}}$$

This formula permits the computation of the probability using basic arithmetic and binomial coefficients where enumeration will not suffice. For example, rolling a die with $16$ faces $10$ times with no run at least four we obtain

$$P_{16,10,4} = {\frac {4288021875}{4294967296}} \approx 0.9983828932.$$

Since we have computed the complementary probability we use $1-P_{k,n,q}.$