Probability of subsequence 123456 in $n$ rolls, combinatorics approach?

257 Views Asked by At

I was trying to solve this problem:

We roll a 6-sided die n times. What is the probability that all faces have appeared in order, in some six consecutive rolls (i.e., what is the probability that the subsequence 123456 appears among the n rolls)?

My immediate reflex was to use Markov chains, but I soon backed down because of how tedious the calculations seemed. I know that it's supposed to be the correct way to tackle it but I've tried something different that ended up being wrong and I'd really appreciate it if you could help pointing out my mistakes.

So I decided to count the number of sequences containing the subsequence 123456. Indeed, any sequence of $n$ numbers is equiprobable of probability $\frac{1}{6^n}$. When we consider such we can pick any of $n-5$ starting point for the subsequence 123456, the rest of the sequence doesn't matter and each term can be any number of $\left\{1,2,3,4,5,6\right\}$ so we've got $6^{n-6}$ ways to pick the remaining numbers. So following this reasoning, I would be tempted to say that the wanted probability is $\frac{(n-5)6^{n-6}}{6^n} = \frac{n-5}{6^6}$ which obviously doesn't make sense because I could take an $n$ that's strictly greater than $6^6$ and it would be a probability strictly greater than $1$...

I really struggle with combinatorics like that and I never know why my reasoning is wrong.

4

There are 4 best solutions below

0
On

Let $p(n)$ be the probability of having 123456 in $n$ rolls. Then, we have the following recurrence by conditioning on the first time that the sequence appears.

For $\color{blue}{6\le n \le 11}$:

$$p(n)=\color{blue}{(n-5) \left ( \frac{1}{6} \right ) ^6}$$

For $\color{blue}{n \ge 12}$:

$$p(n)=(11-5) \left ( \frac{1}{6} \right )^6+\sum_{i=7}^{n-5}(1-p(i-1)) \left ( \frac{1}{6} \right ) ^6= \color{blue}{\left ( \frac{1}{6} \right ) ^6 \left [n-5-\sum_{i=6}^{n-6}p(i) \right ]}$$

The above can be used to efficiently compute the probability $p(n)$.

From the above, for $n \ge 12$, one can obtain another recurrence (obtained in another answer based on a different reasoning):

$$p(n)=p(n-1)+\left ( \frac{1}{6} \right ) ^6 \left [1-p(n-6) \right ]$$

as follows:

$$p(n)-p(n-1)=\left ( \frac{1}{6} \right ) ^6 \left [n-5-\sum_{i=6}^{n-6}p(i)- \left( (n-1)-5-\sum_{i=6}^{n-1-6}p(i) \right ) \right ]=\left ( \frac{1}{6} \right ) ^6 \left [1-p(n-6) \right ].$$

More interestingly, using the recurrence, the probability can be determined in each range $6k\le n < 6(k+1)$ for $k\in \mathbb N$. For $k=1$, $6\le n < 12$, we have $$p(n)=(n-5) \left ( \frac{1}{6} \right ) ^6.$$

Let us set $k=2$ and find the formula of $p(n)$ for $\color{blue}{12\le n < 18}$:

$$p(n)=\left ( \frac{1}{6} \right ) ^6 \left [n-5-\sum_{i=6}^{n-6}p(i) \right ]=\left ( \frac{1}{6} \right ) ^6 \left [n-5-\left ( \frac{1}{6} \right ) ^6\sum_{i=6}^{n-6}(i-5) \right ]=\color{blue}{\left ( \frac{1}{6} \right ) ^6 \left [n-5-\left ( \frac{1}{6} \right ) ^6\frac{(n-10)(n-11)}{2} \right ]}.$$

This can be similarly used to obtain the formula for $k=3$ (and the next values of $k$) as we have the formulas for $p(6), \dots, p(11)$ and $p(12), \dots, p(17)$.

This procedure reveals that for $6k\le n < 6(k+1)$ the probability $p(n)$ is a polynomial of max order $k$:

$$p(n)=a_0+a_1n+\dots+ a_k n^k.$$


We could also use inclusion-exclusion, but the result is complex (see the closed-form formula obtained in another answer).

I doubt that a simple formula can be presented for $p(n)$ for $6k\le n < 6(k+1)$ for any $k \in \mathbb N$.

Update 1:

I realized that for $\color{orange}{n \ge 12 }$, the following is a very good approximation of $p(n)$:

$$p(n)=\color{orange}{\left ( \frac{1}{6} \right ) ^6 \left [n-5-\left ( \frac{1}{6} \right ) ^6\frac{(n-10)(n-11)}{2} \right ]}.$$

For $\color{orange}{n \le 100}$, the maximum error is less than $\color{orange}{9.72195267432557\text{E-}10}$

For $\color{orange}{n \le 1000}$, the maximum error is less than $\color{orange}{1.55551907611573E\text{E-}6}$.

For $\color{orange}{n \le 10000}$, the maximum error is less than $\color{orange}{0.00154961233453441}$.

For example, for $n=100$, $p(100)$ is $\color{blue}{0.00203434079881172}$, while the approximate value from the above formula is $\color{orange}{0.00203433982661645}$.

As another example, for $n=10000$, $p(10000)$ is $\color{blue}{0.192855678224992}$, while the approximate value from the above formula is $\color{orange}{0.191306065890458}$.

Update 2:

Finally, using a numerical study, I could obtain $\color{red}{\text{the exact general formula}}$ of $p(n)$ for $6k\le n < 6(k+1)$ for any $k \in \mathbb N$:

$$p(n)=\left ( \frac{1}{6} \right ) ^6 (n-5) -\left ( \frac{1}{6} \right ) ^{2 \times 6}\frac{(n-10)(n-11)}{2!}+\left ( \frac{1}{6} \right ) ^{3 \times 6}\frac{(n-15)(n-16)(n-17)}{3!} \dots (-1)^{k-1}\left ( \frac{1}{6} \right ) ^{k \times 6}\frac{(n-(6k-k))\dots (n-(6k-1))}{k!}.$$

$$\fbox{$p(n)=\sum_{i=1}^{k} (-1)^{i-1}\left( \frac{1}{6} \right ) ^{6i}\,\frac{\prod_{j=1}^{i}(n-(6i-j))}{i!}, \quad 6k\le n < 6(k+1).$}$$

In fact, this could be aldo obtained from the formula given in another answer using inclusion-exclusion after some simplifications.

4
On

Let us denote by $f(n)$ the number of sequences of $n$ die rolls where the subsequence $123456$ occurs. We can deduce the following recurrent relation for $f(n)$: $$f(n)=6f(n-1)+(6^{n-6}-f(n-6)).$$ Indeed, if the subsequence $123456$ occurs in the first $n-1$ then we should count this sequence in, and it can be prolonged with any of the $6$ die rolls. Otherwise the last $6$ rolls must be $123456$ and there is no $123456$ in the first $n-6$ rolls.

This formula is already good enough. We have $f(0)=$ $f(1)=$ $f(2)= $ $f(3)=$ $f(4)=$ $f(5)=0$. Then one by one we can calculate $f(n)$ for any $n$. The answer to your question will be $f(n)/6^n$. This number grows quite slowly. For example, if $n=100$ then the answer is just barely above $2$ per mille: $\approx 0.002034340798812$.

Now there is actually a way to solve this recurrence. I’ve tried doing so, but free access Wolfram Alpha won’t let me compute this. I can however describe the maths. Consider the equation $$x^6-6x^5+1=0.$$ It has $6$ different (complex) roots, their approximate values are: $$x_1\approx 0.716838, x_2 \approx 5.99987,$$ $$ x_3\approx -0.55999-0.39661i, x_4\approx -0.55999+0.39661i, $$ $$x_5\approx 0.20164-0.67313i,x_6\approx 0.20164+0.67313i.$$

Now consider this system of $6$ linear equations with $6$ unknowns $c_1, c_2, c_3, c_4$, $c_5$, $c_6$: $$\sum_{i=1}^6 c_ix_i^j + 6^j = 0, \quad j = 0,1,2,3,4,5.$$

Unfortunately, this system of equations is quite large and Wolfram Alpha won’t accept the input. If we chop the numbers, the calculation accuracy falls drastically. If this system is solved though, you have the answer for $f(n)$ for any $n$: $$f(n)= \sum_{i=1}^6 c_ix_i^n + 6^n.$$

Edit: So I managed to solve the linear equations system. So here is a good approximation formula for the answer: $$ f(n)/6^n = 1+ (0.028194 \cdot 0.716838^n+ (-1.000108)\cdot 5.99987^n+ $$ $$+(-0.017734-0.010752i)\cdot(-0.55999-0.39661i)^n + $$ $$+(-0.017734+0.010752i)\cdot(-0.55999+0.39661i)^n + $$ $$ + (0.003691-0.02415i)\cdot(0.20164-0.67313i)^n + $$ $$+(0.003691+0.02415i)\cdot(0.20164+0.67313i)^n )/6^n.$$

0
On

If you denote $p_n=\mathbb{P}(\text{your event $n$ rolls})$, you can construct your recursion this way ($X_i$ is $i-th$ dice roll): \begin{equation} \mathbb{P}(\text{your event $n$ rolls})=\mathbb{P}(\text{your event $n$ rolls}|X_1=1)+...+\mathbb{P}(\text{your event $n$ rolls}|X_1=6) \end{equation} In case that $X_1\neq 1$ we have $\mathbb{P}(\text{your event $n-1$ rolls})=p_{n-1}$. Otherwise, we have $X_1=1$ and: \begin{equation} \mathbb{P}(\text{your event $n$ rolls})=\mathbb{P}(\text{your event $n$ rolls}|X_1=1)+5p_{n-1} \end{equation} We just need to find the probability with the condition $X_1=1$. This can be done similarly: \begin{equation} \mathbb{P}(\text{your event $n$ rolls}|X_1=1)=\mathbb{P}(\text{your event $n$ rolls}|X_1=1,X_2=2)+\mathbb{P}(\text{your event $n$ rolls}|X_1=1,X_2=1)+\mathbb{P}(\text{your event $n$ rolls}|X_1=1|X_2=3)+...\mathbb{P}(\text{your event $n$ rolls}|X_1=1,X_2=6) \end{equation} Which leaves us with $p_{n-2}$ for every case that $X_2 \neq 2$, since we have $n-2$ rolls left. \begin{equation} \mathbb{P}(\text{your event $n$ rolls}|X_1=1)=\mathbb{P}(\text{your event $n$ rolls}|X_1=1,X_2=2)+5p_{n-2} \end{equation} By continuing the process you will model the probability with recursion. It is obvious that $p_1=p_2=p_3=p_4=p_5=0$ and $p_6=\frac{1}{6^6}$.

0
On

The inclusion-exclusion approach isn't that bad, since multiple occurrences can't overlap. If there are $k$ occurrences of the subsequence $123456$, then there are $n-6k$ positions remaining; insert the $k$ occurrences as $k$ distinguished positions, giving $n-5k$ positions in total, of which $k$ are distinguished and may occur anywhere in the sequence. So the number of sequences with $k$ specific occurrences and no constraint on the remaining positions is ${{n-5k}\choose{k}}6^{n-6k}$, for a probability contribution of $$ \frac{{n-5k}\choose{k}}{(6k)^6}=\frac{(n-5k)!}{(n-6k)!k!6^{6k}}. $$ The overall exact answer, then, is $$ p(n)=\sum_{k=1}^{\lfloor{n/6}\rfloor}\frac{(-1)^{k+1}(n-5k)!}{(n-6k)!k!6^{6k}}. $$ Note that the number of terms needed to get an accurate result is initially very small, but grows with $n$... in general the terms start shrinking once $k \gg n / 6^6$. I've compared this to the two other answers and found it agrees. In particular, the exact answer for $n=100$ is $$ \frac{341839694035418792417464432131969563500276811405811130807601584865147825}{168034625385820706300589060483039562895355892611356705643794230033306877952}. $$