How many times will a consecutive sequence of throws randomly appear if I throw a four-sided die N times?

1.6k Views Asked by At

The problem is related to nucleotides and genomes, but it's better to simplify it using dice.

EDIT 2: My original question wasn't very clear, so I've reworded it

I know that the probability of getting the 7-digit long sequence "4213433" if I throw a four-sided die 7 times is $\frac{1}{4^7}$

But, what is the expected occurrence of the sequence "4213433" if I throw a four-sided die $4.5 \times 10^6$ times?

EDIT 1: Okay, I think I found my answer. If I throw a four-sided die $4.5 \times 10^6$ times, the sequence 4213433 will occur randomly $\frac{4.5 \times 10^6}{4^7}$ or 274.658 times. Is that correct? EDIT 3:Almost. Like @joriki said in the comments, the exact answer would be $\frac{4.5 \times 10^6 - 6}{4^7}$

3

There are 3 best solutions below

0
On BEST ANSWER

The answer you gave in your edit is not the answer to your original question. It's the (almost but not quite correct) answer to the question "What is the expected value of the number of sequences 4213433 occurring if I throw a four-sided die $4.5\times 10^6$ times?" This is in fact a much easier question because of the linearity of expectation. You don't have to worry about correlations, you can just add up the expectation values for each of the slots at which the sequence might occur. However, there aren't $4.5\times10^6$ of these, only $4.5\times10^6-6$, so the expected number of occurrences is actually $(4.5\times10^6-6)/4^7$, but that's the same up to the decimal places you specified.

Your original question is a bit harder, since to find the probability of at least one such sequence occurring, you need to take into account that the events of the sequence occurring in overlapping slots aren't independent. For instance, the probability of getting the sequence 44 at least once if you roll the die $3$ times is $7/4^3$, whereas the probability of getting the sequence $43$ at least once is $8/4^3$, even though the expected number of occurrences in both cases is $2/4^2$.

7
On

A combinatorial approach:
If you throw a four sided die $n$ times then you have a total of $4^n$ possible sequences. Now you want to count in how many of those the sequence "$4213433$" exists.
Denote $A_1$ the number of sequences in which the string exists at least once (with multiplicities - if it exists twice, we count it twice). Then $A_1=4^{n-7}(n-7+1)$ (You think of the string as a block, then you write any sequence of length $n-7$ and add the string in any of the $n-7+1$ spaces)
In the same way, denote by $A_j$ the number of sequences in which the string exists at least $j$ times (with multiplicities). Then $A_j=4^{n-7j}\binom{n-7j+1}{j}$.
Clearly $1\leq j\leq\frac{n}{7}$.
Now, the number of strings in which the string appears at least once (without multiplicities) is $$\sum_{j=1}^{\left\lfloor \frac{n}{7} \right\rfloor}(-1)^{j+1}A_j=\sum_{j=1}^{\left\lfloor \frac{n}{7} \right\rfloor}(-1)^{j+1}4^{n-7j}\binom{n-7j+1}{j}$$ The probability you desire is $$\frac{1}{4^n}\sum_{j=1}^{\left\lfloor \frac{n}{7} \right\rfloor}(-1)^{j+1}4^{n-7j}\binom{n-7j+1}{j}=\sum_{j=1}^{\left\lfloor \frac{n}{7} \right\rfloor}\frac{(-1)^{j+1}}{4^{7j}}\binom{n-7j+1}{j}$$

0
On

This does not directly answer your question, but I think it's relevant to note that, if $k$ is the the expected number of successful outcomes in $n$ independent trials, then the probability of getting at least one successful outcome in $n$ trials is approximately $1-\exp(-k)$, independent of $n$, and that this approximation is quite good as long as $n$ is large (say, over 10) and/or $\frac kn$ is small (say, less than 0.1).

Also, if $k$ itself is small (say, less than 0.1), the probability of at least one success is approximately equal to $k$.