Probability of a Specific Run Occurring in a Random Process

348 Views Asked by At

A random process has three possible outcomes: $A$, $B$, and $C$. At each step, the outcome is decided randomly, and is uncorrelated with previous outcomes. The outcomes occur with probabilities $p_A$, $p_B$ and $p_c$ (of course, $p_A + p_B + p_C = 1$).

In a sequence of length $N$ generated using this process, what is the probability that the an unbroken run of two of the variables of length $k$ or more (e.g. $ABABABA$ for $k=7$) occurs somewhere in the sequence at least once?

What should be the length of the sequence such that the probability of the run occurring is greater than or equal to $\frac{1}{2}$?

(Is this even a tractable problem? I am a beginner in the area, but from what I have seen, things start to get rather nasty quite quickly, even for runs in binary processes.)

2

There are 2 best solutions below

2
On BEST ANSWER

Call $X_n$ the length of the longest prefix of the word you are interested in which ends the sequence at time $n$. For example, the word ABABABA and the sequence ABCACBCAB yield $(X_n)_{0\leqslant n\leqslant9}$ equal to $0120100012$.

Let us continue with this specific word ABABABA in mind. The question is to estimate $\mathbb P(T\leqslant n)$ for some fixed $n$, where $T=\inf\{n\geqslant0\mid X_n=7\}$. The key fact is that $(X_n)_{n\geqslant 0}$ is a Markov chain and that one can compute the generating function $\mathbb E(s^T)$.

Call $t_k(s)=\mathbb E(s^T\mid X_0=k)$ for $0\leqslant k\leqslant 7$. Then $t_7(s)=1$ and we look for $t_0(s)$. Examining the next letter, one sees that, if $X_n=k$, then $X_{n+1}$ is $k+1$ or $1$ or $0$. For example, $X_n=3$ means one just saw ABA. Thus $X_{n+1}$ is $4$, $1$, or $0$ when the next letter is B, A, or C. As a consequence, $$ t_3(s)=s(p_Bt_4(s)+p_At_1(s)+p_Ct_0(s)). $$ Proceeding like this for each $0\leqslant k\leqslant 7$, one gets for every fixed $s$ a Cramér system which $(t_k(s))_{0\leqslant s\leqslant 7}$ solves, with boundary condition $t_7(s)=1$. Solving this system yields $t_0(s)$ as a rational function of $s$, which one can then expand into a series $t_0(s)=\sum\limits_{k\geqslant7}a_ks^k$. This yields $$ \mathbb P(T\leqslant n)=\sum_{k\leqslant n}a_k. $$ Note finally that the behaviour of $\mathbb P(T\geqslant n)$ when $n\to\infty$ is ruled by the radius of convergence of the series $t_0(s)$, which can be read directly on the original Cramér system, without solving it. More precisely, the smallest positive root $\sigma$ of the determinant of this system is always at least $1$ and, when $n\to\infty$, $$ \mathbb P(T\geqslant n)=\sigma^{-n+o(n)}. $$

1
On

The event of n repetitions of the pattern AB, followed by a breaking character B or C is $p_A^np_B^n(p_B+p_C)$. We sum over all $n \geq 4$. $\sum_{n=4}^{n=\infty} p_A^np_B^n(p_B+p_C)=p_A^4 p_B^4 (p_B+p_C)\sum_{n=0} ^{\infty} p_A^np_B^n=\frac{p_A^4 p_B^4 (p_B+p_C)}{1-p_Ap_B}$. For the odd case, where the run is broken at A, followed by an A or C, we have:$\frac{p_A^4 p_B^3 (p_A+p_C)}{1-p_Ap_B}$. The result is the sum of these two cases.