A random process has three possible outcomes: $A$, $B$, and $C$. At each step, the outcome is decided randomly, and is uncorrelated with previous outcomes. The outcomes occur with probabilities $p_A$, $p_B$ and $p_c$ (of course, $p_A + p_B + p_C = 1$).
In a sequence of length $N$ generated using this process, what is the probability that the an unbroken run of two of the variables of length $k$ or more (e.g. $ABABABA$ for $k=7$) occurs somewhere in the sequence at least once?
What should be the length of the sequence such that the probability of the run occurring is greater than or equal to $\frac{1}{2}$?
(Is this even a tractable problem? I am a beginner in the area, but from what I have seen, things start to get rather nasty quite quickly, even for runs in binary processes.)
Call $X_n$ the length of the longest prefix of the word you are interested in which ends the sequence at time $n$. For example, the word ABABABA and the sequence ABCACBCAB yield $(X_n)_{0\leqslant n\leqslant9}$ equal to $0120100012$.
Let us continue with this specific word ABABABA in mind. The question is to estimate $\mathbb P(T\leqslant n)$ for some fixed $n$, where $T=\inf\{n\geqslant0\mid X_n=7\}$. The key fact is that $(X_n)_{n\geqslant 0}$ is a Markov chain and that one can compute the generating function $\mathbb E(s^T)$.
Call $t_k(s)=\mathbb E(s^T\mid X_0=k)$ for $0\leqslant k\leqslant 7$. Then $t_7(s)=1$ and we look for $t_0(s)$. Examining the next letter, one sees that, if $X_n=k$, then $X_{n+1}$ is $k+1$ or $1$ or $0$. For example, $X_n=3$ means one just saw ABA. Thus $X_{n+1}$ is $4$, $1$, or $0$ when the next letter is B, A, or C. As a consequence, $$ t_3(s)=s(p_Bt_4(s)+p_At_1(s)+p_Ct_0(s)). $$ Proceeding like this for each $0\leqslant k\leqslant 7$, one gets for every fixed $s$ a Cramér system which $(t_k(s))_{0\leqslant s\leqslant 7}$ solves, with boundary condition $t_7(s)=1$. Solving this system yields $t_0(s)$ as a rational function of $s$, which one can then expand into a series $t_0(s)=\sum\limits_{k\geqslant7}a_ks^k$. This yields $$ \mathbb P(T\leqslant n)=\sum_{k\leqslant n}a_k. $$ Note finally that the behaviour of $\mathbb P(T\geqslant n)$ when $n\to\infty$ is ruled by the radius of convergence of the series $t_0(s)$, which can be read directly on the original Cramér system, without solving it. More precisely, the smallest positive root $\sigma$ of the determinant of this system is always at least $1$ and, when $n\to\infty$, $$ \mathbb P(T\geqslant n)=\sigma^{-n+o(n)}. $$