Generalized $\texttt{ABRACADABRA}$ Problem


Consider the classic problem of a monkey hitting random keys on a keyboard, each with equal probability, and calculating the expected time for the monkey to type $\texttt{ABRACADABRA}$ (if unfamiliar with the martingale solution see, e.g., this question).

Is there a straightforward way to extend the solution to an arbitrary alphabet, arbitrary distribution of hitting the keys on the alphabet, and arbitrary string?

Edit

Based on Misha Lavrov's and user6247850's answers and comments: let $A$ be an alphabet, $p$ a probability distribution over the characters of $A$, and $S$ a string with $i$-th character $s_i$, so that $p(s_i)$ is the probability of hitting that character.

I define the left substring $L_n(S)$ of length $n$ to be the first $n$ characters of $S$, and analogously the right substring $R_n(S)$ of length $n$ to be the last $n$ characters of $S$.

The expectation of the stopping time $T_S$ will be:

$$\mathbb{E}(T_S) = \sum_{i=1}^{\#S}\left(\prod_{j=1}^i p(s_j)^{-1}\right)\mathbf{1}(R_i(S) = L_i(S)),$$

where $\mathbf{1}$ denotes the indicator function. Is this correct?
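A quick numerical sanity check of this formula (a sketch in Python; the function name and the uniform 26-letter setup are mine, not part of any answer):

```python
from math import prod

def expected_hitting_time(s, p):
    """E[T_S]: sum 1/P(prefix of length i) over every i where the
    length-i prefix of s equals its length-i suffix (including i = len(s))."""
    return sum(
        1.0 / prod(p[c] for c in s[:i])
        for i in range(1, len(s) + 1)
        if s[:i] == s[-i:]
    )

# Uniform 26-letter alphabet: the classic answer is 26**11 + 26**4 + 26,
# since ABRACADABRA overlaps itself at lengths 1 (A), 4 (ABRA), and 11.
uniform = {c: 1 / 26 for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
```

Running `expected_hitting_time("ABRACADABRA", uniform)` recovers the classic value $26^{11} + 26^4 + 26$ up to floating-point error.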


Accepted answer

The generalization to arbitrary alphabet and arbitrary string does not take much additional work. For an alphabet of size $k$ and an arbitrary string $s_1 s_2 \dots s_t$:

  • once again we have a gambler appear before every keystroke and wager $\$1$ on the next letter being $s_1$, winning $\$k$ if they are right.
  • once again, every gambler that wins proceeds to bet all their money on the next letter being $s_2$; then bet all their money on the next letter being $s_3$, and so on.

The bets are all fair (expected value $0$). After $N$ keystrokes, there are $N$ gamblers, who start with $\$N$ total, so the expected amount they end with is also $\$N$. This remains the same in expectation if $N$ is a stopping time, such as the time when the string $s_1s_2 \dots s_t$ first appears.

At this stopping time, one lucky gambler has won $\$k^t$. Also, for all $0<i<t$, if the string's first $i$ letters are equal to its last $i$ letters (if $s_1 s_2 \dots s_i = s_{t-i+1} s_{t-i+2} \dots s_t$) then another gambler has won $\$k^i$. This accounts for all the money held at time $N$; since the expected total is $\$N$, the expected value of $N$ equals $k^t$ plus $k^i$ for each such overlap $i$.

Some special cases:

  • The word $\texttt{ALFALFA}$ first appears at time $k^7 + k^4 + k$ in expectation.
  • The word $\texttt{AAA...AA}$ of length $t$ first appears at time $k^t + k^{t-1} + \dots + k = \frac{k^{t+1}-k}{k-1}$ in expectation.
  • Most words (including, but not limited to, words with no repeated letters) first appear at time $k^t$ in expectation.
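These special cases can also be verified independently of the gambler argument, by solving the underlying Markov chain exactly. The sketch below (my own construction, not part of the answer) builds a KMP-style pattern-matching automaton by brute force and solves the hitting-time equations over the rationals:

```python
from fractions import Fraction

def kmp_automaton(word, alphabet):
    """delta[state][c] = length of the longest prefix of `word` that is a
    suffix of the matched text after reading c in `state`."""
    t = len(word)
    delta = [dict() for _ in range(t + 1)]
    for state in range(t + 1):
        for c in alphabet:
            if state < t and c == word[state]:
                delta[state][c] = state + 1
            else:
                s = word[:state] + c  # brute-force fallback
                nxt = 0
                for length in range(min(len(s), t), 0, -1):
                    if s.endswith(word[:length]):
                        nxt = length
                        break
                delta[state][c] = nxt
    return delta

def expected_time_markov(word, alphabet):
    """Exact expected hitting time with uniform keystrokes, by solving
    E[state] = 1 + (1/k) * sum_c E[delta[state][c]], with E[t] = 0."""
    t = len(word)
    delta = kmp_automaton(word, alphabet)
    k = Fraction(1, len(alphabet))
    A = [[Fraction(0)] * t for _ in range(t)]
    b = [Fraction(1) for _ in range(t)]
    for s in range(t):
        A[s][s] += 1
        for c in alphabet:
            nxt = delta[s][c]
            if nxt < t:
                A[s][nxt] -= k
    # Gauss-Jordan elimination over the rationals.
    for i in range(t):
        piv = next(r for r in range(i, t) if A[r][i] != 0)
        A[i], A[piv] = A[piv], A[i]
        b[i], b[piv] = b[piv], b[i]
        inv = 1 / A[i][i]
        A[i] = [a * inv for a in A[i]]
        b[i] *= inv
        for r in range(t):
            if r != i and A[r][i] != 0:
                f = A[r][i]
                A[r] = [a - f * x for a, x in zip(A[r], A[i])]
                b[r] -= f * b[i]
    return b[0]  # expected time starting from the empty match
```

For instance, `expected_time_markov("ALFALFA", "ALF")` returns exactly $3^7 + 3^4 + 3$, and `expected_time_markov("AAA", "AB")` returns $2^3 + 2^2 + 2 = 14$, matching the special cases above.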

To generalize to arbitrary probabilities, we need to adjust the payoffs to make the game fair. Suppose that the string is still $s_1 s_2 \dots s_t$, but now letter $s_i$ has probability $p_i$ of occurring. Betting on a letter with probability $p$ should get a gambler $\frac1p$ dollars for every dollar they bet. Therefore the final gambler's winnings are $(p_1 p_2 \dotsm p_t)^{-1}$, and for every overlap $s_1 s_2 \dots s_i = s_{t-i+1} s_{t-i+2} \dots s_t$, there is another gambler with winnings $(p_1 p_2 \dotsm p_i)^{-1}$. The sum of these is the expected time until the string appears.

Answer

Yes, just adjust the payoff for each letter to make the wagers fair. For example, if the probability of hitting $a$ is $p_a$, then the payoff for betting $\$1$ on $a$ should be $\frac{1}{p_a}$. The analysis for finding the total payoff works the same, but is probably slightly messier. For example, if the string is "abab," the first gambler wins $\frac{1}{p_a^2p_b^2}$ and the gambler that started with the second $a$ wins $\frac{1}{p_a p_b}$, so the expected time before "abab" is spelled is $\frac{1}{p_a^2p_b^2} + \frac{1}{p_a p_b}$.
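The "abab" example can be checked by simulation (a sketch; the function name, seed, and choice of $p_a = 0.25$, $p_b = 0.75$ are illustrative):

```python
import random

def simulate_mean_time(word, probs, trials, seed=0):
    """Monte Carlo estimate of the expected number of keystrokes
    until `word` first appears, letters drawn i.i.d. from `probs`."""
    rng = random.Random(seed)
    letters = list(probs)
    weights = [probs[c] for c in letters]
    total = 0
    for _ in range(trials):
        tail, n = "", 0
        while not tail.endswith(word):
            # keep only the last len(word) characters; that is all we need
            tail = (tail + rng.choices(letters, weights)[0])[-len(word):]
            n += 1
        total += n
    return total / trials

p_a, p_b = 0.25, 0.75
predicted = 1 / (p_a * p_b) ** 2 + 1 / (p_a * p_b)  # ≈ 33.78
```

With 20,000 trials the estimate lands within a fraction of a unit of the predicted $\frac{1}{p_a^2 p_b^2} + \frac{1}{p_a p_b}$.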