Expected value to generate given String?

93 Views Asked by At

Initially, we have an empty generating string, in each turn, we can append any character of Latin alphabets i.e from 'a' to 'z' with equal probability, What is the expected number of moves required to obtain string "abcd" as a suffix.

1

There are 1 best solutions below

0
On

The question is so easily stated, but I cannot find an easy solution. Here is one method.

Let $b$ be the number of characters (26 in this example) and $n$ be the number of specific characters to appear in order for the process to stop (4 in this example).

With $n$ characters in the string, the stopping probability $p_n=b^{-n}=a$. For fewer than $n$ characters, the probability is zero. For $n\le i\lt 2n$ the stopping probability $p_i=a$. For $i\ge 2n$ the stopping probability at $i$ diminishes because it becomes increasing unlikely to continue to add characters.

One way is to evaluate the probability that the process stops after $i$ characters are drawn, $p_i$ and then evaluate the expectation value E[I]. For $i\ge 2n$ the process stops provided there was no match for $i-n$ characters and the next $n$ characters are the ones that match the required pattern ("abcd" in this example).

Let $Y_j$ be the indicator function that the string of $n$ characters $[c_{j-n+1},c_{j-n+2},...,c_{j}]$ match the required pattern. Note that for this problem, the pattern consists of $n$ unique characters, so the joint probabilities $$P(Y_j Y_{j+1}) = P(Y_j Y_{j+2}) =...= P(Y_j Y_{j+n-1}) =0$$ and therefore the indicator functions are not independent for nearby indices.

Furthermore for $n\ge2$, $$\eqalign{ P(Y_j)&=a\\ P(\bar{Y}_j)&=1-a\\ P(Y_j \bar{Y}_{j+1})&=P(Y_j)-P(Y_j Y_{j+1})=a\\ P(\bar{Y}_j \bar{Y}_{j+1})&=P(\bar{Y}_j)-P(\bar{Y}_j Y_{j+1})=1-2a\\ }$$

Furthermore for $n\ge3$,

$$\eqalign{ P(\bar{Y}_j \bar{Y}_{j+1})&=P(\bar{Y}_j \bar{Y}_{j+1} \bar{Y}_{j+2})+P(\bar{Y}_j \bar{Y}_{j+1} {Y}_{j+2})\\ &=P(\bar{Y}_j \bar{Y}_{j+1} \bar{Y}_{j+2})+P({Y}_{j+2})\\ &=P(\bar{Y}_j \bar{Y}_{j+1} \bar{Y}_{j+2})+a\\ }$$

so, $$ P(\bar{Y}_j \bar{Y}_{j+1} \bar{Y}_{j+2}) = P(\bar{Y}_j \bar{Y}_{j+1}) -a = 1-3a $$

Now consider a long sequence, $i\ge2n$: $$\eqalign{ p_i &= a P(\bar{Y}_1 \bar{Y}_2 ... \bar{Y}_{i-n})\\ p_{i+1} &= a P(\bar{Y}_1 \bar{Y}_2 ... \bar{Y}_{i-n} \bar{Y}_{i-n+1})\\ \frac{p_{i+1}}{p_i}&=\frac{P(\bar{Y}_{i-2n+2} \bar{Y}_{i-2n+3} ... \bar{Y}_{i-n} \bar{Y}_{i-n+1})}{P(\bar{Y}_{i-2n+2} \bar{Y}_{i-2n+3} ... \bar{Y}_i )}\\ &=\frac{1-na}{1-(n-1)a} }$$ So, in summary,

$$ \eqalign{ p_i &= 0 &i<n \\ p_i &= a &n\le i<2n \\ p_{i+1}/p_i &= (1-na)/(1-(n-1)a) \ \ \ &i\ge 2n \\ } $$

The expectation turns out to be:

$$ E[I] = \sum_{i=0}^{\infty} i p_i = a \sum_{i=n}^{2n-1} i + a \sum_{i=2n}^{\infty} i\left[\frac{1-na}{1-(n-1)a}\right]^{i-2n+1} = 1/a + an(n-1)/2 $$

So for the problem posed, the expectations is about $26^4$.