Expectation value for the length of a randomly-generated sequence of letters

89 Views Asked by At

I'm trying to understand a solution that mathematician and science fiction author Greg Egan has posted on how to find an specific word in randomly typesetted letters. For instance, the word GAUGUIN: https://plus.google.com/113086553300459368002/posts/UWmyepBzvdd

Now it comes my brief explanation of Mr. Egan's reasoning, but probably you prefer to read him directly:

He established that state $k$ is the number of steps you have to take to go from $k$ correct letters to the number of total correct letters, let's say $w$. He decided to use $E_k$ to be the number of steps from state $k$ to state $w$, where $E_w = 0$. Then Mr. Egan stated that $T_{kq}$ is the matrix of transition probabilities of going from state $k$ to state $q$. So we have a Markov chain. The matrix is a $w$ x $w$ matrix, from 0 to $w-1$ (state $w$ is omitted because it can be deduced since all the states in a row sum up to 1).

What I can't understand is an equation he writes directly:

$$E_k = 1 + T_{kq}E_q $$

Why do we need to sum 1? Why that specific equation? Thanks.

P. S.: I know this problem was treated here: Expected time of sequence getting typed when the letters are typed randomly but Ii can't fully understand the solution showed and I want to understand Mr Egan's approach.

1

There are 1 best solutions below

3
On BEST ANSWER

One you reach a certain state $q$ from state $k$, you expect to take $E_q$ steps to reach state $w$.

$T_{kq}$ represents the probability to get to each state $q$ from state $k$. So $T_{kq}E_q$ is the expected number of steps (over all states $q$) get from (various ) state $q$ to state $w$. As an average, $T_{kq}E_q$ assumes that you start from state $k$, but counts the number of steps from state $q$. You need to sum $1$ because it takes $1$ step to go from state $k$ to state $q$.