I was considering the following problem, from Tijms's Understanding Probability:
A fair die is repeatedly rolled. What is the expected value of the number of rolls until a run of six different outcomes appears and what is the probability of getting such a run within 100 rolls of the die? What are the answers when restricting to the run 123456?
However, while solving the problem I noticed the following thing.
Consider the second part of the problem: we would like to find the probability of getting the run 123456 in 100 rolls of a die.
I introduce the following states:
- state 0: we have a sequence which is not useful towards our goal and that does not end in 1;
- state 1: we have a sequence which is not useful towards our goal and that ends in 1;
- state 2: we have a sequence which ends in 12;
- state $j$, $1\le j\le 5$: we have a sequence which ends in 12...$j$;
- state 6: we reached our goal.
Clearly, state 6 should be an absorbing state. I computed the following transition probability matrix:
$$ P = \begin{bmatrix} 5/6 & 1/6 & 0 & 0 & 0 & 0 & 0\\ 4/6 & 1/6 & 1/6 & 0 & 0 &0 & 0\\ 4/6 & 1/6 & 0 & 1/6 & 0 & 0 & 0\\ 4/6 & 1/6 & 0 & 0 & 1/6 & 0 & 0\\ 4/6 & 1/6 & 0 & 0 & 0 & 1/6 & 0\\ 4/6 & 1/6 & 0 & 0 & 0 & 0 & 1/6 \\ 0 & 0& 0& 0 & 0 & 0 & 1 \end{bmatrix}. $$ To answer the first question, I compute the entry in the first row, last column of $P^{100}$, which gives a probability of 0.002034. This number agrees with the solutions provided in the book.
To answer the second part of the problem, I solve the linear system:
$$ (\mathrm{Id} - P_{1:6,1:6})(e_0,e_1,e_2,e_3,e_4,e_5) = (1, 1, 1, 1, 1, 1), $$ where $\mathrm{Id}$ is the $6\times6$ identity matrix, and $P_{1:6,1:6}$ is the matrix obtained by removing the last row and the last column from $P$. My understanding is that $e_i$ is the expected number of rolls for reaching state 6 starting from state $i$.
The numbers I obtain are: $$ \begin{gathered} e_0 = 46656 && e_1=46650, && e_2= 46620 && e_3=46440 && e_4=45360 & e_5= 38880. \end{gathered} $$
The first thing that puzzles me is that the book says the answer should be $e_0=46659$.
The second is that the numbers look strange. For example, if we start from state 4, do we really have such a small advantage over starting from scratch (only about 1000 rolls less than state 0)?
For completeness I answered the first question with by considering only states 1:6 that I mentioned above (there is no "useless" state here), and with transition probability matrix:
$$ P = \begin{bmatrix} 1/6 & 5/6 & 0 & 0 & 0 & 0 \\ 1/6 & 1/6 & 4/6 & 0 & 0 & 0 \\ 1/6 & 1/6 & 1/6 & 3/6 & 0 & 0 \\ 1/6 & 1/6 & 1/6 & 1/6 & 2/6 & 0 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}, $$ so that $P^{99}$ gives the required probability: 0.70537, and the expected number of rolls starting from state $i$, $1\le i\le 5$ are: $$ \begin{gathered} e_1 =82.2 && e_2=81 && e_3=79.2 && e_4=75.6 && e_5=64.8 \end{gathered} $$
I get exactly the same answers you do.
I don't find it so surprising that there isn't all that much advantage in being close to the goal, since the most likely thing to happen, by far, is that we will go back to the beginning.
If you think about setting up a system of linear equations to solve for the expectations you get for example $$e_0=1+\frac56e_0+\frac16e_1$$ since we always have to roll once, and with probability $\frac56$ we make no progress and with probability $\frac16$ we got to state $1$. This gives $e_0=6+e_1$, so that there's hardly any gain from getting started.
Of course, this is the same system of equations you wrote in matrix form, but I somehow all ways find it more concrete when I write out like they showed me in high school.