A system has two engines. When both engines are working, the failure rate of each engine is $1$ failure per $10$ days. When there is only one engine working, the failure rate is $1$ failure per $6$ days (since the single engine must work harder to compensate). Engines are repaired one at a time with a rate of $1$ repair per day. Assume all failure times and repair times are exponentially distributed and are also independent.
(a) Find the expected time to reach the failed state, starting from the state where both engines are working.
(b) If the system is currently failed (i.e., both engines are not working), find the probability that the system reaches the "both-working" state BEFORE returning to the failed state.
My attempt: (a) From the given information, the transition rate matrix is: $$Q =\pmatrix{\frac{-1}{5} & \frac{1}{5} & 0\\ 1 & \frac{-7}{6} & \frac{1}{6}\\ 0 & 2 & -2}$$ Thus the transition probability matrix is $$P =\pmatrix{0 & 1 & 0\\ \frac{6}{7} & 0 & \frac{1}{7}\\ 0 & 1 & 0}$$ Now, solving for $p_0$ - the fraction of time spent in the failed state - from the equation $pQ=0$, we get: $p_0 = \frac{1}{73}$. We also find $\pi_0 = \frac{3}{7}$ - the long-run probability of being in the failed state - by solving the equation $\pi P =\pi$. Thus the expected time to reach the failure state is $p_0\pi_0 = \fbox{$\frac{3}{511}$}$.
(b) To move from the current failed state to the "both-working" state BEFORE returning to the failed state, the system has to go from state 1 (one engine is working) to state 2 (both engines are working), since the system moves from the failed state to state 1 with probability $1$. Thus, it has to reach state 2 at the next step (otherwise, it would return to the failed state BEFORE reaching state 2), so the required probability is $\fbox{$\frac{6}{7}$}$.
My question: I wonder if my solution to part (a) is correct, since $\pi_0$ also accounts for the probability of moving from state 1 (one engine is working) to the failed state, so it might be more than just $\fbox{$\lim_{n\rightarrow \infty} P^{(n)}_{20}$}$. I think my solution to part (b) is correct, but if I'm wrong, please help point out the reason why.
For problem (a):
Let $x$ be the expected time to double failure, starting with both engines working. The expected time to reach the state with one engine working is $5$ (everything here is in days). From the state with one engine working, the expected time to leave that state is $\frac67$, and when you leave, the probability that you have moved to the total failure state is $\frac17$. If you instead move to the both-working state, you have to start again. So... $$ x = 5+\frac17\left(\frac67\right) + \frac67\left(\frac67 + x\right) $$ Collecting terms gives $\frac{x}{7} = 5 + \frac67$, so the solution is $x = 41$ days. I really don't see how you could think the expected time to a double failure was less than $\frac1{80}$ of a day.
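The first-step argument above can be checked with exact rational arithmetic. The following is only a minimal sketch: the constant and function names are made up for illustration, and the rates are the ones given in the problem statement.

```python
from fractions import Fraction

# Rates per day, taken from the problem statement.
FAIL_BOTH = Fraction(2, 10)  # two engines running, each fails at rate 1/10
FAIL_ONE = Fraction(1, 6)    # the single running engine fails at rate 1/6
REPAIR = Fraction(1)         # one engine is repaired at a time, rate 1


def expected_time_to_failure():
    """Return (t2, t1): expected days until double failure, starting from
    the both-working state and from the one-working state respectively."""
    hold2 = 1 / FAIL_BOTH                # mean stay with both engines working
    hold1 = 1 / (FAIL_ONE + REPAIR)      # mean stay with one engine working
    p_up = REPAIR / (FAIL_ONE + REPAIR)  # repair finishes before the failure
    # First-step equations: t2 = hold2 + t1 and t1 = hold1 + p_up * t2.
    # Substituting the second into the first and solving for t2:
    t2 = (hold2 + hold1) / (1 - p_up)
    t1 = hold1 + p_up * t2
    return t2, t1


t2, t1 = expected_time_to_failure()
print("from both working:", t2, "days")
print("from one working: ", t1, "days")
```

Using `Fraction` rather than floats keeps the result exact, so it can be compared directly with the hand calculation.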
For problem (b):
From the doubly failed state you will certainly next reach the state of one engine working. From there, as in part (a), there is a $\frac17$ chance of returning to the doubly failed state, and a $\frac67$ chance of transitioning to the both-working state. So the answer is $\frac67$. (Which is the answer you arrived at.)
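The $\frac67$ comes from a race between two independent exponential clocks in the one-engine state: the repair (rate $1$) against the failure of the remaining engine (rate $\frac16$). As a rough sketch, with hypothetical function names and an arbitrary fixed seed, the exact value can be compared against a small simulation of that race:

```python
import random
from fractions import Fraction

REPAIR = Fraction(1)       # repairs per day
FAIL_ONE = Fraction(1, 6)  # failures per day with one engine running


def exact_probability():
    """Probability that the repair finishes before the remaining engine fails."""
    return REPAIR / (REPAIR + FAIL_ONE)


def simulated_probability(trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability (seed chosen arbitrarily)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        repair_time = rng.expovariate(float(REPAIR))
        failure_time = rng.expovariate(float(FAIL_ONE))
        if repair_time < failure_time:
            wins += 1
    return wins / trials


print("exact:    ", exact_probability())
print("simulated:", simulated_probability())
```

The first jump out of the failed state is always to the one-engine state, so only this single race needs to be simulated.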
You can actually obtain the eigenvalues and eigenvectors of the transition rate matrix analytically without solving a cubic equation: the constant term in the characteristic equation is zero, so one eigenvalue is $0$ and the other two are the roots of a quadratic. Using those eigenvectors as new variables, you can get the probability of each state as a function of time, in closed form. But the two questions you were asked have much easier solutions, as above.
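To illustrate the remark about the characteristic equation, here is a small sketch that finds the two nonzero eigenvalues of a $3\times 3$ rate matrix from a quadratic. The function name is hypothetical, and the matrix `Q` is copied exactly as written in the question (the asker's state order and rates); that choice is an assumption of this example, so substitute a different matrix if you model the repair rate differently.

```python
import math
from fractions import Fraction as F

# Rate matrix copied verbatim from the question (an assumption of this sketch).
Q = [
    [F(-1, 5), F(1, 5), F(0)],
    [F(1), F(-7, 6), F(1, 6)],
    [F(0), F(2), F(-2)],
]


def nonzero_eigenvalues(q):
    """Nonzero eigenvalues of a 3x3 rate matrix whose rows sum to zero.

    Because the rows sum to zero, det(q) = 0, so the characteristic
    polynomial factors as lam * (lam**2 - trace * lam + minors) = 0.
    """
    trace = q[0][0] + q[1][1] + q[2][2]
    # Sum of the three principal 2x2 minors.
    minors = sum(
        q[i][i] * q[j][j] - q[i][j] * q[j][i]
        for i, j in ((0, 1), (0, 2), (1, 2))
    )
    disc = trace * trace - 4 * minors
    root = math.sqrt(disc)  # disc is positive for this Q
    return (float(trace) - root) / 2, (float(trace) + root) / 2


lam_fast, lam_slow = nonzero_eigenvalues(Q)
print(lam_fast, lam_slow)
```

Both roots are negative, so the time-dependent state probabilities are the stationary distribution plus two decaying exponential terms.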