From "An introduction to Stochastic Modeling" by Pinsky and Karlin:
Let $T = \min \{n \ge 0 : X_n \ge r\}$ where $\{X_n\}$ is a Markov chain with transient states $0, 1, \dots, r-1$ and absorbing states $r, r+1, \dots, N$. Define $w_i = E[\sum_{n=0}^{T-1}g(X_n) | X_0 = i]$, where $g$ is some function. Then $$w_i = g(i) + \sum_{j=0}^{r-1}P_{ij}w_j, \text{ for } i = 0, \dots, r-1.$$
I have no idea how to show this axiomatically. I start with conditioning on the first step.
$$E[\sum_{n=0}^{T-1}g(X_n) | X_0 = i] = \sum_{n=0}^{N} E[\sum_{k=0}^{T-1}g(X_n) | X_1 = k, X_0 = i]P_{ik}$$
Then, for $k$, a transient state: $$E[\sum_{k=0}^{T-1}g(X_n) | X_1 = k, X_0 = i] =$$ $$ \sum_{x_2, x_3, \dots, x_{T-1}} (g(i) + g(k) + g(x_2) + \dots + g(x_{T-1}))P(X_{T-1} = x_{T-1}, \dots, X_2 = x_2 | X_1 = k, X_0 = i)$$
The Markov property implies:
$$= \sum_{x_2, x_3, \dots, x_{T-1}} (g(i) + g(k) + g(x_2) + \dots + g(x_{T-1}))P(X_{T-1} = x_{T-1}, \dots, X_2 = x_2 | X_1 = k)$$
But I can't figure out an axiomatic way to show that this equals
$$= \sum_{x_2, x_3, \dots, x_{T-1}} (g(i) + g(k) + g(x_2) + \dots + g(x_{T-1}))P(X_{T-1} = x_{T-1}, \dots, X_1 = x_2 | X_0 = k).$$
It seems to make sense intuitively since being at state $k$ and moving to an absorption state is the same as starting from $k$, but I can't derive this from the axioms of probability and the properties of Markov chains.
Anyone have any ideas?
What you need to do after conditioning on the first step is to split the sum, so as to "peel off" the $g(i)$ term. This looks like:
$$E \left [ \sum_{n=0}^{T-1} g(X_n) \mid X_0=i \right ] = \sum_{j=0}^N P_{ij} E \left [ \sum_{n=0}^{T-1} g(X_n) \mid X_0=i,X_1=j \right ] \\ = g(i) + \sum_{j=0}^N P_{ij} E \left [ \sum_{n=1}^{T-1} g(X_n) \mid X_0=i,X_1=j \right ]$$
since $\sum P_{ij}=1$. Now the Markov property lets you throw away the condition $X_0=i$, because $X_1,\dots,X_{T-1}$ and $T$ are conditionally independent of $X_0$ given $X_1$. The last step is to use time homogeneity to show that $E \left [ \sum_{n=1}^{T-1} g(X_n) \mid X_1=j \right ]=w_j$. This is intuitively kind of obvious, as you are just identifying time $1$ as the initial time and otherwise doing the same thing, but since the definition of time homogeneity only involves deterministic times, you have some mess with conditioning on $T$ to finish writing it out.