Understanding the Bellman Expectation Equation for the value function of MRPs


I am trying to understand the following chain of equalities from my reinforcement learning notes, which starts from the definition of the value function (the first line), supposedly uses the law of total expectation, and arrives at the Bellman Expectation Equation for Markov Reward Processes (the last line):

$$\begin{aligned}
v(s) &= \mathbb{E}[G_t \mid S_t = s] \\
&= \mathbb{E}[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \ldots \mid S_t = s] \\
&= \mathbb{E}[R_{t+1} + \gamma (R_{t+2} + \gamma R_{t+3} + \ldots) \mid S_t = s] \\
&= \mathbb{E}[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \\
&= \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s]
\end{aligned}$$
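As a side note, for a finite MRP the last line turns into the linear system $v = R + \gamma P v$, which can be checked numerically. A minimal sketch (the two-state chain, transition probabilities, and rewards below are made up purely for illustration):

```python
# Hypothetical 2-state MRP; transition probabilities and rewards are invented.
P = [[0.7, 0.3],
     [0.4, 0.6]]          # P[s][s2] = Pr(S_{t+1} = s2 | S_t = s)
R = [1.0, -0.5]           # R[s] = E[R_{t+1} | S_t = s]
gamma = 0.9

# Solve v = R + gamma * P v by fixed-point iteration (a contraction for gamma < 1).
v = [0.0, 0.0]
for _ in range(1000):
    v = [R[s] + gamma * sum(P[s][s2] * v[s2] for s2 in range(2))
         for s in range(2)]

# Each state now satisfies the Bellman Expectation Equation to numerical precision.
for s in range(2):
    bellman = R[s] + gamma * sum(P[s][s2] * v[s2] for s2 in range(2))
    assert abs(v[s] - bellman) < 1e-9
```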

where $s$ is a state, $G_t$ is the total discounted return from time-step $t$, $R_k$ is the reward received at time-step $k$, and $\gamma$ is the discount factor.

What I don't understand is why $$\mathbb{E}[R_{t+1} + \gamma G_{t+1} \mid S_t = s]$$ $$ = \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s].$$

Now the law of total expectation states, for two random variables $X, Y$ defined on the same probability space, that $\mathbb{E}[\mathbb{E}[X \mid Y]] = \mathbb{E}[X]$, where the inner (conditional) expectation $\mathbb{E}[X \mid Y]$ is a function of $Y$, and the outer expectation is taken over the distribution of $Y$.
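To make the law of total expectation concrete, here is a small numerical sanity check with a made-up discrete joint distribution for $(X, Y)$ (the probabilities are arbitrary):

```python
# Made-up joint pmf p(x, y) over a small discrete space, for illustration only.
joint = {(0, 0): 0.1, (0, 1): 0.2,
         (1, 0): 0.3, (1, 1): 0.4}

# E[X] computed directly from the joint distribution.
ex = sum(p * x for (x, y), p in joint.items())

# E[X | Y = y0] for each y0, then averaged over the marginal of Y.
ys = {y for (_, y) in joint}
ex_tower = 0.0
for y0 in ys:
    p_y = sum(p for (x, y), p in joint.items() if y == y0)           # Pr(Y = y0)
    e_x_given_y = sum(p * x for (x, y), p in joint.items() if y == y0) / p_y
    ex_tower += p_y * e_x_given_y                                    # builds E[E[X|Y]]

assert abs(ex - ex_tower) < 1e-12   # law of total expectation: E[E[X|Y]] = E[X]
```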

By the linearity of expectation, I think what I have to show is (apologies for the messy/potentially wrong notation) $$\mathbb{E}[G_{t+1} \mid S_t = s] = \mathbb{E}\big[\mathbb{E}[G_{t+1} \mid S_{t+1}] \,\big|\, S_t = s\big],$$ but I'm not sure how to relate this precisely to the law of total expectation, or whether I have my conditioning correct.
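For reference, the step in question seems to line up with the *conditional* form of the tower property, $\mathbb{E}[X \mid Z] = \mathbb{E}[\mathbb{E}[X \mid Y, Z] \mid Z]$, together with the Markov property; a sketch of how the pieces would fit (assuming that conditional form):

$$\begin{aligned}
\mathbb{E}[G_{t+1} \mid S_t = s]
&= \mathbb{E}\big[\,\mathbb{E}[G_{t+1} \mid S_{t+1}, S_t = s]\,\big|\, S_t = s\big]
&& \text{(tower property, conditional form)} \\
&= \mathbb{E}\big[\,\mathbb{E}[G_{t+1} \mid S_{t+1}]\,\big|\, S_t = s\big]
&& \text{(Markov property)} \\
&= \mathbb{E}[\,v(S_{t+1}) \mid S_t = s]
&& \text{(definition of } v\text{)}
\end{aligned}$$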