Bellman's equation for Markov Decision Process


I've been watching the reinforcement learning lectures by David Silver. In the second lecture, at 32:19 (https://www.youtube.com/watch?v=lfHX2hHRMVQ), he states Bellman's equation for a Markov decision process: $v(s) = \mathbb{E}(R_{t+1} + \gamma v(S_{t+1}) \mid S_{t}=s)$. He says it can be derived from the "law of iterated expectations", which states that $\mathbb{E}(\mathbb{E}(X|Y)) = \mathbb{E}(X)$. Frankly, I don't see how the two are connected. For Bellman's equation we would need to show that $\mathbb{E}(\mathbb{E}(R_{t+2} + \dots \mid S_{t+1}) \mid S_t) = \mathbb{E}(R_{t+2} + \dots \mid S_t)$, and I feel that we need to assume some additional properties of $R_{n}$, but none were given. How can one prove this equality?
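For context, here is a minimal numerical sketch of the equation in question, using a made-up two-state Markov reward process (the states, rewards, and discount factor are invented for illustration, not taken from the lecture). Iterating the Bellman operator to convergence gives a value function that satisfies the fixed-point form $v(s) = \mathbb{E}(R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s)$:

```python
# Hypothetical two-state Markov reward process (made up for illustration).
P = {"a": {"a": 0.7, "b": 0.3},        # P(S_{t+1} = s' | S_t = s)
     "b": {"a": 0.4, "b": 0.6}}
R = {"a": 1.0, "b": -0.5}              # E[R_{t+1} | S_t = s]
gamma = 0.9

# Iterate v <- R + gamma * P v; this is a contraction, so it converges
# to the unique solution of the Bellman equation.
v = {s: 0.0 for s in P}
for _ in range(10_000):
    v = {s: R[s] + gamma * sum(p * v[s2] for s2, p in P[s].items())
         for s in P}

# Check the fixed-point form: v(s) = E[R_{t+1} + gamma * v(S_{t+1}) | S_t = s].
for s in P:
    rhs = R[s] + gamma * sum(p * v[s2] for s2, p in P[s].items())
    assert abs(v[s] - rhs) < 1e-9
```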

1 Answer


The law of iterated expectations is a powerful result in probability theory, and it does not depend on $R_n$ at all. If you view conditional expectation as a form of projection, the law of iterated expectations tells you that when one space contains the other, the order in which you project onto them does not matter: you end up in exactly the same place as if you had projected directly onto the smaller subspace.
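To see the law in action without any measure theory, here is a small check on a made-up discrete joint distribution (the pmf values are invented for illustration): averaging the conditional expectations $\mathbb{E}(X \mid Y=y)$ over the marginal of $Y$ recovers $\mathbb{E}(X)$ exactly.

```python
from collections import defaultdict

# A made-up joint pmf over (x, y) pairs, just for illustration.
pmf = {(1, 0): 0.1, (2, 0): 0.3, (1, 1): 0.2, (3, 1): 0.4}

# E[X] computed directly from the joint distribution.
ex = sum(x * p for (x, y), p in pmf.items())

# E[X | Y = y] for each y, via the marginal of Y.
py = defaultdict(float)        # P(Y = y)
num = defaultdict(float)       # sum_x x * P(X = x, Y = y)
for (x, y), p in pmf.items():
    py[y] += p
    num[y] += x * p
e_cond = {y: num[y] / py[y] for y in py}

# E[ E[X | Y] ]: average the conditional expectations over Y.
ex_tower = sum(py[y] * e_cond[y] for y in py)

assert abs(ex - ex_tower) < 1e-12
```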

I don't know how technical a proof you want, but there is one in Chapter 5 of Durrett's Probability: Theory and Examples (PTE). It requires some measure theory, though, and I am not sure how one could write a formal proof without it.

(If you don't know any measure theory, this intuition might be more helpful: my best estimate of $R_i$ given $S_{i+1}$ is some random quantity, and if I average that quantity over what I know from $S_i$, I still end up with the best estimate I could have formed from $S_i$ alone.)
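That averaging intuition can be sketched concretely. Below, on a made-up two-state Markov chain (transition probabilities and the function $f$ are invented for illustration), the nested conditional expectation $\mathbb{E}(\mathbb{E}(f(S_{t+2}) \mid S_{t+1}) \mid S_t)$ agrees with the direct one $\mathbb{E}(f(S_{t+2}) \mid S_t)$, which is the shape of the identity in the question:

```python
# Hypothetical two-state Markov chain, made up for illustration.
P = {"a": {"a": 0.7, "b": 0.3},        # P(S_{t+1} = s' | S_t = s)
     "b": {"a": 0.4, "b": 0.6}}
f = {"a": 1.0, "b": -1.0}              # any function of the state

# Inner layer: g(s1) = E[f(S_{t+2}) | S_{t+1} = s1].
g = {s1: sum(p * f[s2] for s2, p in P[s1].items()) for s1 in P}

# Outer layer: average g over S_{t+1} given S_t = s.
nested = {s: sum(p * g[s1] for s1, p in P[s].items()) for s in P}

# Direct computation via the two-step transition kernel.
direct = {s: sum(p1 * p2 * f[s2]
                 for s1, p1 in P[s].items()
                 for s2, p2 in P[s1].items())
          for s in P}

# The two agree, as the law of iterated expectations promises.
assert all(abs(nested[s] - direct[s]) < 1e-12 for s in P)
```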