I've been watching the reinforcement learning lectures by David Silver. In the second lecture, at 32:19 (https://www.youtube.com/watch?v=lfHX2hHRMVQ), he mentions the Bellman equation for a Markov Decision Process, which is the following: $v(s) = \mathbb{E}(R_{t+1} + \gamma\, v(S_{t+1}) \mid S_t = s)$. It is said that it can be derived from "the law of iterated expectations", which states that $\mathbb{E}(\mathbb{E}(X \mid Y)) = \mathbb{E}(X)$. Frankly, I don't see how these two are connected. In the case of the Bellman equation, we would like to show that $\mathbb{E}(\mathbb{E}(R_{t+2} + \dots \mid S_{t+1}) \mid S_t) = \mathbb{E}(R_{t+2} + \dots \mid S_t)$, and I feel that we need to assume some additional properties of $R_n$, but none were given. How can one prove this equality?
The law of iterated expectations is a general result in probability theory, and it does not depend on $R_n$ at all. If you view conditional expectation as a form of projection, then what the law of iterated expectations tells you is this: if one space contains the other, it does not matter in what order you project onto them; you end up in exactly the same place as if you had projected directly onto the smaller subspace.
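To connect this to the Bellman equation concretely, here is a sketch of the derivation, writing $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots$ for the discounted return (so $G_t = R_{t+1} + \gamma G_{t+1}$ and $v(s) = \mathbb{E}(G_t \mid S_t = s)$, as in Silver's notation):
\begin{align*}
v(s) &= \mathbb{E}(G_t \mid S_t = s) \\
&= \mathbb{E}(R_{t+1} \mid S_t = s) + \gamma\, \mathbb{E}(G_{t+1} \mid S_t = s) \\
&= \mathbb{E}(R_{t+1} \mid S_t = s) + \gamma\, \mathbb{E}\big(\mathbb{E}(G_{t+1} \mid S_{t+1}, S_t) \mid S_t = s\big) && \text{(iterated expectations)} \\
&= \mathbb{E}(R_{t+1} \mid S_t = s) + \gamma\, \mathbb{E}\big(\mathbb{E}(G_{t+1} \mid S_{t+1}) \mid S_t = s\big) && \text{(Markov property)} \\
&= \mathbb{E}\big(R_{t+1} + \gamma\, v(S_{t+1}) \mid S_t = s\big).
\end{align*}
The third line is the conditional form of the law of iterated expectations, $\mathbb{E}(\mathbb{E}(X \mid \mathcal{G}) \mid \mathcal{H}) = \mathbb{E}(X \mid \mathcal{H})$ for $\mathcal{H} \subseteq \mathcal{G}$, and the fourth line uses the Markov property to drop $S_t$ from the inner conditioning. That is the only structural assumption used, and it is about the process, not about $R_n$.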
I don't know how technical a proof you want, but there is one in Chapter 5 of Durrett's Probability Theory and Examples (PTE). It requires some measure theory, though, and I am not sure how one could write a fully formal proof without it.
(If you don't know any measure theory, this intuition might be more helpful: my best estimate of $R_i$ given $S_{i+1}$ is itself a random quantity, and if I average that quantity over what I know from $S_i$, I am still getting the best estimate I could have formed from $S_i$ alone.)
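As a purely numerical sanity check of that intuition, here is a minimal sketch in Python, assuming a toy two-state Markov reward process whose transition matrix, rewards, and discount factor are all invented for illustration. It estimates $v(s) = \mathbb{E}(G_t \mid S_t = s)$ by Monte Carlo and then checks that $\mathbb{E}(R_{t+1} + \gamma\, v(S_{t+1}) \mid S_t = s)$ gives back (approximately) the same numbers:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-state Markov reward process (all numbers invented for illustration).
    P = np.array([[0.7, 0.3],
                  [0.4, 0.6]])   # transition probabilities P[s, s']
    R = np.array([1.0, -0.5])    # expected reward received on leaving each state
    gamma = 0.9                  # discount factor

    def rollout_return(s, horizon=200):
        """Sample one discounted return G_t starting from state s."""
        g, discount = 0.0, 1.0
        for _ in range(horizon):
            g += discount * R[s]
            discount *= gamma
            s = rng.choice(2, p=P[s])
        return g

    # Monte Carlo estimate of v(s) = E[G_t | S_t = s].
    v_mc = np.array([np.mean([rollout_return(s) for _ in range(2000)])
                     for s in range(2)])

    # Right-hand side of the Bellman equation,
    # E[R_{t+1} + gamma * v(S_{t+1}) | S_t = s], computed from the estimated v.
    rhs = R + gamma * P @ v_mc
    print(v_mc)
    print(rhs)

With $\gamma = 0.9$ the truncation at 200 steps is negligible, so the two printed vectors should agree to within Monte Carlo noise, which is exactly the iterated-expectations identity at work.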