In one of his lectures, Levine defines the objective of reinforcement learning as $$J(\theta) = E_{\tau\sim p_\theta(\tau)}[r(\tau)],$$ where $\tau$ denotes a single trajectory and $p_\theta(\tau)$ is the probability of that trajectory under the policy, i.e. $p_\theta(\tau) = p(s_1)\prod_{t = 1}^T \pi_{\theta}(a_t|s_t)\,p(s_{t+1}|s_t, a_t)$.
Starting from this definition, he rewrites the objective as $J(\theta) =\sum_{t=1}^T E_{(s_t, a_t)\sim p_\theta(s_t, a_t)}[r(s_t, a_t)]$ and argues that this sum can be decomposed using conditional expectations, so that it becomes:
$$J(\theta) = E_{s_1 \sim p(s_1)}\!\left[E_{a_1 \sim \pi(a_1|s_1)}\!\left[r(s_1, a_1) + E_{s_2 \sim p(s_2|s_1, a_1)}\!\left[E_{a_2 \sim \pi(a_2|s_2)}\!\left[r(s_2, a_2) + \dots \,\middle|\, s_2\right]\middle|\, s_1, a_1\right]\middle|\, s_1\right]\right]$$
Can anyone explain this last step?
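To make the question concrete, here is a small sanity check I put together: for a toy two-step MDP (the states, policy, transitions, and reward below are my own arbitrary choices, not from the lecture), the "flat" trajectory expectation and the nested conditional form evaluate to the same number, which is what the claimed identity predicts.

```python
import itertools

# Hypothetical tiny MDP, chosen only to check the identity numerically:
# 2 states, 2 actions, horizon T = 2.
S, A = [0, 1], [0, 1]
p1 = {0: 0.6, 1: 0.4}                       # initial state distribution p(s_1)
pi = {(a, s): 0.5 for a in A for s in S}    # uniform policy pi(a|s)
# transition table p(s'|s,a): action 0 tends toward state 0, action 1 toward state 1
P = {(s, a): ({0: 0.7, 1: 0.3} if a == 0 else {0: 0.2, 1: 0.8})
     for s in S for a in A}
r = lambda s, a: float(s + a)               # arbitrary reward r(s,a)

# Flat form: J = E_{tau ~ p_theta(tau)}[ r(s1,a1) + r(s2,a2) ],
# summing the trajectory probability times the total reward over all trajectories.
J_flat = sum(
    p1[s1] * pi[(a1, s1)] * P[(s1, a1)][s2] * pi[(a2, s2)]
    * (r(s1, a1) + r(s2, a2))
    for s1, a1, s2, a2 in itertools.product(S, A, S, A)
)

# Nested form: E_{s1}[ E_{a1|s1}[ r(s1,a1) + E_{s2|s1,a1}[ E_{a2|s2}[ r(s2,a2) ]]]]
J_nested = sum(
    p1[s1] * sum(
        pi[(a1, s1)] * (
            r(s1, a1) + sum(
                P[(s1, a1)][s2] * sum(pi[(a2, s2)] * r(s2, a2) for a2 in A)
                for s2 in S
            )
        )
        for a1 in A
    )
    for s1 in S
)

print(J_flat, J_nested)   # the two forms agree up to floating point
```

So I can verify numerically that the decomposition holds; what I'm after is the derivation (presumably the law of total expectation) that justifies it in general.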