My question is about the surrogate advantage function used in the Trust Region Policy Optimization (TRPO) reinforcement learning algorithm: $${L}(\theta_k, \theta) = \mathop{\mathbb{E}}_{s,a \sim \pi_{\theta_k}}\left[ \frac{\pi_{\theta}(a|s)}{\pi_{\theta_k}(a|s)} A^{\pi_{\theta_k}}(s,a) \right].$$ Is this expectation over all (state, action) pairs generated by the policy $\pi_{\theta_k}$ equivalent to $$\mathop{\mathbb{E}}_{s \sim \pi_{\theta_k}}\left[\mathop{\mathbb{E}}_{a \sim \pi_{\theta_k}(\cdot|s)}\left[ \frac{\pi_{\theta}(a|s)}{\pi_{\theta_k}(a|s)} A^{\pi_{\theta_k}}(s,a) \right]\right]?$$ Would this be equivalent to a double integral, or is there a difference between taking the expectation over $(s,a)$ pairs jointly and taking it over each variable individually?
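To make the question concrete, here is a minimal numerical sketch on a tiny discrete example (all distributions and values below are made up for illustration): it computes the surrogate both as a joint sum over $(s,a)$ pairs weighted by $d(s)\,\pi_{\theta_k}(a|s)$ and as a nested sum, inner over actions then outer over states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny problem: 3 states, 2 actions.
n_states, n_actions = 3, 2

d = np.array([0.5, 0.3, 0.2])                              # state distribution under pi_theta_k
pi_old = rng.dirichlet(np.ones(n_actions), size=n_states)  # pi_theta_k(a|s), rows sum to 1
pi_new = rng.dirichlet(np.ones(n_actions), size=n_states)  # pi_theta(a|s)
A = rng.normal(size=(n_states, n_actions))                 # advantages A^{pi_theta_k}(s, a)

ratio = pi_new / pi_old                                    # importance ratio

# Joint expectation over (s, a) ~ d(s) * pi_theta_k(a|s):
joint = np.sum(d[:, None] * pi_old * ratio * A)

# Iterated expectation: inner over a ~ pi_theta_k(.|s), then outer over s ~ d:
inner = np.sum(pi_old * ratio * A, axis=1)  # E_a[ratio * A | s], one value per state
iterated = np.sum(d * inner)

print(joint, iterated)
assert np.isclose(joint, iterated)
```

The two numbers agree exactly, because the joint weight $d(s)\,\pi_{\theta_k}(a|s)$ factors into the two stages of the nested sum.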
Is an expectation over two variables equivalent to the expectation over the first of the expectation over the second?
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
Yes. The law of total expectation says that for two random variables $X$ and $Y$, possibly with some dependency between them, the overall expectation of $X$ can be found by first taking the expectation conditional on $Y$ and then taking the expectation over $Y$. In other words,
$$\mathbb{E}[X] = \mathbb{E}_Y \left[ \mathbb{E}[X \mid Y] \right].$$
In your case $Y$ plays the role of the state $s$, and the inner expectation is over actions $a \sim \pi_{\theta_k}(\cdot \mid s)$, so the joint expectation over $(s,a)$ pairs equals the iterated one. For continuous state and action spaces this is indeed a double integral: integrate over $a$ against the density $\pi_{\theta_k}(a|s)$, then over $s$ against the state distribution induced by $\pi_{\theta_k}$.
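A quick Monte Carlo sketch of this identity (the specific distributions below are hypothetical, chosen only for illustration): sample $Y$ from a categorical distribution, sample $X \mid Y$ from a Gaussian whose mean depends on $Y$, and compare the direct sample mean of $X$ with the group-wise decomposition $\sum_y \hat{p}(y)\,\hat{\mathbb{E}}[X \mid Y = y]$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Y ~ Categorical(p_y); X | Y = y ~ Normal(mu[y], 1).  Values are made up.
p_y = np.array([0.2, 0.5, 0.3])
mu = np.array([-1.0, 0.0, 2.0])

n = 100_000
y = rng.choice(3, size=n, p=p_y)
x = rng.normal(loc=mu[y], scale=1.0)

direct = x.mean()  # E[X] estimated from the joint samples

# Conditional means E[X | Y = k] and empirical frequencies of each Y = k:
cond_means = np.array([x[y == k].mean() for k in range(3)])
freq = np.bincount(y, minlength=3) / n

iterated = np.sum(freq * cond_means)  # E_Y[ E[X | Y] ] on the same samples

print(direct, iterated)
assert np.isclose(direct, iterated)
```

On the same sample the two estimates coincide exactly, since the overall mean is the frequency-weighted average of the group means; with theoretical weights $p_y$ instead of empirical frequencies they agree up to Monte Carlo error.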