Is an expectation over two variables equivalent to the expectation over the first of the expectation over the second?

My question concerns the surrogate advantage function used in the Trust Region Policy Optimization (TRPO) reinforcement learning algorithm:

$$L(\theta_k, \theta) = \mathop{\mathbb{E}}_{s,a \sim \pi_{\theta_k}}\left[ \frac{\pi_{\theta}(a|s)}{\pi_{\theta_k}(a|s)} A^{\pi_{\theta_k}}(s,a) \right]$$

Is this expectation, taken over all (state, action) pairs generated by the policy $\pi_{\theta_k}$, equivalent to the following?

$$\mathop{\mathbb{E}}_{s \sim \pi_{\theta_k}}\left[\mathop{\mathbb{E}}_{a \sim \pi_{\theta_k}}\left[ \frac{\pi_{\theta}(a|s)}{\pi_{\theta_k}(a|s)} A^{\pi_{\theta_k}}(s,a) \right]\right]$$

Would this be equivalent to a double integral, or is there a difference between taking the expectation over $(s,a)$ pairs jointly and taking it over each variable individually?

BEST ANSWER

Yes. The law of total expectation says that if you have two random variables $X$ and $Y$, possibly dependent on each other, then the overall expectation of $X$ can be found by first taking the expectation conditional on $Y$ and then taking the expectation over $Y$. In other words,

$$\mathbb{E}[X] = \mathbb{E}_Y \left[ \mathbb{E}_X[X|Y] \right]$$
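Applied to the question, $Y$ plays the role of the state $s$ and the inner expectation is over $a \sim \pi_{\theta_k}(\cdot|s)$, conditional on that state. As a sanity check, here is a minimal sketch that compares the joint and the iterated expectation on a toy discrete problem. Everything in it is an assumption made up for illustration: the three states, two actions, the state distribution `d`, the two policies and the advantage table `adv` do not come from TRPO or from any library. Exact fractions are used so the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction as F

# Hypothetical toy problem: 3 states, 2 actions. All numbers are made up.
d = [F(1, 2), F(1, 3), F(1, 6)]  # state distribution under the old policy

# pi_old[s][a] and pi_new[s][a]: action probabilities in state s.
pi_old = [[F(1, 4), F(3, 4)], [F(1, 2), F(1, 2)], [F(2, 3), F(1, 3)]]
pi_new = [[F(1, 2), F(1, 2)], [F(1, 5), F(4, 5)], [F(1, 3), F(2, 3)]]

# adv[s][a]: advantage of action a in state s under the old policy.
adv = [[F(1), F(-1, 3)], [F(2), F(-2)], [F(-1, 2), F(1)]]


def f(s, a):
    """Integrand of the surrogate objective: probability ratio times advantage."""
    return pi_new[s][a] / pi_old[s][a] * adv[s][a]


def joint_expectation():
    """Single sum over (s, a) pairs, weighted by the joint probability d(s) * pi_old(a|s)."""
    return sum(d[s] * pi_old[s][a] * f(s, a)
               for s in range(3) for a in range(2))


def iterated_expectation():
    """Outer expectation over s of the inner expectation over a given s."""
    total = F(0)
    for s in range(3):
        inner = sum(pi_old[s][a] * f(s, a) for a in range(2))
        total += d[s] * inner
    return total


if __name__ == "__main__":
    print(joint_expectation() == iterated_expectation())
```

In the discrete case both functions compute the same double sum, only grouped differently, which is the finite analogue of writing the expectation as a double integral.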