Consider a parameterized Gaussian distribution $\mathcal N(\mu,\sigma)$, where $\mu$ and $\sigma$ are each sampled from their own fixed Gaussian distribution. Empirically, I find that the following two sampling procedures are not equivalent:
- First sample $\mu$ and $\sigma$, then sample $z$ from $\mathcal N(\mu,\sigma)$.
- For $i = 1, 2, \dots, 10$, first sample $\mu_i$ and $\sigma_i$, then sample $z_i$ from $\mathcal N(\mu_i,\sigma_i)$. Finally, sample $z$ uniformly from $z_1, z_2, \dots, z_{10}$.

I have an agent that takes actions using each of the two strategies above. I consistently observe that the first strategy yields better performance than the second, but I cannot see why the two would differ. Why are they different?
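
For reference, here is a minimal sketch of the two strategies using only Python's standard library. The hyperparameters (`MU_MEAN`, `MU_STD`, `SIGMA_MEAN`, `SIGMA_STD`), the helper names, and the use of `abs()` to keep $\sigma$ non-negative are assumptions made for illustration; none of them come from my actual agent code.

```python
import random
import statistics

# Hypothetical hyperparameters of the fixed Gaussians for mu and sigma.
MU_MEAN, MU_STD = 0.0, 1.0
SIGMA_MEAN, SIGMA_STD = 1.0, 0.1


def sample_z(rng):
    """Sample mu and sigma from their fixed Gaussians, then z from N(mu, sigma)."""
    mu = rng.gauss(MU_MEAN, MU_STD)
    # Assumption: take abs() so the sampled value is a valid standard deviation.
    sigma = abs(rng.gauss(SIGMA_MEAN, SIGMA_STD))
    return rng.gauss(mu, sigma)


def strategy_one(rng):
    """First strategy: a single hierarchical draw."""
    return sample_z(rng)


def strategy_two(rng, k=10):
    """Second strategy: k independent hierarchical draws, then a uniform pick."""
    candidates = [sample_z(rng) for _ in range(k)]
    return rng.choice(candidates)


def summarize(samples):
    """Return the sample mean and sample standard deviation."""
    return statistics.fmean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20_000
    z1 = [strategy_one(rng) for _ in range(n)]
    z2 = [strategy_two(rng) for _ in range(n)]
    print("strategy 1: mean=%.3f std=%.3f" % summarize(z1))
    print("strategy 2: mean=%.3f std=%.3f" % summarize(z2))
```

This only compares summary statistics of a single sampled $z$ under each strategy. It says nothing about how the agent consumes the samples, for example whether the ten candidates are redrawn at every step.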