Suppose $X$ is a random variable with marginal density $p_X$. However, only the conditional densities $\{p_{X\mid\Theta}(\cdot\mid\theta):\theta \in \mathbf{T}\}$ are known directly, where $\Theta$ is a random variable taking values in $\mathbf{T}$ with density $p_\Theta$. (Assume all densities exist with respect to some dominating measure.)
The marginalization formula for evaluating $p_X$ at a point $x$,
$$ p_X(x) = \int_{\theta \in \mathbf{T}}p_{X\mid\Theta}(x\mid\theta) p_\Theta(\theta) \, d\theta $$
equally admits an interpretation for how to sample from $p_X$:
- First draw a value $\theta$ of $\Theta$ from $p_\Theta$.
- Then draw a value $x$ of $X$ from $p_{X\mid\Theta}(\cdot\mid\theta)$.
The claim behind the above procedure, that simulating from the marginal of $X$ is equivalent to simulating from the joint distribution of $(\Theta,X)$ and discarding the variable $\Theta$ that is not of interest, seems intuitively obvious and is routinely used, e.g. in Monte Carlo methods.
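To make the two steps concrete, here is a minimal sketch in Python. The distributions are my own assumption, chosen purely for illustration: $\Theta \sim N(0,1)$ and $X \mid \Theta=\theta \sim N(\theta,1)$, in which case the marginal of $X$ is $N(0,2)$. The function name `sample_marginal` is hypothetical.

```python
import random
import statistics


def sample_marginal(n, seed=0):
    """Draw n samples of X by sampling (Theta, X) jointly and discarding Theta.

    Assumed model (illustrative only):
      Theta ~ N(0, 1),  X | Theta = theta ~ N(theta, 1).
    """
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)   # step 1: theta from p_Theta
        x = rng.gauss(theta, 1.0)     # step 2: x from p_{X|Theta}(. | theta)
        xs.append(x)                  # theta is discarded, only x is kept
    return xs


xs = sample_marginal(100_000)
mean = statistics.fmean(xs)
var = statistics.pvariance(xs)
# Under the assumed model the marginal is N(0, 2), so the sample mean
# should be close to 0 and the sample variance close to 2.
print(mean, var)
```

This is only an empirical sanity check under the assumed Gaussian model, not a proof of the equivalence.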
How can one make this argument formal, in terms of the integral above, measure theory, or otherwise?