Sampling problem: why does being able to sample from $P(s)$ and being able to compute $P(a|s)$ imply being able to sample from $P(a)$?


I denote by $P(s)$ the probability distribution of some variable $s$, and by $P(a)$ the probability distribution of some variable $a$.

I have:

$$P(a)=\sum_s P(a|s) P(s)$$
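As a small numerical illustration of this formula (the numbers are made up for the example and do not come from any source), take binary $s$ and $a$ with $P(s=0)=0.3$, $P(s=1)=0.7$, $P(a=1|s=0)=0.9$ and $P(a=1|s=1)=0.2$. Then

$$P(a=1) = P(a=1|s=0)\,P(s=0) + P(a=1|s=1)\,P(s=1) = 0.9 \times 0.3 + 0.2 \times 0.7 = 0.41,$$

and $P(a=0) = 1 - 0.41 = 0.59$.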

Imagine that I have a computer that can provide me with samples $s$ drawn from the probability distribution $P(s)$, and that can compute $P(a|s)$. I am fairly sure I have read somewhere that, in such a case, it can provide samples $a$ drawn from the probability distribution $P(a)$ (unfortunately, I can no longer find the source; it feels like a kind of "common knowledge").
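Here is a minimal Python sketch of the two-stage procedure I believe is meant (often called ancestral sampling): draw $s \sim P(s)$, then draw $a \sim P(a|s)$ for that $s$, and keep only $a$. It assumes that $a$ and $s$ take finitely many values, so that being able to compute $P(a|s)$ for every $a$ is enough to sample from $P(a|s)$. The distributions `P_S` and `P_A_GIVEN_S` are hypothetical, chosen only for illustration, and the helper names are my own. The script compares the empirical frequencies of the sampled $a$ with the exact value of $\sum_s P(a|s)P(s)$.

```python
import random
from collections import Counter

# Hypothetical example distributions (illustration only): s and a take values in {0, 1}.
P_S = {0: 0.3, 1: 0.7}                  # P(s)
P_A_GIVEN_S = {0: {0: 0.1, 1: 0.9},     # P(a | s=0)
               1: {0: 0.8, 1: 0.2}}     # P(a | s=1)


def sample_from(dist, rng):
    """Draw one value from a finite distribution given as {value: probability}."""
    values = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(values, weights=weights, k=1)[0]


def sample_a(rng):
    """Two-stage sampling: draw s ~ P(s), then a ~ P(a|s), and return only a."""
    s = sample_from(P_S, rng)
    return sample_from(P_A_GIVEN_S[s], rng)


def exact_marginal():
    """Compute P(a) = sum_s P(a|s) P(s) directly."""
    marginal = {}
    for s, p_s in P_S.items():
        for a, p_a_given_s in P_A_GIVEN_S[s].items():
            marginal[a] = marginal.get(a, 0.0) + p_a_given_s * p_s
    return marginal


def empirical_marginal(n, seed=0):
    """Estimate P(a) from the frequencies of n two-stage samples."""
    rng = random.Random(seed)
    counts = Counter(sample_a(rng) for _ in range(n))
    return {a: counts[a] / n for a in sorted(counts)}


if __name__ == "__main__":
    print("exact:    ", exact_marginal())
    print("empirical:", empirical_marginal(100_000))
```

The sum over $s$ never has to be computed explicitly: it is carried out implicitly by the random choice of $s$ in the first stage, and the empirical frequencies of $a$ approach the exact marginal as the number of samples grows.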

However, I do not understand why this is true. To me, it would only mean that, for any $s$, the computer can provide samples following the probability distribution $P(a|s) P(s)$.

But this does not seem sufficient, since the summation $\sum_s$ is missing.

Hence, my question is: why does being able to sample from $P(s)$ and being able to compute $P(a|s)$ imply being able to sample from $P(a)$?

I am interested in a pedagogical answer, but also in a reference (such as a book) that explains it.