Let $P(s)$ be the probability distribution of some variable $s$, and $P(a)$ the probability distribution of some variable $a$.
I have:
$$P(a)=\sum_s P(a|s) P(s)$$
Imagine that I have a computer that can provide samples $s$ following the probability distribution $P(s)$ and that can compute $P(a|s)$. I am pretty sure I have read somewhere that, in such a case, it can also provide samples $a$ following the probability distribution $P(a)$ (unfortunately I can no longer find the source for this; it feels like "common knowledge").
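To make the claim concrete, here is a minimal Python sketch of the procedure as I understand it: draw $s$ from $P(s)$, then draw $a$ from $P(a|s)$, and keep only $a$. The numbers in `P_s` and `P_a_given_s` are hypothetical toy values, not taken from any source, and I am assuming that $s$ and $a$ are discrete with finitely many values, so that being able to compute $P(a|s)$ is enough to sample from it.

```python
import random
from collections import Counter

# Hypothetical toy distributions (illustrative values only): s and a are binary.
P_s = {0: 0.3, 1: 0.7}                      # P(s)
P_a_given_s = {0: {0: 0.9, 1: 0.1},         # P(a | s=0)
               1: {0: 0.2, 1: 0.8}}         # P(a | s=1)


def sample_from(dist, rng):
    """Draw one value from a finite distribution given as {value: probability}."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_a(rng):
    """Two-step procedure: s ~ P(s), then a ~ P(a|s); s is then discarded."""
    s = sample_from(P_s, rng)
    return sample_from(P_a_given_s[s], rng)


def exact_marginal():
    """P(a) = sum_s P(a|s) P(s), computed directly from the tables."""
    return {a: sum(P_a_given_s[s][a] * P_s[s] for s in P_s) for a in (0, 1)}


def empirical_marginal(n, seed=0):
    """Relative frequencies of a over n draws of the two-step procedure."""
    rng = random.Random(seed)
    counts = Counter(sample_a(rng) for _ in range(n))
    return {a: counts[a] / n for a in (0, 1)}


if __name__ == "__main__":
    print("exact    :", exact_marginal())
    print("empirical:", empirical_marginal(100_000))
```

On this toy example the empirical frequencies of $a$ should land close to the exact sum $\sum_s P(a|s) P(s)$, which suggests the claim holds, but a numerical check like this does not explain why.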
However, I do not understand why this is true. To me, it would only mean that, for any given $s$, the computer can provide samples following the probability distribution $P(a|s) P(s)$.
But this is not sufficient, as we don't have the summation $\sum_s$.
Hence, my question is: why does being able to sample from $P(s)$ and being able to compute $P(a|s)$ imply being able to sample from $P(a)$?
I am interested in a pedagogical answer, but also in a reference (such as a book) that explains it.