Sampling from a Mixture of Distributions


Apologies for the length: I give some background first and get to my question at the end.

Suppose we have a probability distribution for a random variable $X$. Let $X$ have the domain $\mathcal{X}$ and be represented by a mixture of probability distributions:

$p_X(x) = \sum_{j=1}^{N}\alpha_j f_j(x)$

where $\alpha_j \geq 0$ for all $j$ and $\sum_{j=1}^{N}\alpha_j = 1$.

Also note that each $f_j(x)$ is itself a probability density function, so $f_j(x) \geq 0$ everywhere and $\int_{\mathcal{X}}f_j(x)\,dx=1$.

Suppose now we wish to randomly sample this distribution. I know one method would be to assume a latent variable, say $u$, which will be drawn from a standard uniform distribution, $u \sim U(0,1)$. We can then use the following form,

$p_{X,U}(x,u) = f_k(x)$ if $\sum_{j=1}^{k-1}\alpha_j < u < \sum_{j=1}^{k}\alpha_j$, for $k \in \{1,\dots,N\}$.

To keep this shorthand consistent, note that the empty sum is zero: $\sum_{j=1}^{0}\alpha_j = 0$. Integrating out $u$ then recovers the mixture density above as the marginal pdf:

$p_{X}(x) = \int_{0}^{1}p_{X,U}(x,u)du = \sum_{j=1}^{N}\alpha_jf_j(x)$.
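
The intermediate step in this marginalization can be written out explicitly. The shorthand $c_k = \sum_{j=1}^{k}\alpha_j$ (with $c_0 = 0$) is introduced here only for readability and is not part of the original notation. Since the joint density equals $f_k(x)$ on the interval $c_{k-1} < u < c_k$, which has length $\alpha_k$, the integral splits into $N$ pieces:

$\int_{0}^{1}p_{X,U}(x,u)\,du = \sum_{k=1}^{N}\int_{c_{k-1}}^{c_k}f_k(x)\,du = \sum_{k=1}^{N}(c_k - c_{k-1})f_k(x) = \sum_{k=1}^{N}\alpha_k f_k(x)$.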

Thus, a valid way of sampling this distribution is to draw a uniform random variable $u$ and, if $\sum_{j=1}^{k-1}\alpha_j < u < \sum_{j=1}^{k}\alpha_j$, choose $f_k(x)$ and draw one sample from it. We repeat this until we have the desired $n$ sample points.
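
The latent-variable procedure can be sketched in Python with NumPy. This is a minimal illustration rather than a definitive implementation: the function name `sample_mixture`, the `samplers` callables, and the particular means and standard deviations are all hypothetical choices made for the example.

```python
import numpy as np

def sample_mixture(weights, samplers, n, rng):
    """Draw n samples from a mixture by first drawing a latent uniform u.

    weights  : mixture weights alpha_j (non-negative, summing to 1)
    samplers : hypothetical callables; samplers[j](rng, m) returns m draws from f_j
    """
    weights = np.asarray(weights, dtype=float)
    cum = np.cumsum(weights)                  # cumulative sums of the weights
    u = rng.uniform(0.0, 1.0, size=n)         # latent uniform variables
    # 0-based component index k with cum[k-1] <= u < cum[k]
    k = np.searchsorted(cum, u, side="right")
    k = np.minimum(k, len(weights) - 1)       # guard against round-off in cum[-1]
    out = np.empty(n)
    for j, draw in enumerate(samplers):
        mask = k == j
        out[mask] = draw(rng, int(mask.sum()))
    return out, k

# Illustrative parameters (assumed, not from the question)
rng = np.random.default_rng(0)
mus = [-2.0, 0.0, 3.0]
sigmas = [0.5, 1.0, 0.7]
samplers = [
    (lambda r, m, mu=mu, s=s: r.normal(mu, s, size=m))
    for mu, s in zip(mus, sigmas)
]
x, k = sample_mixture([1 / 3, 1 / 3, 1 / 3], samplers, 300, rng)
counts = np.bincount(k, minlength=3)          # how many draws each component received
```

With this scheme the per-component counts in `counts` are themselves random (multinomial), so they will generally not be exactly 100 each.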

I want to know whether it is also acceptable to essentially "skip" the uniform random sampling, since we already know the weights. Suppose we choose a sample size $n$ such that $\alpha_jn \in \mathbb{Z}^{+}$ for every $j$, so that each component is assigned a whole number of samples. Could we then simply draw $\alpha_jn$ samples from each distribution $f_j(x)$ and take the total sample to be the collection of these subsamples?

For example, let's say we have a mixture of 3 equally weighted normal distributions,

$p_X(x) = \frac{1}{3}N(x|\mu_1,\sigma_1)+\frac{1}{3}N(x|\mu_2,\sigma_2)+\frac{1}{3}N(x|\mu_3,\sigma_3),$

and I want to draw 300 samples. Can I justify that drawing 100 samples from each $N(x|\mu_k,\sigma_k)$, $k \in \{1,2,3\}$, and pooling these 3 subsamples gives a total sample that has $p_X(x)$ as its underlying distribution? Intuitively it makes sense to me that it should, but unlike the latent variable method, I am having trouble seeing how this is mathematically provable.
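
One way to probe the question numerically is to repeat both schemes many times and compare a summary statistic such as the sample mean. The sketch below is only an empirical check, not a proof, and the means, standard deviations, and repetition count are assumed values chosen for illustration. One point worth keeping in mind when reading the result: with fixed counts the pooled points are not i.i.d. draws from $p_X$, because the number of points per component is deterministic rather than multinomial.

```python
import numpy as np

rng = np.random.default_rng(0)
mus = np.array([-2.0, 0.0, 3.0])      # hypothetical component means
sigmas = np.array([0.5, 1.0, 0.7])    # hypothetical component standard deviations
n, reps = 300, 2000                   # sample size and number of repetitions

def latent_draw():
    # Equal weights: the component index is uniform on {0, 1, 2}
    k = rng.integers(0, 3, size=n)
    return rng.normal(mus[k], sigmas[k])

def fixed_count_draw():
    # Exactly n/3 points from each component, pooled into one sample
    return np.concatenate(
        [rng.normal(m, s, size=n // 3) for m, s in zip(mus, sigmas)]
    )

latent_means = np.array([latent_draw().mean() for _ in range(reps)])
fixed_means = np.array([fixed_count_draw().mean() for _ in range(reps)])

mixture_mean = mus.mean()             # true mean of the equally weighted mixture
print("true mean:        ", mixture_mean)
print("latent scheme:    ", latent_means.mean(), latent_means.var())
print("fixed-count scheme:", fixed_means.mean(), fixed_means.var())
```

In this setup both sample means centre on the mixture mean, while the fixed-count sample mean varies less from run to run, since the randomness in how many points each component contributes has been removed.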