For example, the thread "Generating random variables from a mixture of Normal distributions" describes this method:
- First choose a distribution according to the weights.
- Then sample from the chosen distribution.
How to prove the correctness of this method?
The method described in steps 1 and 2 seems a bit vague to me. I have used what I suppose is something like it, as follows:
Suppose you want a mixture of two normal distributions, say $\mathsf{Norm}(\mu_1 = 40,\, \sigma_1 = 2)$ and $\mathsf{Norm}(\mu_2 = 50,\, \sigma_2 = 3).$ Also, suppose you want a 50:50 split so that each observation has probability $1/2$ of being chosen from the appropriate one of the distributions.
First choose $1$ or $2$ at random, and then sample from the corresponding normal distribution. I will show my method in R statistical software, using a for-loop. This is not the most elegant programming structure, but I find it is easily understood by people not familiar with R, and it is the easiest to translate into other programming languages.
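A minimal sketch of such a loop is given here, under stated assumptions: the seed, the sample size `n`, and the variable names `mu`, `sg`, `d`, and `x` are chosen for illustration and are not necessarily those of the original listing.

```r
set.seed(2021)        # arbitrary seed, used only for reproducibility
n  <- 10000           # number of observations (an assumed value)
mu <- c(40, 50)       # component means mu_1, mu_2
sg <- c(2, 3)         # component standard deviations sigma_1, sigma_2
x  <- numeric(n)      # storage for the mixture sample

for (i in 1:n) {
  d    <- sample(1:2, 1)            # choose component 1 or 2, each with probability 1/2
  x[i] <- rnorm(1, mu[d], sg[d])    # draw one value from the chosen normal
}

mean(x)   # sample mean of the mixture
sd(x)     # sample standard deviation of the mixture
```

With a sample this large, the sample mean and standard deviation should land close to the theoretical values of the mixture, and a histogram of `x` should show two humps.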
The mean of the mixture distribution should be $E(X) = 45$ and the density should be a (pointwise) average of the two PDFs: $f_m(x) = 0.5\,f_1(x) + 0.5\,f_2(x).$ I will leave it to you to verify that $SD(X) \approx 5.6.$ For a formal proof, I suggest you define your mixture distribution conditionally on the choice of a normal distribution ($1$ or $2$ in my example), and use that to get the density function or the CDF of the mixture distribution.
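One way to write out the conditioning argument is sketched here. The symbols $D$ (the chosen component), $w_k$ (its weight), and $F_k$ (the CDF of component $k$) are notation introduced for this sketch. By the law of total probability,

$$F_m(x) = P(X \le x) = \sum_{k} P(D = k)\, P(X \le x \mid D = k) = \sum_{k} w_k F_k(x),$$

and differentiating with respect to $x$ gives the mixture density

$$f_m(x) = \sum_{k} w_k f_k(x),$$

which reduces to $0.5\,f_1(x) + 0.5\,f_2(x)$ in the 50:50 example. The same conditioning gives the moments:

$$E(X) = \sum_k w_k \mu_k = 0.5(40) + 0.5(50) = 45,$$

$$E(X^2) = \sum_k w_k\left(\sigma_k^2 + \mu_k^2\right) = 0.5(4 + 1600) + 0.5(9 + 2500) = 2056.5,$$

$$SD(X) = \sqrt{E(X^2) - [E(X)]^2} = \sqrt{2056.5 - 2025} = \sqrt{31.5} \approx 5.61.$$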
For a mixture that is not 50:50, you can use an additional parameter such as `pr = c(1/3, 2/3)` in the `sample` function.

Ref: Wikipedia on "Mixture Distributions."
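As a hedged illustration of unequal weights, the loop can pass the weights to `sample` through its `prob` argument, which `pr` abbreviates through R's partial argument matching. The seed, the sample size, and the name `w` are assumptions made for this sketch.

```r
set.seed(2021)        # arbitrary seed, used only for reproducibility
n  <- 10000           # number of observations (an assumed value)
mu <- c(40, 50)       # component means
sg <- c(2, 3)         # component standard deviations
w  <- c(1/3, 2/3)     # mixing weights for components 1 and 2
x  <- numeric(n)

for (i in 1:n) {
  d    <- sample(1:2, 1, prob = w)  # choose component 1 w.p. 1/3, component 2 w.p. 2/3
  x[i] <- rnorm(1, mu[d], sg[d])    # draw one value from the chosen normal
}

mean(x)   # compare with the theoretical mean (1/3)*40 + (2/3)*50
```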
Addendum on 'vectorization' in R: The method in the link is to make all of the $1$ vs $2$ choices first in an $n$-vector I call `d` (for distribution), then when `rnorm` generates an $n$-vector `x` of mixed normals, the choices `d` are used for each normal value.

By looking at the first six of ten thousand values in `d` and `x`, you can see that the $x$-value tends to be bigger when the $d$-value is $2.$
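The vectorized approach can be sketched as follows. This is an illustration under assumptions (the seed and the names `mu` and `sg` are chosen here), not necessarily the exact code from the linked thread. It relies on `rnorm` accepting vectors for its mean and standard deviation arguments.

```r
set.seed(2021)                          # arbitrary seed, used only for reproducibility
n  <- 10000                             # ten thousand observations
mu <- c(40, 50)                         # component means
sg <- c(2, 3)                           # component standard deviations

d <- sample(1:2, n, replace = TRUE)     # all component choices at once
x <- rnorm(n, mu[d], sg[d])             # i-th draw uses mean mu[d[i]] and SD sg[d[i]]

head(cbind(d, x))                       # first six choices alongside their values
tapply(x, d, mean)                      # average of x within each component
```

The per-component averages from `tapply` should sit near the two component means, which is the pattern visible in the first few rows.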