Why the sample method of mixture distribution works?

254 Views Asked by At

For example this thread: Generating random variables from a mixture of Normal distributions

  1. First choose a distribution according to the weights.
  2. Then sample from the chosen distribution.

How to prove the correctness of this method?

1

There are 1 best solutions below

0
On

The method described in 1 and 2 seems a bit vague to me. I have used what I suppose to be something like it, as follows:

Suppose you want a mixture of two normal distributions, say $\mathsf{Norm}(\mu_1 = 40,\, \sigma_1 = 2)$ and $\mathsf{Norm}(\mu_2 = 50,\, \sigma_2 = 3).$ Also, suppose you want a 50:50 split so that each observation has probability $1/2$ of being chosen from the appropriate one of the distributions.

First choose $1$ or $2$ at random, and then sample from the corresponding normal distribution. I will show my method in R statistical software, using a for-loop. This is not the most elegant programming structure, but I find it is easily understood by people not familiar with R--and the easiest to translate into other programming languages.

set.seed(2017);  n = 10^4;  x = numeric(m)
mu=c(40, 50);  sg=c(2,3)
for (i in 1:n) {
   j = sample(1:2, 1)
   x[i] = rnorm(1, mu[j], sg[j]) }
hist(x, prob=T, ylim=c(0,.1), col="skyblue2", ylab="PDF", 
      main="50:50 Mix of NORM(40,2) and NORM(50,3)")
curve(.5*dnorm(x, 40, 2)+.5*dnorm(x,50,3), lwd=2, col="red", add=T)
mean(x);  sd(x)
##  44.97579  # aprx E(X) = 45
##  5.622631  # aprx SD(X)

The mean of the mixture distribution should be $E(X) = 45$ and the density should be a (pointwise) average of the two PDFs: $f_m(x) = .5f_1(x)+.5f_2(x).$ I will leave it to you to verify that $SD(X) \approx 5.6.$ For a formal proof, I suggest you define your mixture distribution conditionally on the choice of a normal distribution ($1$ or $2$ in my example), and use that to get the density function or the CDF of the mixture distribution.

enter image description here

For a mixture that is not 50:50, you can use an additional parameter such as pr = c(1/3, 2/3) in the sample function.

Ref: Wikipedia on "Mixture Distributions."

Addendum on 'vectorization' in R: The method in the link is to make all of the $1$ vs $2$ choices first in an $n$-vector I call d (for distribution), then when rnorm generates an $n$-vector xof mixed normals, the choices d are used for each normal value.

n = 10^4
d = sample(1:2, n, rep=T)
mu = c(40,50); sg = c(2,3)
x = rnorm(n, mu[d], sg[d])
mean(x);  sd(x)
## 45.07935
## 5.597325

By looking at the first six of ten thousand values in d and x, you can see that the $x$-value tends to be bigger when the $d$-value is $2.$

head(cbind(d,x))
     d        x
[1,] 1 40.50769
[2,] 2 48.75422
[3,] 1 41.24800
[4,] 2 44.36633
[5,] 2 47.81572
[6,] 1 39.76703