Normal distribution governed by a Bernoulli distribution


How would I find the distributional characteristics (mean, variance) of the following scenario:

A Bernoulli random variable $X \sim B(1,p)$.

If $X = 1$, then $Y \sim N(\mu_1, \sigma_1^2)$.

If $X = 0$, then $Y \sim N(\mu_0, \sigma_0^2)$.

One random variable is conditional on another. I know the mean of this scenario is $p \mu_1 +(1-p) \mu_0$, but what is the variance?

Thank you so much.

Edit: based on further research, this is what I have come up with:

$Y | X=1 \sim N(\mu_1, \sigma_1^2)$

$Y | X=0 \sim N(\mu_0, \sigma_0^2)$

$E(Y) = E(E(Y|X)) = p \times E(Y|X=1) + (1-p) \times E(Y|X=0) = p \mu_1 + (1-p) \mu_0$
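One way to sanity-check the total-expectation step is a quick simulation. This is a minimal Python sketch of the same setup (the parameters $p = 0.3$, $\mu_1 = 5$, $\sigma_1 = 2$, $\mu_0 = -1$, $\sigma_0 = 1$ are arbitrary choices for illustration, not from the question):

```python
import random

# Arbitrary illustration parameters (not from the question)
p, mu1, sigma1, mu0, sigma0 = 0.3, 5.0, 2.0, -1.0, 1.0
random.seed(0)

n = 200_000
# Draw X ~ Bernoulli(p); given X, draw Y from the corresponding normal
draws = [random.gauss(mu1, sigma1) if random.random() < p
         else random.gauss(mu0, sigma0)
         for _ in range(n)]

empirical_mean = sum(draws) / n
theoretical_mean = p * mu1 + (1 - p) * mu0  # = 0.3*5 + 0.7*(-1) = 0.8 here
print(empirical_mean, theoretical_mean)     # both close to 0.8
```

With these parameters both values land near 0.8, consistent with $E(Y) = p\mu_1 + (1-p)\mu_0$.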

And,

$Var(Y) = E(V(Y|X)) +V(E(Y|X))$

$E(V(Y|X)) = p \sigma_1^2 + (1-p) \sigma_0^2$

$V(E(Y|X)) = E(E(Y|X)^2) - E(E(Y|X))^2 =E(E(Y|X)^2) - E(Y)^2$

$ = p \mu_1^2 + (1-p) \mu_0^2 - (p \mu_1 + (1-p) \mu_0)^2$

$ = p(1-p) \mu_1^2 + p(1-p) \mu_0^2 - 2p(1-p) \mu_1 \mu_0 = p(1-p)(\mu_1 - \mu_0)^2$

Hopefully this is correct?
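The decomposition above can also be checked numerically. A Python sketch (again with arbitrary illustration parameters) comparing the empirical variance against $E(V(Y|X)) + V(E(Y|X))$:

```python
import random

# Arbitrary illustration parameters (not from the question)
p, mu1, sigma1, mu0, sigma0 = 0.3, 5.0, 2.0, -1.0, 1.0
random.seed(1)

n = 200_000
draws = [random.gauss(mu1, sigma1) if random.random() < p
         else random.gauss(mu0, sigma0)
         for _ in range(n)]

within = p * sigma1**2 + (1 - p) * sigma0**2   # E[Var(Y|X)]
between = p * (1 - p) * (mu1 - mu0)**2         # Var(E(Y|X))
theoretical_var = within + between             # = 9.46 for these parameters

m = sum(draws) / n
empirical_var = sum((y - m) ** 2 for y in draws) / (n - 1)
print(empirical_var, theoretical_var)
```

The empirical variance agrees with the formula to within simulation noise, which supports the derivation.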

Best answer:

I was fooled by randomness: after running a quick computer simulation, I got a result extremely close to what I had posted as a comment, i.e. $\require{enclose}\enclose{horizontalstrike}{{\color{red}{\mathrm{Var}(Y)=\sigma_1^2\times p + \sigma_0^2 \times (1-p)}}}.$ @Just_to_Answer pointed out that this is incorrect, since the problem asks for the variance of a mixture distribution, a mistake I confirmed simply by re-running the simulation with different parameters.

There is nothing I can add to @whuber's post on the topic here, so he deserves the credit; take this answer as an extended comment.

The variance does indeed contain the formula above, plus a term that accounts for the dispersion of the means of the two components:

$$\mathrm{Var}(Y) = \color{red}{\sigma_1^2 \times p + \sigma_0^2 \times (1-p)} + \Big[ \mu_1^2 \times p + \mu_0^2 \times (1-p) - \big(\mu_1 \times p + \mu_0 \times (1-p)\big)^2\Big]$$

And since making the same mistake twice is so human, I ran a simulation again (this time with different settings) to "confirm" the correct equation:

> n = 1e6                 # Number of simulations
> p = .7                  # Probability of the Bernoulli experiment
> Bern = rbinom(n, 1, p)  # Simulated Bernoulli indicators
> 
> mean_zero = 400         # If X = 0, we draw from a N(400, 33^2):
> sd_zero = 33
> 
> mean_one = 14           # If X = 1, we draw from a N(14, 1):
> sd_one = 1
> 
> Y_zero = rnorm(sum(Bern==0),mean_zero,sd_zero)
> Y_one  = rnorm(sum(Bern==1),mean_one,sd_one)
> Y = c(Y_zero, Y_one)   # And combine the results into a single vector.
> 
> var(Y)                 # Empirical variance  
[1] 31639.79
> 
> var(Y_one) * p + (1 - p) * var(Y_zero) +
+  (p * mean_one^2 + (1 - p) * mean_zero^2 - (p * mean_one + (1 - p) * mean_zero)^2)       
[1] 31614.8             # Calculated variance.
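For comparison, plugging the simulation's own parameters into the closed-form expression gives nearly the same number. A Python equivalent of that last calculation (using the population parameters rather than the empirical conditional variances):

```python
# Same parameters as the R simulation above
p = 0.7
mean_one, sd_one = 14, 1        # Y | X = 1 ~ N(14, 1)
mean_zero, sd_zero = 400, 33    # Y | X = 0 ~ N(400, 33^2)

within = p * sd_one**2 + (1 - p) * sd_zero**2
between = (p * mean_one**2 + (1 - p) * mean_zero**2
           - (p * mean_one + (1 - p) * mean_zero)**2)
theoretical_var = within + between
print(round(theoretical_var, 2))  # 31616.56, close to the empirical 31614.8
```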