Generating points from 2 Normal distributions and $0$-probability continuous r.v.s


Consider the following experiment:

We generate "green" points and "blue" points in $\mathbf{R}$ using two different normal distributions as follows:

  • 1000 green points are sampled from a $N(-1, 1)$ distribution
  • 1000 blue points are sampled from a $N(1,1)$ distribution

Now I hide the colors of the points, point to one of them at random and ask you: What is the probability that this point is green?

I think we want: $$P(g | X=x) = \frac{P(X=x | g)P(g)}{P(X=x)}.$$

Now I think $P(g)=\frac{1}{2}$, since green and blue are equally likely, and $P(X=x) = \frac{1}{2000}$ since I am showing you one of 2000 points with equal probability. What throws me off is the $P(X=x | g)$. This is the probability that $X=x$ when $X$ is distributed as $N(-1,1)$. Isn't this simply $0$ since $X$ is a continuous random variable? Am I way off here?


There are 2 best solutions below


You have the right idea but the wrong Bayes formula.

You need to replace $P(X=x|g)$, which is indeed $0$, by $f_g(x)$ where $f_g$ is the pdf of $N(-1,1)$.

Similarly, $P(X=x)=0$, but you replace it by $f_g(x)P(g)+f_b(x)P(b)$, where $f_b$ is the pdf of $N(1,1)$. This density is called a Gaussian mixture.
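Concretely, with the question's parameters (green $\sim N(-1,1)$, blue $\sim N(1,1)$, equal priors), this density-based version of Bayes' formula can be sketched in a few lines of Python; the function names here are my own, purely illustrative:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p_green(x, p_g=0.5):
    """Posterior probability that the point shown at x is green,
    with green ~ N(-1, 1) and blue ~ N(1, 1):
    f_g(x) * P(g) / (f_g(x) * P(g) + f_b(x) * P(b))."""
    f_g = normal_pdf(x, -1.0, 1.0)
    f_b = normal_pdf(x, 1.0, 1.0)
    return f_g * p_g / (f_g * p_g + f_b * (1 - p_g))

print(p_green(0.0))   # 0.5, by symmetry
print(p_green(-1.0))  # well above 0.5: x = -1 is the green mean
```

Note that the densities in numerator and denominator need not be probabilities (each can exceed $1$ for small $\sigma$); only their ratio matters.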


The idea behind it is right, but the following expression is wrong:

$$P(g | X=x) = \frac{P(X=x | g)P(g)}{P(X=x)}.$$

Here is the right approach. Ask, first, for the probability that the hidden random point (denoted by $X$) lies in the interval $[x,x+\Delta x]$. Then we have

$$P(g\mid x\le X \le x+\Delta x)=\frac{P(\{x\le X \le x+\Delta x\} \cap g)}{P(x\le X \le x+\Delta x)}=\frac{P(x\le X \le x+\Delta x \mid g)\frac12}{P(x\le X \le x+\Delta x)}.$$

Now, if $\Delta x$ is small then

  1. $P(x\le X \le x+\Delta x \mid g)\approx f_g(x)\Delta x$.
  2. $P(x\le X \le x+\Delta x)=\frac12(P(x\le X \le x+\Delta x\mid g)+P(x\le X \le x+\Delta x\mid b))\approx \frac12(f_g(x)\Delta x+f_b(x)\Delta x) $

With this

$$P(g\mid x\le X \le x+\Delta x)\approx\frac{\frac12f_g(x)\Delta x}{\frac12(f_g(x)\Delta x+f_b(x)\Delta x)}.$$

(For the sake of simplicity, I've assumed that $P(g)=P(b)=\frac12.$)

Notice that $\Delta x$ cancels from this expression, so it plays no real role in the argument. It is only a slight technical matter to replace each $\approx$ with a limit as $\Delta x \to 0$ and an equality.

So, we have

$$P(g\mid X=x)=\frac{f_g(x)}{f_g(x)+f_b(x)}.$$

Or, in general, allowing unequal priors $P(g)$ and $P(b)$,

$$P(g\mid X=x)=\frac{P(g)f_g(x)}{f_g(x)P(g)+f_b(x)P(b)}.$$
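As a sanity check on this derivation, one can replicate the experiment numerically: sample the 2000 points, estimate $P(g\mid x\le X\le x+\Delta x)$ by the fraction of green points in a small window around $x$, and compare to the closed form. (This code is an illustrative sketch, not part of the original answer; names are my own.)

```python
import math
import random

random.seed(0)

# Replicate the experiment: 1000 green points ~ N(-1, 1), 1000 blue ~ N(1, 1).
points = [(random.gauss(-1.0, 1.0), "g") for _ in range(1000)] + \
         [(random.gauss(1.0, 1.0), "b") for _ in range(1000)]

def empirical_p_green(x, width=0.25):
    """Fraction of green points among all points within `width` of x --
    a finite-sample stand-in for P(g | x <= X <= x + dx)."""
    nearby = [color for (value, color) in points if abs(value - x) <= width]
    return sum(color == "g" for color in nearby) / len(nearby)

def closed_form(x):
    """f_g(x) / (f_g(x) + f_b(x)). Here log(f_b(x) / f_g(x)) = 2x,
    so the posterior is a logistic function of x."""
    return 1.0 / (1.0 + math.exp(2.0 * x))

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(empirical_p_green(x), 2), round(closed_form(x), 2))
```

The empirical fractions track the closed form closely, and the logistic shape of the posterior is exactly why this two-Gaussian setup is the textbook motivation for logistic regression.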