Probability that a sample is generated from a distribution

Let $f_X(x)$ and $g_{Y}(x)$ be the probability mass functions of discrete random variables $X$ and $Y$. Mike selects one of the two random variables ($X$ with probability $1/2$, $Y$ with probability $1/2$), then generates a sample from it and gives it to us. Let $a$ be the number that we get. We don't know which random variable was selected. Based on the observation $a$, find the probability that he selected $X$.

Let $A$ be the event that $a$ is observed. To answer this question, we have to calculate: \begin{align} P(a \text{ is a sample of } X \mid A)=\frac{P(A \cap \{X\text{ selected}\})}{P(A)}&=\frac{P(A \mid X\text{ selected})\,P(X\text{ selected})}{P(A \mid X \text{ selected})\,0.5+P(A \mid Y \text{ selected})\,0.5}\\&=\frac{f_X(a)\, 0.5}{f_X(a)\,0.5+g_Y(a)\,0.5}\\ &=\frac{f_X(a)}{f_X(a)+g_Y(a)} \end{align}
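The discrete formula can be checked on a small example. The following is a minimal sketch, not part of the original question: the helper `posterior_x` and the two pmfs (a fair die for $X$, a die loaded toward 6 for $Y$) are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def posterior_x(f_x, g_y, a, prior_x=Fraction(1, 2)):
    """P(X selected | a observed), for pmfs given as dicts value -> probability.

    Hypothetical helper: applies Bayes' rule with prior P(X selected) = prior_x.
    """
    num = f_x.get(a, 0) * prior_x
    den = num + g_y.get(a, 0) * (1 - prior_x)
    if den == 0:
        raise ValueError("a has zero probability under both pmfs")
    return num / den


# Assumed example pmfs: X is a fair die, Y is a die loaded toward 6.
f_x = {k: Fraction(1, 6) for k in range(1, 7)}
g_y = {**{k: Fraction(1, 10) for k in range(1, 6)}, 6: Fraction(1, 2)}

print(posterior_x(f_x, g_y, 6))  # 1/4
print(posterior_x(f_x, g_y, 1))  # 5/8
```

With equal priors the factor $0.5$ cancels, so the result depends only on the ratio $f_X(a) : g_Y(a)$.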

How can we extend this to the continuous random variables?

For any probability density function (pdf), the probability of observing exactly $a$ is zero, so we cannot use the above math directly to calculate the probability we need. Intuitively, though, there are examples of $X$ and $Y$ whose supports both include $a$ but where one of them is more concentrated around $a$, so it is more probable that $a$ was generated from that one. How can we quantify the probability that $X$ generated $a$?

There is 1 best solution below

Consider a small interval around $a$: treat the observation as the event that the sampled value lies in $[a -\varepsilon, a+\varepsilon]$, apply the discrete formula to that event, and then take the limit $\varepsilon \to 0$ of the resulting ratio.

Then, provided both densities are continuous at $a$ and not both zero there, the ratio becomes \begin{equation} \lim_{\varepsilon \to 0} \frac{\int^{a+\varepsilon}_{a-\varepsilon}f_X (x)\, dx}{\int^{a+\varepsilon}_{a-\varepsilon}f_X (x)\, dx \, + \int^{a+\varepsilon}_{a-\varepsilon}g_Y (y)\, dy } = \frac{f_X(a)}{f_X(a)+g_Y(a)} \end{equation}

For an illustration, let us consider $X \sim \mathcal{N}(0,1)$ and $Y \sim \mathcal{N}(1,1)$ and take the observed value to be $a$.

The common normalizing factor $1/\sqrt{2\pi}$ cancels between numerator and denominator, so \begin{align} \lim_{\varepsilon \to 0} \frac{\int^{a+\varepsilon}_{a-\varepsilon}f_X (x)\, dx}{\int^{a+\varepsilon}_{a-\varepsilon}f_X (x)\, dx \, + \int^{a+\varepsilon}_{a-\varepsilon}g_Y (y)\, dy } &= \lim_{\varepsilon \to 0}\frac{ \int^{a+\varepsilon}_{a-\varepsilon} \mathrm{e}^{-x^2/2}\, dx}{\int^{a+\varepsilon}_{a-\varepsilon} \mathrm{e}^{-x^2/2}\, dx \, + \, \int^{a+\varepsilon}_{a-\varepsilon} \mathrm{e}^{-(y-1)^2/2}\, dy} \end{align}
Both numerator and denominator tend to $0$, so we apply L'Hôpital's rule, using the Leibniz integral rule to differentiate the integrals with respect to $\varepsilon$: \begin{equation} \lim_{\varepsilon \to 0} \frac{e^{-(a+\varepsilon)^2/2}+e^{-(a-\varepsilon)^2/2} }{e^{-(a+\varepsilon)^2/2}+e^{-(a-\varepsilon)^2/2} + e^{-(a+\varepsilon-1)^2/2}+e^{-(a-\varepsilon-1)^2/2} } = \frac{e^{-a^2/2} }{e^{-a^2/2} + e^{-(a-1)^2/2} } \end{equation}
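The limit can also be checked numerically. The sketch below is an illustration only: the helper names are hypothetical, and the normal cdf is computed from `math.erf` in the standard library. It compares the interval-based ratio for shrinking $\varepsilon$ with the density ratio at $a$.

```python
import math


def normal_cdf(x, mu=0.0):
    """CDF of N(mu, 1), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))


def normal_pdf(x, mu=0.0):
    """Density of N(mu, 1)."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)


def interval_ratio(a, eps):
    """P(X selected | sample in [a - eps, a + eps]) with equal priors."""
    p_x = normal_cdf(a + eps, 0.0) - normal_cdf(a - eps, 0.0)
    p_y = normal_cdf(a + eps, 1.0) - normal_cdf(a - eps, 1.0)
    return p_x / (p_x + p_y)


def density_ratio(a):
    """The claimed limit f_X(a) / (f_X(a) + g_Y(a))."""
    fx, gy = normal_pdf(a, 0.0), normal_pdf(a, 1.0)
    return fx / (fx + gy)


a = 0.0
for eps in (1.0, 0.1, 0.01, 0.001):
    print(eps, interval_ratio(a, eps))
print("limit", density_ratio(a))
```

As $\varepsilon$ shrinks, the printed interval ratios should approach the density ratio on the last line.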

So, if the observed value is $a = 0$, the probability that it came from $X$ is $\frac{1}{1+e^{-1/2}} \approx 0.6225$, which is greater than $1/2$, as we would expect since $0$ is the mean of $X$.
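As a side note not in the original answer, the expression for this particular pair of normals simplifies to a closed form. Dividing numerator and denominator by $e^{-a^2/2}$ and using $a^2-(a-1)^2 = 2a-1$ gives

\begin{equation} \frac{e^{-a^2/2}}{e^{-a^2/2}+e^{-(a-1)^2/2}} = \frac{1}{1+e^{\left(a^2-(a-1)^2\right)/2}} = \frac{1}{1+e^{a-1/2}} \end{equation}

This equals $1/2$ at the midpoint $a = 1/2$ between the two means, exceeds $1/2$ for $a < 1/2$, and tends to $0$ as $a \to \infty$.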