Probability that a sample comes from one of two distributions

Let's say I have two normal distributions with known means $\mu_1$, $\mu_2$ and known standard deviations $\sigma_1$, $\sigma_2$. I am handed a random variate from one of the distributions, but I don't know which. What is the probability that my variate came from distribution 1 rather than distribution 2?

UPDATE: a concrete example. Machine 1 generates normally distributed variates with mean 1053 and standard deviation 59. Machine 2 generates normally distributed variates with mean 1187 and standard deviation 73. One of them is picked at random, the handle is turned (unseen by me), and the number 1162.4 comes out. What is the probability that this number was generated by machine 1 as opposed to machine 2?

Best answer

You can use a Bayesian approach and compute the posterior odds. Let $\Theta_1 = (\mu_1, \sigma_1) = (1053, 59)$ and similarly $\Theta_2 = (\mu_2, \sigma_2) = (1187, 73)$. For now, we will assume that the variate is equally likely to come from either distribution, so the prior odds are

$$ \frac{P(\Theta_1)}{P(\Theta_2)} = 1 $$

What we can calculate is the posterior odds

$$ \frac{P(\Theta_1|D)}{P(\Theta_2|D)} $$

where $D$ is the new data point. By Bayes' theorem,

$$ \begin{aligned} P(\Theta_1|D) &= \frac{P(D|\Theta_1)\;P(\Theta_1)}{P(D)}\\ P(\Theta_2|D) &= \frac{P(D|\Theta_2)\;P(\Theta_2)}{P(D)}\\ \frac{P(\Theta_1|D)}{P(\Theta_2|D)} &= \frac{P(D|\Theta_1)\;P(\Theta_1)}{P(D)}\cdot\frac{P(D)}{P(D|\Theta_2)\;P(\Theta_2)}\\ &= \frac{P(D|\Theta_1)\;P(\Theta_1)}{P(D|\Theta_2)\;P(\Theta_2)} \end{aligned} $$

Here $P(D|\Theta_i)$ is the normal density with parameters $\Theta_i$ evaluated at $D = 1162.4$. Plugging in the numbers, we get:

$$ \begin{aligned} P(D|\Theta_1) &= \frac{1}{59\sqrt{2\pi}}e^{-\frac{\left(1162.4 - 1053\right)^2}{2\cdot 59^2}} \approx 0.00121189\\ P(D|\Theta_2) &= \frac{1}{73\sqrt{2\pi}}e^{-\frac{\left(1162.4 - 1187\right)^2}{2\cdot 73^2}} \approx 0.005163308\\ P(\Theta_1) &= P(\Theta_2) = \frac{1}{2}\;\textrm{so they cancel}\\ \frac{P(\Theta_1|D)}{P(\Theta_2|D)} &\approx \frac{0.00121189}{0.005163308} \approx \frac{1}{4.26} \end{aligned} $$

So the odds are no longer $1:1$ but closer to $1:4.26$, which means it is a bit more than 4 times as likely that the second machine was used. Expressed as a probability, $P(\Theta_1|D) \approx \frac{1}{1 + 4.26} \approx 0.19$.
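
To check the arithmetic numerically, here is a minimal Python sketch of the same calculation. The function names `normal_pdf` and `posterior_prob_first` and the `prior1` parameter are my own illustrative choices, not part of the question. The default `prior1=0.5` encodes the equal-prior assumption made above.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_prob_first(x, mu1, sigma1, mu2, sigma2, prior1=0.5):
    """Posterior probability that x came from distribution 1.

    prior1 is the assumed prior probability of distribution 1;
    distribution 2 gets the remaining 1 - prior1.
    """
    w1 = prior1 * normal_pdf(x, mu1, sigma1)
    w2 = (1.0 - prior1) * normal_pdf(x, mu2, sigma2)
    return w1 / (w1 + w2)


x = 1162.4
like1 = normal_pdf(x, 1053, 59)   # P(D | Theta_1)
like2 = normal_pdf(x, 1187, 73)   # P(D | Theta_2)

# With equal priors the posterior odds equal the likelihood ratio.
odds_2_to_1 = like2 / like1
p1 = posterior_prob_first(x, 1053, 59, 1187, 73)

print(f"P(D|Theta_1)  = {like1:.8f}")
print(f"P(D|Theta_2)  = {like2:.8f}")
print(f"odds 2 vs 1   = {odds_2_to_1:.2f}")
print(f"P(Theta_1|D)  = {p1:.3f}")
```

With equal priors this gives odds of about 4.26 to 1 in favour of machine 2, i.e. a probability of about 0.19 for machine 1. If the machines are not equally likely to be picked, passing a different `prior1` changes the result accordingly.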