Let's say some population has a normally-distributed IQ of mean $100$ and standard deviation $15$. Among this population, a subpopulation has a normally-distributed IQ of mean $115$ with standard deviation $15$. This subpopulation accounts for $2$% of the original size of the population.
If I pick an individual from the original population uniformly at random, and his IQ is over $x$, what's the probability that this person belongs to the subpopulation in question?
My reasoning:
Let "sub" be the event "the individual belongs to the subpopulation" and let "$\text{IQ} \geq x$" be the event "his IQ is over $x$".
$P(\text{sub} \mid \text{IQ} \geq x) = \frac{P(\text{IQ}\geq x \mid \text{sub})}{P(\text{IQ} \geq x)}\times P(\text{sub})=\frac{1-\operatorname{erf}\left(\frac{x-115}{15\sqrt{2}}\right)}{1-\operatorname{erf}\left(\frac{x-100}{15\sqrt{2}}\right)}\times P(\text{sub})=\frac{1-\operatorname{erf}\left(\frac{x-115}{15\sqrt{2}}\right)}{1-\operatorname{erf}\left(\frac{x-100}{15\sqrt{2}}\right)}\times 0.02$
This is straight from Bayes' formula, along with the formula for the CDF of a normal random variable. The problem is that this goes to infinity when $x \rightarrow +\infty$, which shouldn't happen for a probability. I don't know why this happens. The formula makes sense even intuitively, I mean,
$\frac{P(\text{IQ}\geq x| \text{sub})\times P(\text{sub})}{P(\text{IQ} \geq x)}$
can be rewritten as
$\frac{P(\text{IQ}\geq x| \text{sub})\times P(\text{sub})\times N}{P(\text{IQ} \geq x)\times N}$
where $N$ is the population size, and this is just
$\frac{\text{Number of people in the subpopulation whose IQ is greater than x}}{\text{Number of people in the total population whose IQ is greater than x}}$
This makes sense to me, but apparently it is wrong.
Can someone pinpoint the error? I personally can't see it. Perhaps it has to do with my hypothesis, which I completely made up? I implicitly assumed that it is always possible to have a population in which some trait is normally distributed and in which, for a subset of individuals, the same trait is also normally distributed but with different parameters. If it helps, I am assuming a large population. Perhaps the problem lies here? Perhaps I should instead partition the population, so that $98$% of the population has mean $100$ and $2$% has mean $115$. It is approximately the same thing.
Consider the density functions you have chosen for your main population and your subpopulation: \begin{align} f_1(x) &= \frac{e^{-(x - 100)^2/(2 \cdot 15^2)}}{15\sqrt{2 \pi}},\\ f_2(x) &= \frac{e^{-(x - 115)^2/(2 \cdot 15^2)}}{15\sqrt{2 \pi}}. \end{align}
Now consider $$ \frac{f_2(x)}{f_1(x)} = e^{((x-100)^2 - (x-115)^2)/450} = e^{(x/15) - (43/6)}. $$
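This ratio grows without bound as $x$ increases. So no matter how small a fraction $p$ of the total population your subpopulation is, for large enough $x$ we get $p\,f_2(x) > f_1(x)$: the expected number of subpopulation members near $x$ is larger than the expected number of people in the total population near $x,$ even though the former are supposed to be a subset of the latter. In other words, you have set up an impossible distribution.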
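To make this concrete, here is a small Python sketch that locates the point where the $2$%-weighted subpopulation density overtakes the population density, and evaluates the formula from the question beyond that point. The function and constant names are my own choices for illustration, and the tail probabilities use the standard identity $P(X \geq x) = \tfrac12\operatorname{erfc}\left(\frac{x-\mu}{\sigma\sqrt2}\right)$.

```python
import math

MU_POP, MU_SUB, SIGMA = 100.0, 115.0, 15.0
W = 0.02  # fraction of the population in the subpopulation

def normal_pdf(x, mu, sigma=SIGMA):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def upper_tail(x, mu, sigma=SIGMA):
    """P(X >= x) for X ~ Normal(mu, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def naive_posterior(x):
    """The question's formula: W * P(IQ >= x | sub) / P(IQ >= x)."""
    return W * upper_tail(x, MU_SUB) / upper_tail(x, MU_POP)

# Solve W * f_2(x) = f_1(x) using f_2/f_1 = exp(x/15 - 43/6):
#   x/15 - 43/6 = ln(1/W)
crossover = 15.0 * (43.0 / 6.0 + math.log(1.0 / W))

if __name__ == "__main__":
    print("densities cross at x =", crossover)
    for x in (100, 130, 160, 200):
        print(x, naive_posterior(x))
```

Above the crossover (a bit over $166$ for these parameters) the subpopulation is denser than the population that is supposed to contain it, and for somewhat larger $x$ the "probability" from the question exceeds $1$.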
Your idea at the end, replacing the original population with a $0.98 : 0.02$ blend of two normal distributions, would fix this. The weighted density of the subpopulation, $0.02\,f_2(x)$, would then never be greater than the density of the whole population, $0.98\,f_1(x) + 0.02\,f_2(x)$.
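As a rough check of the blended model, the sketch below computes the same conditional probability when the whole population is the $0.98 : 0.02$ mixture. It assumes the $98$% remainder is normal with mean $100$ and standard deviation $15$ (so the overall mean is slightly above $100$); the function names are hypothetical, chosen only for this example.

```python
import math

MU_REST, MU_SUB, SIGMA = 100.0, 115.0, 15.0
W = 0.02  # mixture weight of the subpopulation

def upper_tail(x, mu, sigma=SIGMA):
    """P(X >= x) for X ~ Normal(mu, sigma^2)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def mixture_posterior(x):
    """P(sub | IQ >= x) when the population is a (1-W):W mixture of the two normals."""
    sub_part = W * upper_tail(x, MU_SUB)
    rest_part = (1.0 - W) * upper_tail(x, MU_REST)
    # The denominator contains the numerator, so the result cannot exceed 1.
    return sub_part / (sub_part + rest_part)

if __name__ == "__main__":
    for x in (100, 130, 160, 200):
        print(x, mixture_posterior(x))
```

Because the numerator is one of the two nonnegative terms in the denominator, the result is always between $0$ and $1$. It still increases with $x$, approaching $1$ rather than blowing up.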