Assume that when $\theta=1$, $Y \sim N(1,\sigma^2)$, and when $\theta=2$, $Y \sim N(2,\sigma^2)$. Let $P(\theta=1)=P(\theta=2)=0.5$.
I'd like to find the posterior distribution. Here are the steps I've done.
(1) Find the marginal probability distribution.
$P_{Y}(y) = P(Y|\theta=1)P(\theta=1) + P(Y|\theta=2)P(\theta=2)$
$=\frac{1}{2}\times\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{y-1}{\sigma}\right)^{2}\right) + \frac{1}{2}\times\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{y-2}{\sigma}\right)^{2}\right)$ (in short, $0.5\,N(1,\sigma^2)+0.5\,N(2,\sigma^2)$)
It seems like a mixture distribution, not a convolution?
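As a sanity check, the mixture form of the marginal can be verified numerically; here is a quick NumPy sketch (the function names are just illustrative):

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    # density of N(mu, sigma^2)
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def marginal_pdf(y, sigma):
    # equal-weight two-component mixture: 0.5*N(1, sigma^2) + 0.5*N(2, sigma^2)
    return 0.5 * normal_pdf(y, 1.0, sigma) + 0.5 * normal_pdf(y, 2.0, sigma)

# Riemann sum over a wide grid: a proper density should integrate to ~1
y = np.linspace(-10.0, 13.0, 20001)
dy = y[1] - y[0]
total = float(marginal_pdf(y, sigma=1.0).sum() * dy)
print(total)  # ~1.0
```

A convolution would instead give the density of a *sum* of independent normals; here we are averaging two densities, which is exactly a mixture.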
(2) Explain what happens to the posterior distribution when $\sigma^2$ increases or decreases
Since $\theta$ can be 1 or 2, I express the posterior as below (but I'm not sure I'm doing it right).
$P(\theta=i|Y) = \frac{P(Y|\theta=i)P(\theta=i)}{p(Y)}=\frac{0.5N(i,\sigma^2)}{0.5N(1,\sigma^2)+0.5N(2,\sigma^2)} = \frac{N(i,\sigma^2)}{N(1,\sigma^2)+N(2,\sigma^2)} = \frac{N(i,\sigma^2)}{N(3,2\sigma^2)}$
I got stuck at this point. I'm not sure about the result, and even if I'm right, I still don't know what happens to the posterior distribution when $\sigma^2$ increases or decreases.
Can anyone help me?
Thank you.
You are correct that the marginal looks like a mixture of normal densities, because it is. This follows from the prior for $\theta$ being discrete: the marginal distribution is then a discrete mixture whose mixing weights are the prior probability masses of $\theta$.
However, your posterior is not correct because the discrete mixture of normal distributions is not itself normally distributed, so you cannot assert that your denominator will be normal with mean $3$. Because the prior for $\theta$ has support $\{1, 2\}$, the posterior for $\theta$ given $Y = y$ will also have the same support; i.e., the posterior for $\theta$ is a location-transformed Bernoulli random variable. We have for instance
Writing $f_Y(y \mid \theta = i)$ for the $N(i, \sigma^2)$ density, the likelihood ratio is $$\frac{f_Y(y \mid \theta = 2)}{f_Y(y \mid \theta = 1)} = \exp\left(\frac{(y-1)^2 - (y-2)^2}{2\sigma^2}\right) = e^{(y-3/2)/\sigma^2},$$ so $$\Pr[\theta = 1 \mid Y = y] = \frac{f_Y(y \mid \theta = 1) \Pr[\theta = 1]}{f_Y(y)} = \frac{1}{1 + e^{(y-3/2)/\sigma^2}},$$ and $$\Pr[\theta = 2 \mid Y = y] = \frac{1}{1 + e^{-(y-3/2)/\sigma^2}}.$$ This tells us that if $y = 3/2$, the posterior probabilities are equal, which makes sense: we observed a value exactly halfway between the prior means $1$ and $2$, so the data provide no additional information about which $\theta$ is more likely. If $y > 3/2$, then $e^{-(y-3/2)/\sigma^2}$ shrinks, hence $\Pr[\theta = 2 \mid Y = y]$ increases. If $\sigma$ is large, the exponent $\pm(y-3/2)/\sigma^2$ stays close to zero, so both posterior probabilities remain near $1/2$ and are insensitive to changes in $y$ unless $y$ is very far from $3/2$. Conversely, if $\sigma$ is very small, the exponent is large in magnitude, so the posterior is very sensitive to changes in $y$. To illustrate, here is a plot of the marginal for $Y$ for the choices $\sigma \in \{0.25, 0.45, 1\}$:
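These closed forms are easy to check numerically against Bayes' rule applied directly; a short Python sketch (function names are mine):

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    # density of N(mu, sigma^2)
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior_theta2(y, sigma):
    # Bayes' rule directly: 0.5*f(y|theta=2) / (0.5*f(y|theta=1) + 0.5*f(y|theta=2))
    f1 = normal_pdf(y, 1.0, sigma)
    f2 = normal_pdf(y, 2.0, sigma)
    return 0.5 * f2 / (0.5 * f1 + 0.5 * f2)

def posterior_theta2_logistic(y, sigma):
    # closed form: 1 / (1 + exp(-(y - 3/2)/sigma^2))
    return 1.0 / (1.0 + np.exp(-(y - 1.5) / sigma**2))

grid = np.linspace(-2.0, 5.0, 101)
match = np.allclose(posterior_theta2(grid, 0.45),
                    posterior_theta2_logistic(grid, 0.45))
print(match)                           # the two agree on the whole grid
print(posterior_theta2(1.5, 1.0))      # 0.5 at the midpoint y = 3/2
```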
Blue is $0.25$, orange is $0.45$, and green is $1$. You can see that when $\sigma$ is small, we have two distinguishable maxima centered around $1$ and $2$; as $\sigma$ increases, those maxima are lost because the normal densities "overlap" too much.
Now look at the posterior probability $\Pr[\theta = 2 \mid Y = y]$ for the same $\sigma$:
For $\sigma = 0.25$, the transition at $y = 3/2$ is very sharp, because, as we saw from the marginal, $Y$ tends to fall near $1$ or $2$ and rarely near $3/2$. So if we observe, say, $Y = 1.25$, the evidence is much stronger that $\theta = 1$ than $\theta = 2$. But in the case $\sigma = 1$, the densities overlap so much that it is difficult to tell which $\theta$ is more likely.
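Plugging $y = 1.25$ into the closed-form posterior for the three values of $\sigma$ makes this concrete (a minimal sketch; `posterior_theta2` is just an illustrative name):

```python
import numpy as np

def posterior_theta2(y, sigma):
    # P(theta = 2 | Y = y) = 1 / (1 + exp(-(y - 3/2)/sigma^2))
    return 1.0 / (1.0 + np.exp(-(y - 1.5) / sigma**2))

y_obs = 1.25
for sigma in (0.25, 0.45, 1.0):
    print(sigma, posterior_theta2(y_obs, sigma))
```

For $\sigma = 0.25$ the probability that $\theta = 2$ is tiny (about $0.02$), while for $\sigma = 1$ it is close to $0.44$, i.e., barely distinguishable from the prior $0.5$.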