How to use Bayes's rule with mixed distributions?


On page 81 of The Likelihood Principle by Berger and Wolpert (1988) I find the following claim (which references example 20 on page 75).

We consider a certain statistical problem from a Bayesian perspective. Suppose we have an infinite sequence of i.i.d. random variables with distribution $N(\theta,1)$. Consider the stopping rule that stops at the first $n$ such that $|\overline{X_n}|\ge Kn^{-1/2}$, where $K$ is some fixed constant and $\overline{X_n}$ is the sample mean. It can be shown that this stopping time is finite with probability 1. The likelihood function is then proportional, as a function of $\theta$, to a $N(\overline{X_n}, n^{-1})$ density (mean $\overline{X_n}$, variance $n^{-1}$). We also assume a prior that places half its mass at $0$ and half on a normal distribution centered at zero with very large variance (so the prior is mixed). Let us agree to wave our hands and say the normal distribution with very large variance can be approximated by an improper uniform prior over the real line.

Recall that the likelihood principle says the stopping rule is irrelevant when making our Bayes's rule calculations. Treating $n$ as fixed in advance, Berger and Wolpert make the following claim about the posterior:

$$\pi(0\,|\, \overline{X_n}=Kn^{-1/2})=\left[1+(1+n)^{-1/2}\exp\!\left(\frac{K^2 n}{2(1+n)}\right)\right]^{-1}.$$

I cannot figure out how this was derived. In particular, I am clueless how Bayes's rule should be applied in the case of a mixed prior and continuous likelihood. Any help would be greatly appreciated.
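As a quick sanity check on the claim that the stopping time is finite (my own simulation sketch, not from the book; the value of $K$, the cap on $n$, and the replication count are arbitrary choices):

```python
import math
import random

# Monte Carlo sketch: under theta = 0, the rule "stop at the first n with
# |X-bar_n| >= K / sqrt(n)" terminates, as the law of the iterated
# logarithm guarantees.  K, max_n, and the number of replications are
# arbitrary illustration choices.
random.seed(0)
K = 0.5

def stopping_time(max_n=100_000):
    s = 0.0
    for n in range(1, max_n + 1):
        s += random.gauss(0.0, 1.0)          # X_i ~ N(0, 1)
        if abs(s / n) >= K / math.sqrt(n):   # |X-bar_n| >= K n^{-1/2}
            return n
    return None                              # did not stop within the cap

times = [stopping_time() for _ in range(20)]
print(all(t is not None for t in times))     # every replication stopped
```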

Best Answer

B&W apply Bayes' rule in exactly the same way you would with a fully continuous prior. They start with the mixed prior

$p(\theta) = \lambda \delta(\theta) + (1-\lambda) \frac{1}{\sqrt{2\pi \rho^2}} e^{-\frac{\theta^2}{2\rho^2}} $.

The conditional probability for observing data ${\bf x}=(x_1, \ldots, x_n)$ given $\theta$ is $p({\bf x} | \theta) = \frac 1 {(2\pi)^{n/2}}e^{-\frac 1 {2} \sum_{i=1}^n (x_i - \theta)^2} $.

(Note that $\rho^2$ is only the variance of the prior; the data themselves are sampled from $N(\theta,1)$.)

The probability without conditioning is $p({\bf x}) = \int_{-\infty}^{\infty} p({\bf x}|\theta)p(\theta) d\theta = \ldots = \frac{e^{-\frac 1 {2} \sum_{i=1}^n x_i^2}}{(2\pi)^{n/2}} \left(\lambda + (1-\lambda) (n\rho^2+1)^{-\frac 1 2} e^{\frac {n^2\rho^2 \bar{x}^2}{2(n\rho^2+1) }}\right) $, where $\bar x$ is the mean that they assume is $\frac K {\sqrt{n}}$.
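As a sanity check on this marginal (my own verification, not part of the answer), one can compare the closed form against a brute-force Riemann sum over $\theta$, with arbitrary values for $\lambda$, $\rho$, and the data:

```python
import math

# Numerical check of the closed-form marginal p(x) against a midpoint-rule
# integral over theta.  The values of lam, rho, and the data x are arbitrary.
lam, rho = 0.5, 1.5
x = [0.3, -1.2, 0.7]
n = len(x)
xbar = sum(x) / n
ssq = sum(xi * xi for xi in x)

def lik(theta):
    """p(x | theta) for i.i.d. N(theta, 1) observations."""
    return (2 * math.pi) ** (-n / 2) * math.exp(
        -0.5 * sum((xi - theta) ** 2 for xi in x)
    )

# Midpoint-rule integral of lik(theta) * N(theta; 0, rho^2) over [-15, 15]
h, lo, steps = 0.001, -15.0, 30_000
integral = 0.0
for k in range(steps):
    th = lo + (k + 0.5) * h
    integral += lik(th) * math.exp(-th * th / (2 * rho**2))
integral *= h / math.sqrt(2 * math.pi * rho**2)

numeric = lam * lik(0.0) + (1 - lam) * integral

# Closed form from the answer
closed = (math.exp(-0.5 * ssq) / (2 * math.pi) ** (n / 2)) * (
    lam
    + (1 - lam)
    * (n * rho**2 + 1) ** -0.5
    * math.exp(n**2 * rho**2 * xbar**2 / (2 * (n * rho**2 + 1)))
)

print(abs(numeric - closed) < 1e-6)  # the two agree to integration error
```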

So we have all we need to use Bayes' rule, $p(\theta|{\bf x}) = \frac{p(\theta) p({\bf x} | \theta)}{p({\bf x})}$. The coefficient of the delta function in $p(\theta|{\bf x})$ is $\lambda p({\bf x}|0)/p({\bf x})$; the common factor $(2\pi)^{-n/2}e^{-\frac 1 2 \sum_{i=1}^n x_i^2}$ cancels, leaving $\left[1+\frac{1-\lambda}{\lambda}(n\rho^2+1)^{-\frac 1 2} e^{\frac {n^2\rho^2 \bar{x}^2}{2(n\rho^2+1)}}\right]^{-1}$, which is the expression they give once you plug in $\lambda=\frac 1 2$, $\rho^2=1$, and $\bar x = \frac K {\sqrt n}$.
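This final step can be checked numerically (a sketch of my own, assuming $\lambda=\frac12$ and $\rho=1$, the values under which the closed form reduces to $\left[1+(1+n)^{-1/2}\exp\!\left(\frac{K^2n}{2(1+n)}\right)\right]^{-1}$):

```python
import math

# Check that the coefficient of the delta function, with lam = 1/2 and
# rho = 1 and x-bar = K / sqrt(n), reduces to B&W's expression.
lam, rho = 0.5, 1.0

def posterior_mass_at_zero(n, K):
    xbar = K / math.sqrt(n)
    # The factor (2 pi)^{-n/2} e^{-ssq/2} cancels between numerator and
    # denominator, leaving only this ratio.
    cont = (n * rho**2 + 1) ** -0.5 * math.exp(
        n**2 * rho**2 * xbar**2 / (2 * (n * rho**2 + 1))
    )
    return lam / (lam + (1 - lam) * cont)

def bw_formula(n, K):
    return 1.0 / (1.0 + (1 + n) ** -0.5 * math.exp(K**2 * n / (2 * (1 + n))))

for n in (1, 5, 50, 500):
    a, b = posterior_mass_at_zero(n, 2.0), bw_formula(n, 2.0)
    print(n, round(a, 6), abs(a - b) < 1e-12)
```

Note that the posterior mass at $\theta=0$ grows toward 1 as $n$ increases along the stopping boundary, which is the point of the example.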