Bayes' theorem with infinitesimal evidence


Bayes' theorem is often stated as

$$\mathrm{posterior} = \frac{\mathrm{likelihood}\times\mathrm{prior}}{\mathrm{evidence}}$$

So what happens to the posterior probability when the probability of the evidence is very small or infinitesimal? Mathematically the ratio becomes large, but what does that mean in real terms?


BEST ANSWER

I suppose the most well known instance of this is when $X=[X_1,X_2]^\top$ and $X\sim N_2(\mu, V)$ where $V\in\mathbb R^{2\times 2}$ is symmetric and positive-definite, and you want the conditional distribution of $X_1$ given the observed value of $X_2$.

In that case (and in maybe all others I've seen, including cases where the marginal probability of the evidence is not $0$), the simplest way to proceed is to first form the pointwise product of the prior density and the likelihood function, and then figure out what constant it must be multiplied by to make it integrate to $1$.

For example, suppose the posterior is found to be $$ \text{some constant}\cdot x^\text{some power} e^{-x/\text{some scale parameter}}\,dx \qquad \text{on }(0,\infty)\text{ and $0$ on $(-\infty,0)$}. $$ The "some constant" is the reciprocal of the marginal probability of the evidence, at least in cases where that probability is not $0$. You might find that marginal probability by working out some integral, but that is not necessary, since we know that $$ \int_0^\infty x^{\alpha-1} e^{-x/\beta} \, dx = \beta^\alpha \Gamma(\alpha). $$ Thus the distribution is $$ \frac{1}{\Gamma(\alpha)}\left( \frac x \beta \right)^{\alpha-1} e^{-x/\beta}\, \frac{dx}\beta\quad\text{on $(0,\infty)$ and $0$ elsewhere.} $$ This is often a lot less work than evaluating the integral that gives the marginal probability of the data, which would give the same result.
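As a quick sanity check, that Gamma-integral identity can be verified numerically. Here is a minimal Python sketch using only the standard library; the midpoint-rule integrator and the sample parameters are my own choices, not part of the answer:

```python
import math

def gamma_integral(alpha, beta, upper=100.0, n=200_000):
    # Midpoint-rule approximation of the integral of
    # x^(alpha-1) * exp(-x/beta) over (0, upper); upper is chosen
    # large enough that the neglected tail is negligible here.
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x ** (alpha - 1) * math.exp(-x / beta)
    return total * h

alpha, beta = 3.5, 2.0
numeric = gamma_integral(alpha, beta)
closed_form = beta ** alpha * math.gamma(alpha)
print(numeric, closed_form)  # the two agree to several decimal places
```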

In the bivariate normal case cited above, the posterior distribution is $$ \text{some constant}\cdot \exp\left( -\left( \frac{x-\text{something}}{\text{something}} \right)^2 \right) \, dx. $$ Since we know how to evaluate Gaussian integrals over the whole line, we can find the "constant" that appears at the beginning.
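Concretely, in the bivariate normal case the conditional distribution of $X_1$ given $X_2=x_2$ is again normal, with the standard closed-form mean and variance. A small Python sketch (the particular numbers are illustrative, not from the answer):

```python
def conditional_of_x1_given_x2(mu, V, x2):
    # For [X1, X2] ~ N(mu, V), the conditional X1 | X2 = x2 is normal with
    #   mean = mu1 + (V12 / V22) * (x2 - mu2)
    #   var  = V11 - V12**2 / V22
    cond_mean = mu[0] + V[0][1] / V[1][1] * (x2 - mu[1])
    cond_var = V[0][0] - V[0][1] ** 2 / V[1][1]
    return cond_mean, cond_var

mu = [1.0, -2.0]
V = [[4.0, 1.5], [1.5, 2.0]]  # symmetric positive-definite
m, v = conditional_of_x1_given_x2(mu, V, x2=0.5)
print(m, v)  # 2.875 2.875
```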

PS: OK, now I have time to be a bit more long-winded. Here's the Gamma example above with details. Suppose $$ f_\Lambda(\lambda)\,d\lambda = \frac 1 {\Gamma(\alpha)} \left(\frac \lambda \beta\right)^{\alpha-1} e^{-\lambda/\beta}\,\frac{d\lambda}\beta \text{ for }\lambda>0\text{ and }=0\text{ for }\lambda<0 $$ (the prior distribution of $\Lambda$, a Gamma distribution) and $$ X\mid\Lambda \sim \mathrm{Poisson}(\Lambda) $$ so that $$ \Pr(X=x\mid\Lambda) = \frac{\Lambda^x e^{-\Lambda}}{x!}\text{ for }x\in\{0,1,2,3,\ldots\}. $$ Then we want the conditional distribution of $\Lambda$ given $X=x$. We have the prior $$ f_\Lambda(\lambda)\propto\lambda^{\alpha-1}e^{-\lambda/\beta} $$ and the likelihood $$ L(\lambda) \propto \lambda^x e^{-\lambda}. $$ You see that I'm not even bothering with normalizing constants. The posterior distribution is therefore $$ f_{\Lambda\mid X=x}(\lambda)\,d\lambda \propto \lambda^{x+\alpha-1} e^{-\lambda/(\beta/(1+\beta))}\,d\lambda\text{ for }\lambda>0. $$ Thus the posterior is a Gamma distribution with shape $x+\alpha$ where the prior had $\alpha$, and scale $\beta/(1+\beta)$ where the prior had $\beta$.

Now our knowledge of the Gamma distribution tells us what the normalizing constant is, so that we get $$ \frac 1 {\Gamma(x+\alpha)} \left(\frac{\lambda}{\beta/(1+\beta)}\right)^{x+\alpha-1} e^{-\lambda/(\beta/(1+\beta))} \, \frac{d\lambda}{\beta/(1+\beta)}\text{ for }\lambda>0. $$
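The conjugate update derived above (shape $\alpha\to x+\alpha$, scale $\beta\to\beta/(1+\beta)$) is easy to check numerically: the pointwise product of prior and likelihood should differ from the claimed posterior density only by a constant factor. A Python sketch (the parameter values are my own):

```python
import math

def gamma_pdf(lam, shape, scale):
    # Density of the Gamma distribution, shape/scale parameterization.
    return (lam / scale) ** (shape - 1) * math.exp(-lam / scale) / (scale * math.gamma(shape))

def gamma_poisson_update(alpha, beta, x):
    # Posterior parameters for Lambda given X = x, per the derivation above.
    return x + alpha, beta / (1 + beta)

alpha, beta, x = 2.0, 3.0, 4
post_shape, post_scale = gamma_poisson_update(alpha, beta, x)

def prior_times_likelihood(lam):
    prior = gamma_pdf(lam, alpha, beta)
    likelihood = lam ** x * math.exp(-lam) / math.factorial(x)
    return prior * likelihood

# The ratio to the claimed posterior density is the same constant at
# different values of lambda (that constant is in fact Pr(X = x)).
r1 = prior_times_likelihood(0.7) / gamma_pdf(0.7, post_shape, post_scale)
r2 = prior_times_likelihood(2.3) / gamma_pdf(2.3, post_shape, post_scale)
print(r1, r2)
```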

We could have found the same result by using the marginal probability $\Pr(X=x)$ (this time not conditioned on $\Lambda$, but instead averaged over the possible values of $\Lambda$ weighted by their probabilities). I'll be back in a bit and we'll see which way of doing it is more onerous.

PPS: Now let's find that marginal probability: \begin{align} & \Pr(X=x) = \mathbb E(\Pr(X=x\mid\Lambda)) = \mathbb E\left( \frac{\Lambda^x e^{-\Lambda}}{x!} \right) \\[8pt] = {} & \int_0^\infty \frac{\lambda^x e^{-\lambda}}{x!} \frac1{\Gamma(\alpha)}\left( \frac{\lambda}{\beta} \right)^{\alpha-1} e^{-\lambda/\beta} \, \frac{d\lambda}\beta = \frac{1}{x!\Gamma(\alpha)\beta^\alpha} \int_0^\infty \lambda^{x+\alpha-1} e^{-\lambda/(\beta/(1+\beta))} \, d\lambda \\[8pt] = {} & \frac{1}{x!\Gamma(\alpha)\beta^\alpha} \cdot \left(\frac\beta{1+\beta}\right)^{x+\alpha} \int_0^\infty \left(\frac{\lambda}{\beta/(1+\beta)}\right)^{x+\alpha-1} e^{-\lambda/(\beta/(1+\beta))} \, \frac{d\lambda}{\beta/(1+\beta)} \\[8pt] = {} & \frac{1}{x!\Gamma(\alpha)\beta^\alpha} \cdot \left(\frac\beta{1+\beta}\right)^{x+\alpha} \int_0^\infty \mu^{x+\alpha-1} e^{-\mu}\,d\mu \\[8pt] = {} & \frac{1}{x!\Gamma(\alpha)\beta^\alpha} \cdot \left(\frac\beta{1+\beta}\right)^{x+\alpha} \cdot \Gamma(x+\alpha) \\[8pt] = {} & \frac{\beta^x \alpha(1+\alpha)(2+\alpha)\cdots(x-1+\alpha)}{x!(1+\beta)^{x+\alpha}} = \binom{x+\alpha-1}{x} \frac{\beta^x}{(1+\beta)^{x+\alpha}}. \end{align} That is the marginal probability of the evidence, i.e. the denominator in Bayes' theorem.

(So the marginal distribution of $X$ is a negative binomial distribution.)

ANSWER

If it's a true posterior probability, then it should always be bounded between $0$ and $1$.

Also, your formula is not really a formula, but a conceptual framework. The actual formula is instructive:

$$P(B\mid A)=\frac{P(B)P(A\mid B)}{P(A)}=\frac{P(A\cap B)}{P(A)}$$

From the axioms of probability, $A\cap B\subseteq A$ implies $P(A\cap B)\le P(A)$, so whenever $P(A)>0$ we have $$0\le \frac{P(A\cap B)}{P(A)}\le 1.$$ In particular, as $P(A)\to 0$ the numerator $P(A\cap B)\to 0$ as well, so the ratio stays bounded even though both numerator and denominator shrink.

So the posterior probability is always between $0$ and $1$. This is not true if you are looking at the posterior density, where densities replace the probability measures: a density can exceed $1$.
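A tiny discrete example illustrates the bound: however small the evidence probability becomes, the posterior probability stays in $[0,1]$ because $A\cap B\subseteq A$. (A toy sample space of my own construction:)

```python
# Toy finite sample space; the outcome probabilities are illustrative.
outcomes = {"w1": 0.0005, "w2": 0.0005, "w3": 0.999}
A = {"w1", "w2"}  # the evidence: a very unlikely event
B = {"w1", "w3"}  # the hypothesis

def P(event):
    return sum(outcomes[w] for w in event)

posterior = P(A & B) / P(A)  # P(B | A)
print(P(A), posterior)  # evidence probability is tiny; posterior is 0.5
```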