Prove Bayesian Updating


I've started learning Bayesian analysis, and for the past few days I've been struggling to prove something that seems obvious to every author I read. The question is:

How is Bayesian update justified mathematically?

To explain the question, let me describe a classical example. Assume there is a disease which infects around 1% of the total population. Also, there is a test with 95% sensitivity and 99% specificity. Let the variable $\theta$ indicate whether a person is affected: 1 if yes and 0 if no. Then we may specify the following probabilities: $$ P(\theta=1) = 0.01 \mathrm{~ - Prior},$$ $$ P(y=1 |\theta=1)=0.95 \mathrm{~ - Likelihood}, ~ P(y=0|\theta=0)=0.99, $$ where $y$ is a variable describing the person's test result: 1 if positive and 0 otherwise. Using Bayes' rule we may now calculate our measure of uncertainty about whether this person is affected (i.e., the probability that they are), given that their first test result is positive, i.e. $y_1 = 1$. We have: $$P(\theta=1|y_1=1) \mathrm{~ - Posterior } = {{P(\theta=1) P(y_1=1|\theta=1)} \over \underbrace{P(y_1=1)}_{\mathrm{Marginal}}} = {{P(\theta=1) P(y_1=1|\theta=1)} \over {\int P(\theta) P(y_1=1|\theta)d\theta} } = $$ $$ ={{P(\theta=1) P(y_1=1|\theta=1)} \over {P(\theta=1)P(y_1=1|\theta=1) + P(\theta=0)P(y_1=1|\theta=0)}} = {{0.01 \cdot 0.95} \over {0.01 \cdot 0.95 + 0.99 \cdot 0.01}} \approx 0.49.$$
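The arithmetic above can be checked with a few lines of Python (a sketch using only the example's numbers, not part of the original question):

```python
# Numeric check of the posterior above, using the example's numbers.
prior = 0.01                 # P(theta = 1)
p_pos_given_sick = 0.95      # P(y = 1 | theta = 1), sensitivity
p_pos_given_healthy = 0.01   # P(y = 1 | theta = 0) = 1 - specificity

# Marginal P(y1 = 1), summing over theta in {0, 1}.
marginal = prior * p_pos_given_sick + (1 - prior) * p_pos_given_healthy
posterior = prior * p_pos_given_sick / marginal  # Bayes' rule
print(round(posterior, 2))  # 0.49
```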

This makes a lot of sense. If 1000 people are tested, we expect only around 10 to be affected. About 9 of those 10 (95%) and about 9 of the remaining 990 (1%) would yield a positive result. So the answer is approximately $9 \over 18$: affected among all positives.

Now the magic part.

All the literature I found regarding the above example states that if the person takes a second test and it comes back positive too ($y_2=1$), then there is no need to recalculate the entire thing. Rather, we may just use the posterior distribution above as a prior to update our measure. This is done like this: $$P(\theta=1|y_1=1,y_2=1) = {{P(\theta=1|y_1=1) P(y_2=1|\theta=1)} \over {P(y_2=1|y_1=1)}} = {{P(\theta=1 |y_1=1) P(y_2=1|\theta=1)} \over {\int P(\theta|y_1=1) P(y_2=1|\theta)d\theta} } = $$ $$ ={{P(\theta=1|y_1=1) P(y_2=1|\theta=1)} \over {P(\theta=1|y_1=1)P(y_2=1|\theta=1) + P(\theta=0|y_1=1)P(y_2=1|\theta=0)}} = {{0.49 \cdot 0.95} \over {0.49 \cdot 0.95 + 0.51 \cdot 0.01}} \approx 0.99.$$ The above calculation makes a lot of sense too: approximately 9 of the 9 true positives from the first test would yield a second positive result, while most likely none of the 9 false positives would test positive again. So this time it is approximately $9 \over 9$: affected among those with both tests positive. But this calculation is only valid if $$P(\theta|y_1,y_2) = {{P(\theta|y_1) P(y_2|\theta)} \over {\int P(\theta|y_1) P(y_2|\theta) d\theta}}.~~~~~~~\mathrm{(1)} $$

However, I can't prove the above (and am not even sure it is true), neither directly nor by relying on the observables $y$ being i.i.d. (independent and identically distributed) given $\theta$.
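For what it's worth, (1) at least checks out numerically in this example: updating sequentially gives the same answer as conditioning on both tests at once, provided $y_1$ and $y_2$ are conditionally independent given $\theta$ (a sketch in Python, reusing the example's numbers):

```python
# Sequential update (posterior after y1 reused as prior for y2) vs. the
# direct joint computation, assuming y1, y2 conditionally independent
# given theta. Both should agree if (1) holds.
prior = 0.01
p_pos = {1: 0.95, 0: 0.01}  # P(y = 1 | theta)

def update(p, like1, like0):
    """One Bayes update after observing a positive test."""
    return p * like1 / (p * like1 + (1 - p) * like0)

# Two single-test updates in a row.
p_after_y1 = update(prior, p_pos[1], p_pos[0])
p_seq = update(p_after_y1, p_pos[1], p_pos[0])

# Conditioning on (y1=1, y2=1) at once: the likelihood factors by
# conditional independence as P(y1=1|theta) * P(y2=1|theta).
num = prior * p_pos[1] ** 2
p_joint = num / (num + (1 - prior) * p_pos[0] ** 2)

print(round(p_seq, 4), round(p_joint, 4))  # both ≈ 0.9891
```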

TL;DR

Prove (1). If needed, rely on independence or conditional independence (given $\theta$) of $y_1$ and $y_2$.

Best answer

An (arguably more general) form of Bayes' rule is

$$P(x|y,z) = \frac{P(y|x,z)P(x|z)}{P(y|z)}$$

where every term is conditioned on some other variable $z$. In your problem, we would condition everything on $y_1$, yielding:

$$P(\theta | y_1,y_2) = \frac{P(y_2|y_1,\theta)P(\theta|y_1)}{P(y_2|y_1)}$$

We can use $P(y_2|y_1) = \int P(y_2|y_1,\theta)P(\theta|y_1)d\theta$ to substitute on the bottom to get

$$P(\theta | y_1,y_2) = \frac{P(y_2|y_1,\theta)P(\theta|y_1)}{ \int P(y_2|y_1,\theta)P(\theta|y_1)d\theta}$$

The key argument from here is that $y_2$ is independent of $y_1$, conditional on $\theta$. In this example, $y_1$ and $y_2$ are not independent, because if you have one positive test, that increases your probability that you are affected, which in turn increases your estimate that a second test will also come up positive. But, if you already know whether or not you are affected (i.e., you are conditioning on $\theta$), then we assume the outcome of the second test will be independent. This means that $P(y_2|y_1,\theta)=P(y_2|\theta)$.
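This distinction can be made concrete numerically. The sketch below (my own illustration, using the question's numbers) builds the joint distribution under the conditional-independence assumption and checks both claims: $y_1$ and $y_2$ are dependent marginally, but independent once we condition on $\theta$:

```python
# Build the joint P(theta, y1, y2) with y1, y2 conditionally
# independent given theta, using the question's numbers.
prior = {1: 0.01, 0: 0.99}
lik = {1: 0.95, 0: 0.01}  # P(y = 1 | theta)

def joint(theta, y1, y2):
    """P(theta, y1, y2) under conditional independence given theta."""
    def p_y(y):
        return lik[theta] if y == 1 else 1 - lik[theta]
    return prior[theta] * p_y(y1) * p_y(y2)

def p(event):
    """Probability of an event: sum the joint over matching outcomes."""
    return sum(joint(t, a, b) for t in (0, 1) for a in (0, 1) for b in (0, 1)
               if event(t, a, b))

# Marginally dependent: seeing y1=1 raises the chance of y2=1.
p_y2 = p(lambda t, a, b: b == 1)
p_y2_given_y1 = p(lambda t, a, b: a == 1 and b == 1) / p(lambda t, a, b: a == 1)
print(p_y2, p_y2_given_y1)  # ≈ 0.0194 vs ≈ 0.4703

# Conditionally independent: given theta=1, y1 adds no information about y2.
p_y2_given_t = p(lambda t, a, b: t == 1 and b == 1) / p(lambda t, a, b: t == 1)
p_y2_given_t_y1 = (p(lambda t, a, b: t == 1 and a == 1 and b == 1)
                   / p(lambda t, a, b: t == 1 and a == 1))
print(p_y2_given_t, p_y2_given_t_y1)  # both ≈ 0.95
```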

Substituting that in the above gives the result:

$$P(\theta | y_1,y_2) = \frac{P(y_2|\theta)P(\theta|y_1)}{ \int P(y_2|\theta)P(\theta|y_1)d\theta}$$

If you haven't seen the more general form of Bayes' rule that I started with, I can think of two reasonably good justifications.

The first would be that the rules of probability ought to let us incorporate more information into all of our terms without impacting the overall reasoning. For example, suppose you read a study which shows that the specificity is slightly different from 99%. If we let $z$ represent that information (i.e., the event "a study came out claiming ..."), then the structure of your reasoning shouldn't need to change, only the numerical values.

The second would be that if we take a random variable like $\theta$, and condition on an event like $(y_1=1)$, this gives us another random variable $\theta | (y_1=1)$. We can just "substitute" this new random variable in Bayes' rule (please convince yourself that conditioning on $\theta |y_1=1$ is the same as conditioning on $\theta,y_1=1$).

Finally, we can just derive it from the product rule by first treating $y,z$ as a single variable, then treating $x,z$ as a single variable, then treating them as separate variables: $$\begin{align} P(x|y,z) = & \frac{P(x,y,z)}{P(y,z)}\\ = & \frac{P(x,z)P(y|x,z)}{P(y,z)}\\ =& \frac{P(z)P(x|z)P(y|x,z)}{P(z)P(y|z)}\\ =& \frac{P(x|z)P(y|x,z)}{P(y|z)} \end{align}$$
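The chain of equalities can also be verified mechanically on an arbitrary joint distribution, using exact rational arithmetic so the check is exact rather than up to floating-point error (an illustration with made-up weights, not part of the derivation):

```python
# Verify P(x|y,z) = P(y|x,z) P(x|z) / P(y|z) on an arbitrary joint
# distribution over three binary variables, with exact fractions.
from fractions import Fraction as F
from itertools import product

# An arbitrary (made-up) joint P(x, y, z) on {0,1}^3, normalized to sum to 1.
weights = {xyz: F(w) for xyz, w in zip(product((0, 1), repeat=3),
                                       [1, 2, 3, 4, 5, 6, 7, 8])}
total = sum(weights.values())
joint = {k: v / total for k, v in weights.items()}

def p(pred):
    """Probability of the event described by pred(x, y, z)."""
    return sum(v for k, v in joint.items() if pred(*k))

x, y, z = 1, 1, 1
# Left side: P(x|y,z) = P(x,y,z) / P(y,z).
lhs = p(lambda a, b, c: (a, b, c) == (x, y, z)) / p(lambda a, b, c: (b, c) == (y, z))
# Right side: P(y|x,z) * P(x|z) / P(y|z), each built from the joint.
rhs = (p(lambda a, b, c: (a, b, c) == (x, y, z)) / p(lambda a, b, c: (a, c) == (x, z))
       * (p(lambda a, b, c: (a, c) == (x, z)) / p(lambda a, b, c: c == z))
       / (p(lambda a, b, c: (b, c) == (y, z)) / p(lambda a, b, c: c == z)))
assert lhs == rhs  # the conditioned Bayes rule holds exactly
```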