Bayesian statistics - explanation of evidence


Despite trying to read multiple resources about Bayesian statistics, I cannot find a (free) resource that explains exactly what $P(D)$ is. Most resources explain it conceptually rather than numerically. Some call it the "evidence", some the "normalizing factor", and some the "marginal distribution". All of them fail to provide an exact numeric value for this expression in their examples.

Therefore, I would like to make my own numerical example.

Let's say that we are tossing a coin which is not fair: $\theta = 0.7$. However, we believe that the coin is fair, so our $P(\theta) = 0.5$. After flipping the coin 1000 times, we obtained 720 heads. Thus, $P(D|\theta) = 0.72$.

The formula is given by:

$$P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)}$$

Thus we have:

$$P(\theta|D) = \frac{0.72*0.5}{P(D)}$$

Given these values, I am unsure what $P(D)$ is and what goes into the denominator of the fraction. I would appreciate an explanation, thank you.

Best answer:

To summarize the discussion in the comments: The source is very vague and informal, though not actually inaccurate. In particular, it abuses notation in an unhelpful way, using $P(\theta)$ to denote the probability that $\theta$ equals some particular value, which value is then suppressed in the notation. Indeed, to use Bayes' theorem in the traditional manner, one must already have a prior distribution in mind. Bayes' theorem then lets you use the observed data to improve that distribution.

To illustrate, I'll work two examples based on the OP's scenario. In both cases I'll assume that we know, a priori, that $\theta$ is one of $\{.65, .7, .75\}$, but I'll sketch the analysis under two different prior distributions on that set.

Example I (uniform): each of the three values has the same prior probability, $\frac 13$.

Of course, given $\theta=\theta_0$, the probability of observing exactly $720$ heads out of $1000$ tosses is $$P(D\,|\,\theta_0)=\binom {1000}{720}\theta_0^{720}\times (1-\theta_0)^{280}$$ Thus the total probability of observing that result is given by the sum $$P(D)=\frac 13\times \left(\binom {1000}{720}.65^{720}\times (.35)^{280}+\binom {1000}{720}.7^{720}\times .3^{280}+\binom {1000}{720}.75^{720}\times (.25)^{280}\right)$$

To get the revised probability that $\theta = \theta_0$, we use Bayes' theorem. In this case we get $$P(\theta = .65)\approx .0000298\quad P(\theta = .7)\approx .798\quad P(\theta = .75)\approx .202$$
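The sum defining $P(D)$ and the resulting posteriors can be checked numerically. A minimal sketch in Python (standard library only), using the same three candidate values and the uniform prior:

```python
from math import comb

n, k = 1000, 720                      # 720 heads in 1000 tosses
thetas = [0.65, 0.70, 0.75]           # candidate values for theta
prior = [1/3, 1/3, 1/3]               # uniform prior over the candidates

# Likelihood of the observed data under each candidate theta:
# P(D | theta) = C(1000, 720) * theta^720 * (1 - theta)^280
lik = [comb(n, k) * t**k * (1 - t)**(n - k) for t in thetas]

# The evidence P(D): the prior-weighted sum of the likelihoods
p_d = sum(p * l for p, l in zip(prior, lik))

# Bayes' theorem: posterior = likelihood * prior / P(D)
post = [p * l / p_d for p, l in zip(prior, lik)]
```

Note that $P(D)$ is just a single number here; dividing by it is what makes the three posterior values sum to $1$.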

Thus (qualitatively) you can now pretty confidently reject the possibility that $\theta = .65$, though there is still a solid chance that $\theta = .75$.

Example II (nearly certain that $\theta= .7$): Let's say the distribution is now $(.05, .9, .05)$ instead of uniform. The computation is exactly the same, only now, instead of a constant factor of $\frac 13$ everywhere, we have the weights. Thus $$P(D)=.05\times \binom {1000}{720}.65^{720}\times (.35)^{280}+.9\times \binom {1000}{720}.7^{720}\times .3^{280}+.05\times \binom {1000}{720}.75^{720}\times (.25)^{280}$$

Applying Bayes' theorem, we now get the revised probabilities to be $$P(\theta = .65)\approx .00000205\quad P(\theta = .7)\approx .986\quad P(\theta = .75)\approx .014$$

Thus, in this case, the data has simply confirmed your prior strong belief that $\theta = .7$.
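The same numerical check works for this prior; only the weights change (a sketch, same setup as in Example I):

```python
from math import comb

n, k = 1000, 720
thetas = [0.65, 0.70, 0.75]
prior = [0.05, 0.90, 0.05]            # strong prior belief that theta = .7

# Likelihoods are identical to Example I; only the prior weights differ
lik = [comb(n, k) * t**k * (1 - t)**(n - k) for t in thetas]
p_d = sum(p * l for p, l in zip(prior, lik))       # the evidence P(D)
post = [p * l / p_d for p, l in zip(prior, lik)]   # posterior over thetas
```

Comparing the two runs makes the role of $P(D)$ concrete: it is a different number under each prior, but in both cases it is exactly the constant needed to normalize the posteriors.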