Why do we say A given B in Bayes theorem?

I enjoyed this blog on visualizing Bayes theorem, but I have a question about the wording of this formula:
$p(A\mid B) = p(A,B)/p(B)$
In words, I understand this to be "the probability of A given B".
If we are "given B", why do we need to divide by $p(B)$? Surely if I am "given" something, then its probability is $1$?
Why do we say it this way?

[Update]

The way I see it, $p(B) = 1$ (because we are "given" it),
so $p(A,B)/p(B) = p(A,B)$.

I am looking for the right words to articulate that we still need to divide by $p(B)$, and that it is not $1$.

If we were saying "given we know the probability of B" instead of "given B" I would not be so confused.

Best answer

I suspect that you may not get a satisfactory answer to your question, since you are bothered by a single word that has several meanings in informal discourse but just one in this mathematical context. That one disagrees with the everyday meaning you have in mind. But I will try.

The right way to think of "given" here is that "the probability of $A$ given $B$" means "what would the probability of $A$ be if I knew that $B$ had occurred?".

So for example the probability that a coin shows "heads" given that it shows "tails" is $0$. I think that's a reasonable English sentence. It's pretty clearly correct. It also follows from the formula in your question.
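As a worked check, write $H$ for "heads" and $T$ for "tails" on a single toss of a fair coin (symbols introduced here only for illustration). One toss cannot show both faces, so $p(H,T) = 0$, while $p(T) = 1/2$:

$$p(H\mid T) = \frac{p(H,T)}{p(T)} = \frac{0}{1/2} = 0$$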

Dividing by the probability of $B$ in that formula is the precise way to take into account the hypothesized occurrence of $B$. It tells you what fraction the event ($A$ and $B$) is of the event $B$.
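A small sketch of this "fraction of $B$" reading, using exact fractions. The fair six-sided die and the events A = "roll is 2", B = "roll is even" are illustrative assumptions, not part of the question, and `prob` and `conditional` are hypothetical helper names.

```python
from fractions import Fraction

# Illustrative assumption: a fair six-sided die, so each outcome has probability 1/6.
OUTCOMES = range(1, 7)


def prob(event):
    """Probability of an event (a predicate on outcomes) under the fair die."""
    hits = sum(1 for w in OUTCOMES if event(w))
    return Fraction(hits, len(OUTCOMES))


def conditional(a, b):
    """p(A | B) = p(A, B) / p(B): what fraction of B is taken up by 'A and B'."""
    return prob(lambda w: a(w) and b(w)) / prob(b)


def is_two(w):   # event A: the roll is 2
    return w == 2


def is_even(w):  # event B: the roll is even
    return w % 2 == 0


print(prob(is_even))                 # 1/2
print(conditional(is_two, is_even))  # 1/3
```

Note that $p(B)$ itself comes out as 1/2, not 1; the division only measures $A$ and $B$ relative to the outcomes where $B$ holds.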

Answer

If we are "given B", why do we need to divide by p(B)? Surely if I am "given" something then its probability is 1?

As I understand probability, it is a property not of a unique event but of a stable event-producing scheme. For instance, when we toss a fair coin, the probability of heads is $1/2$. This scheme is missing from the well-known joke that the probability of meeting a crocodile in the street is $1/2$ (either you meet one or you don't).

The probability of an event produced by the scheme does not change when the event happens (or does not happen). For instance, when the tossed coin lands heads, the probability of this event is still $1/2$.

Thus, being given an event $B$ does not change the probability of $B$. Rather, when we calculate the conditional probability $P(A\mid B)$, we do so with respect to a changed scheme, in which only those instances where $B$ happens are taken into account.
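A minimal Monte Carlo sketch of this "changed scheme", under a made-up setup that is not from the answer: each trial is two independent fair-coin tosses, A = "first toss is heads", B = "at least one toss is heads". Exactly, $p(B) = 3/4$ and $p(A\mid B) = (1/2)/(3/4) = 2/3$.

```python
import random

# Hypothetical setup: each trial is two independent fair-coin tosses.
# A = "first toss is heads", B = "at least one toss is heads".
random.seed(0)
N = 100_000
trials = [(random.random() < 0.5, random.random() < 0.5) for _ in range(N)]


def event_a(t):
    return t[0]


def event_b(t):
    return t[0] or t[1]


# Original scheme: p(B) and p(A, B) are estimated over ALL trials.
p_b = sum(event_b(t) for t in trials) / N
p_ab = sum(event_a(t) and event_b(t) for t in trials) / N

# Changed scheme: keep only the trials in which B happened, then count A among them.
b_trials = [t for t in trials if event_b(t)]
p_a_given_b = sum(event_a(t) for t in b_trials) / len(b_trials)

print(round(p_b, 3), round(p_a_given_b, 3), round(p_ab / p_b, 3))
```

The filtered estimate and the ratio $p(A,B)/p(B)$ agree, while $p(B)$ itself stays near 3/4 rather than becoming 1.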

Answer

You can think of this in another way. Think of an Excel sheet with 4 columns as follows:

\begin{array}{|r|r|r|r|}\hline \text{Event 1} & P(\text{Event 1}) & \text{Event 2} & P(\text{Event 2}) \\ \hline A & 0.5 & B & 0.5 \\ \hline F & 0.5 & B & 0.25 \\ \hline A & 0.25 & B & 0.25 \\ \hline F & 0.5 & K & 1 \\ \hline A & 0.25 & J & 1 \\ \hline \end{array}

Now, the question is: what is the probability of A, knowing that B has occurred?

Since we know B has already occurred, we can just filter the Event 2 column for 'B', and we get this:

The "Given B" table

\begin{array}{|r|r|r|r|}\hline \text{Event 1} & P(\text{Event 1}) & \text{Event 2} & P(\text{Event 2}) \\ \hline A & 0.5 & B & 0.5 \\ \hline F & 0.5 & B & 0.25 \\ \hline A & 0.25 & B & 0.25 \\ \hline \end{array}

OK, so from the table above we see that both A and F occur given that B has occurred. We only want to know about A and don't care about F, so let's filter the Event 1 column above for 'A' only.

\begin{array}{|r|r|r|r|}\hline \text{Event 1} & P(\text{Event 1}) & \text{Event 2} & P(\text{Event 2}) \\ \hline A & 0.5 & B & 0.5 \\ \hline A & 0.25 & B & 0.25 \\ \hline \end{array}

But really, the table above tells you the probability of A occurring and B occurring. I think the "given B" part can be thought of using this table: here you know the probability of A "given B". So the probabilities can be represented as:

the total probability of A occurring times the total probability of B occurring, which is $$(0.5 + 0.25) \times (0.5 + 0.25)$$

But, again, the question is: what is the probability of A, knowing that B has occurred?

So we don't really care about B here. To get the value for A and kick B out of the picture, we divide by the total probability of B:

$$\frac{(0.5 + 0.25) \times (0.5 + 0.25)}{(0.5 + 0.25)}$$

$$=0.75$$
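The two spreadsheet filters and the final division can be mimicked in plain Python. This is a rough sketch that reproduces this answer's steps and arithmetic as written; the list-of-tuples layout is an illustrative stand-in for the Excel sheet, not a general method for computing conditional probabilities.

```python
# The spreadsheet from this answer, one tuple per row:
# (Event 1, P(Event 1), Event 2, P(Event 2)). Values are copied from the table.
rows = [
    ("A", 0.5, "B", 0.5),
    ("F", 0.5, "B", 0.25),
    ("A", 0.25, "B", 0.25),
    ("F", 0.5, "K", 1.0),
    ("A", 0.25, "J", 1.0),
]

# Filter the Event 2 column for 'B': the "Given B" table.
given_b = [r for r in rows if r[2] == "B"]

# Filter that table's Event 1 column for 'A' only.
a_and_b = [r for r in given_b if r[0] == "A"]

# The column totals used in the answer's arithmetic: 0.5 + 0.25 each.
total_a = sum(r[1] for r in a_and_b)
total_b = sum(r[3] for r in a_and_b)

print(a_and_b)
print(total_a * total_b / total_b)  # the answer's final division
```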

Answer

I think I am conflating "given" with GivenWhenThen in programming.

I might be better off reading it as "when". Thus, "when" we zoom in to make $B$ the whole universe, "then" we need to adjust $AB$ by the same ratio that we used to blow up $B$.
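A minimal sketch of the "zoom in" step, assuming a fair die with A = "roll is 2" and B = "roll is even" (illustrative choices, not from the post). Restricting to $B$ and rescaling by $1/p(B)$ turns $B$ into a whole universe with probability 1, which is the sense in which a "given" event is certain, while $A \cap B$ is scaled by the same ratio.

```python
from fractions import Fraction

# Illustrative assumption: a fair die, A = "roll is 2", B = "roll is even".
p = {w: Fraction(1, 6) for w in range(1, 7)}
A = {2}
B = {2, 4, 6}

p_B = sum(p[w] for w in B)

# "Zoom in": drop outcomes outside B and blow up what is left by 1 / p(B),
# so that the new probabilities again add up to 1.
p_given_B = {w: p[w] / p_B for w in B}

print(sum(p_given_B.values()))           # 1: inside the zoomed universe, B is certain
print(sum(p_given_B[w] for w in A & B))  # 1/3, i.e. p(A, B) / p(B)
```

In the original universe $p(B)$ is still 1/2; only the rescaled universe gives $B$ probability 1.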

Answer

One way it becomes clearer (for me at least) is to look at the Metropolis-Hastings algorithm: https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm

It uses Bayes theorem to generate samples from a probability distribution that is difficult to sample from directly. Typically that is the posterior distribution of a random variable (call it 'B') for which we can only write down a prior distribution, together with the likelihood of observing the data ('A') given that the random variable has a certain value (so we treat that value as a given).

The posterior is then obtained by updating the prior with that likelihood: how likely the observed data are if the variable takes the given value, times how probable that value was to begin with. By repeatedly proposing a new 'B', checking how well it explains 'A', and accepting the proposal with a certain probability, the distribution of the samples gets closer and closer to the posterior distribution.
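A minimal random-walk Metropolis sketch (the symmetric-proposal special case of Metropolis-Hastings) of the idea above. The coin example is a made-up assumption: the unknown heads-probability `theta` plays the role of 'B', it gets a uniform prior, and 7 heads in 10 tosses play the role of 'A'. The sampler only ever uses prior × likelihood; the denominator $p(A)$ cancels in the acceptance ratio, so it never has to be computed.

```python
import random

# Hypothetical data: 7 heads observed in 10 tosses of a coin with unknown bias theta.
HEADS, TOSSES = 7, 10


def unnormalized_posterior(theta):
    """prior(theta) * likelihood(data | theta); the evidence p(data) is never computed."""
    if not 0.0 < theta < 1.0:
        return 0.0  # outside the uniform prior's support
    return theta**HEADS * (1.0 - theta) ** (TOSSES - HEADS)


def metropolis_hastings(n_samples, step=0.1, start=0.5, seed=0):
    rng = random.Random(seed)
    theta = start
    samples = []
    for _ in range(n_samples):
        # Symmetric random-walk proposal, so the Hastings correction cancels.
        proposal = theta + rng.gauss(0.0, step)
        ratio = unnormalized_posterior(proposal) / unnormalized_posterior(theta)
        if rng.random() < ratio:
            theta = proposal  # accept; otherwise keep the current value
        samples.append(theta)
    return samples


draws = metropolis_hastings(20_000)[2_000:]  # discard burn-in
print(sum(draws) / len(draws))
```

With a uniform prior the exact posterior is Beta(8, 4), whose mean is 8/12 ≈ 0.667, so the sample mean should land close to that.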