Intuitive understanding of Bayes' theorem


Suppose I have lost my keys. There is:

  1. Probability 0.7 I lost them in the main room.
  2. Probability 0.2 I lost them in the bedroom.
  3. Probability 0.1 I lost them in the bathroom.

So I look for the keys in the main room first but don't find them. There is a probability of 0.35 that I missed the keys while searching the room.

I update the information after searching without finding my keys.

0.7*0.35=0.245 is the probability that the keys are in the main room and I missed them.

Updated probabilities:

  1. Probability 0.245 I lost them in the main room.
  2. Probability 0.2 I lost them in the bedroom.
  3. Probability 0.1 I lost them in the bathroom.

New probabilities after dividing everything by 0.545:

  1. Probability 0.245/0.545 ≈ 0.4495 I lost them in the main room.
  2. Probability 0.2/0.545 ≈ 0.3670 I lost them in the bedroom.
  3. Probability 0.1/0.545 ≈ 0.1835 I lost them in the bathroom.

What I don't get is why you need to divide everything by 0.545 (the sum of the three updated values above). I understand that doing so leaves everything adding up to 1, and I also understand that the 3 probabilities need to add up to 1, but I don't understand why we need to do it this way.

My problem is this: if the probability of the keys being in the main room is 0.7 and there is a 0.35 chance I missed them, then the probability of the keys still being there should be 0.245.

0.245 is a probability already updated with the "new information", so I feel like we now only need to "update" the other two probabilities instead of all of them.


There are 2 best solutions below

0
On

You have events $A,B,C,$ and $D$.   The first three are the disjoint and exhaustive events that the key is in the relevant room, and the latter is the event that the search fails (our data).

Bayes' Rule states that the "updated probabilities" (aka "posterior probabilities") need to be:

${\bullet\quad \mathsf P(A\mid D)~=~\dfrac{\mathsf P(D\mid A)~\mathsf P(A)}{\mathsf P(D)} ~=~ \dfrac{0.35\cdot 0.7}{\mathsf P(D)} \\ \bullet\quad \mathsf P(B\mid D)~=~\dfrac{\mathsf P(D\mid B)~\mathsf P(B)}{\mathsf P(D)}~=~\dfrac{1\cdot 0.2}{\mathsf P(D)} \\ \bullet\quad \mathsf P(C\mid D)~=~\dfrac{\mathsf P(D\mid C)~\mathsf P(C)}{\mathsf P(D)}~=~\dfrac{1\cdot 0.1}{\mathsf P(D)} }$

Here the numerators are the results you obtained in the intermediate step.   They are the probabilities that the key is in each room and that the search of the main room fails.

What is $\mathsf P(D)$ I ask, rhetorically.   The Law of Total Probability states:

$\bullet\quad \mathsf P(D)~{=~\mathsf P(D\mid A)~\mathsf P(A)+\mathsf P(D\mid B)~\mathsf P(B)+\mathsf P(D\mid C)~\mathsf P(C)\\=~0.35\cdot 0.7+1\cdot0.2+1\cdot 0.1\\=~0.545 }$

This is sensible, since the probabilities of disjoint and exhaustive events must sum to $1$.   In this case we have a conditioned sample space, but the criterion must still hold.   Dividing each numerator by the same denominator keeps the results in the same proportions to one another.   This denominator, $\mathsf P(D)$, being the sum of the numerators, ensures that $\mathsf P(A\mid D)+\mathsf P(B\mid D)+\mathsf P(C\mid D)=1$, as is required.

That is all.


PS: $\mathsf P(D\mid B)=1=\mathsf P(D\mid C)$ because a search of the main room will certainly fail when given that the key is in another room.
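The computation above can also be checked numerically. The following is only a minimal sketch in Python; the function name `posterior` and the dictionary keys are illustrative choices, not part of the original problem. It multiplies each prior by its likelihood, sums the products to get $\mathsf P(D)$, and divides, giving roughly 0.45, 0.37 and 0.18.

```python
def posterior(priors, likelihoods):
    """Return (posteriors, P(D)) using Bayes' rule.

    priors:      P(room) for each room
    likelihoods: P(D | room), the chance the main-room search fails
    """
    # Numerators: P(D | room) * P(room), i.e. the probability that the key
    # is in this room AND the search of the main room fails.
    joint = {room: likelihoods[room] * priors[room] for room in priors}
    # Law of total probability: P(D) is the sum of those joint terms.
    p_d = sum(joint.values())
    # Same denominator for every room, so the proportions are preserved.
    return {room: j / p_d for room, j in joint.items()}, p_d


priors = {"main": 0.7, "bedroom": 0.2, "bathroom": 0.1}
likelihoods = {"main": 0.35, "bedroom": 1.0, "bathroom": 1.0}

post, p_d = posterior(priors, likelihoods)
print("P(D) =", round(p_d, 3))
for room, p in post.items():
    print(room, round(p, 4))
```

Note that the bedroom and bathroom values change too, even though their numerators stay at 0.2 and 0.1: they are now measured against 0.545 instead of 1.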

0
On

You would like an intuitive grasp of prior / posterior probability, so let's avoid symbolism.

You should think of probability always as a relative measure, a "fraction".

Consider a Venn diagram of your problem.

[Figure Bayes_1: Venn diagram of the problem]

The square represents the Universe of events under consideration.

Already by restricting the cases to only three, you have shrunk the square to the three circles: the probabilities assigned to them are relative to their union (the sum of the probabilities, since they are exclusive).

Upon the event that you didn't find the key in the first room, you are doing an analogous operation: you shrink the Universe to part of $A$ plus $B$ plus $C$.
Therefore you should recalibrate the individual probabilities relative to the new universe.

Doing so is not compulsory: depending on the case under study and the manipulations to be done, you might well keep the original values (reduced for $A$). What is important is to explicitly note and understand which universe the stated values are relative to, and to act accordingly.
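The "shrinking universe" picture can be illustrated with a small simulation. This is just a sketch under stated assumptions: the function name `simulate`, the trial count and the seed are arbitrary choices made for illustration. It generates many imaginary worlds, discards those in which the search of the main room succeeds, and reports each room's share of the worlds that remain.

```python
import random


def simulate(trials=200_000, seed=0):
    """Estimate each room's share among worlds where the search fails."""
    rng = random.Random(seed)
    rooms = ["main", "bedroom", "bathroom"]
    weights = [0.7, 0.2, 0.1]
    counts = {room: 0 for room in rooms}
    for _ in range(trials):
        room = rng.choices(rooms, weights=weights)[0]
        # The search finds the key only if it is in the main room and
        # it is not overlooked (it is overlooked with probability 0.35).
        found = room == "main" and rng.random() >= 0.35
        if not found:
            counts[room] += 1  # this world stays in the shrunken universe
    kept = sum(counts.values())
    # Fractions are relative to the kept worlds, not to all trials.
    shares = {room: c / kept for room, c in counts.items()}
    return shares, kept / trials


fractions, kept_share = simulate()
print("share of worlds kept:", round(kept_share, 3))
for room, share in fractions.items():
    print(room, round(share, 3))
```

About 54.5% of the worlds survive, and within them the main room accounts for roughly 45%, the bedroom for roughly 37% and the bathroom for roughly 18%. Dividing by 0.545 is simply counting relative to the surviving worlds.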