Order of variables in Bayes network total probability expansion?


I've asked this question on Udacity, no response.

It goes like this. There's a network with A at the root and three variables X1, X2, X3, each dependent on A. Each has the same conditional distribution given A, and they are conditionally independent given A. The problem is to find $P(X3|X1)$, given $P(A)$, $P(Xi|A)$, and $P(Xi|!A)$.

I understand why the solution uses the total probability, but what I don't get is why it uses $P(A|Xi)$ instead of $P(Xi|A)$. It feels like a leap, because I haven't learned whatever logic is behind that.

Any links to helpful (comprehensive and easy to read) resources are super appreciated.

BEST ANSWER

There are a variety of sources on Bayes' theorem which I will not delve into - a quick internet search will turn up the Wikipedia page, which is a useful starting point.

However, I will try to explain Bayes' theorem in a nutshell, as you had not encountered it before according to your posting on Udacity, but are more familiar with it after watching a video.

For two events $A$ and $B$, the starting point is the definition of conditional probability, from which Bayes' theorem follows:-

$\large P(A|B)=\frac{P(A,B)}{P(B)}$ (Eq. $1$)

The term $P(A|B)$ is the conditional probability that event $A$ will occur given that event $B$ has occurred, and is equal to the probability that both events $A$ and $B$ occur ($P(A,B)$) divided by the probability that event $B$ occurs, $P(B)$.

A concrete example would be $A$ being the event of a serious injury in a car accident, and $B$ being the event that a seat belt was worn - one would expect $P(A|B)$ to be much lower than $P(A|B^c)$, which is the conditional probability of injury in a car accident given that a seat belt was not worn.
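Plugging numbers into (Eq. $1$) makes this concrete. The values below are purely illustrative, not real accident statistics:

```python
# Made-up numbers for the seat-belt example; illustrative only.
p_injury_and_belt = 0.01   # P(A, B): serious injury AND belt worn
p_belt = 0.80              # P(B): belt worn

# Eq. 1: P(A|B) = P(A, B) / P(B)
p_injury_given_belt = p_injury_and_belt / p_belt
print(p_injury_given_belt)
```

With these numbers, $P(A|B)=0.01/0.8=0.0125$, i.e. a $1.25\%$ chance of serious injury given a belt was worn.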

We can rearrange (Eq. $1$), so that

$P(A,B)=P(A|B)P(B)$ (Eq. $2$)

Note that if events $A$ and $B$ are independent, $P(A,B)=P(A)P(B)$, resulting in $P(A|B)=P(A)$. This means that the probability of $A$ occurring given $B$ has occurred is simply the probability of $A$ occurring - as $B$ is independent of $A$, it will have no bearing on the probability of $A$.

Equation $2$ can be regarded as the simplest example of the Chain Rule.

Examining the denominator of (Eq. $1$),

$P(B)=P(B,A)+P(B,A^c)=P(B|A)P(A)+P(B|A^c)P(A^c)$
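This total-probability expansion is easy to check numerically. The probabilities below are arbitrary illustrative values:

```python
# Numeric check of P(B) = P(B|A)P(A) + P(B|A^c)P(A^c).
# All values are arbitrary illustrative numbers.
p_a = 0.3
p_b_given_a = 0.7
p_b_given_not_a = 0.2

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
print(p_b)
```

Here $P(B)=0.7\cdot 0.3 + 0.2\cdot 0.7 = 0.35$.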

where $A^c$ is the complement to event $A$, so that $P(A\cup A^c)=1$.

We can thus rewrite (Eq. $1$) as follows:-

$\large P(A|B)=\frac{P(A,B)}{P(B|A)P(A)+P(B|A^c)P(A^c)}$ (Eq. $3$)

As $P(A,B)=P(B,A)$ we have

$ P(A|B)P(B)=P(B|A)P(A)$

So we can express $P(B|A)$ in terms of $P(A|B)$ as follows

$\large P(B|A)=\frac{P(A|B)P(B)}{P(A)}$
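As a sanity check that this inversion is consistent, we can compute $P(A|B)$ from $P(B|A)$ and then invert back. The numbers are again arbitrary illustrative values:

```python
# Round-trip check of Bayes' theorem with arbitrary illustrative numbers.
p_a = 0.3
p_b_given_a = 0.7
p_b_given_not_a = 0.2

# Total probability for the denominator:
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

# Inverting back: P(B|A) = P(A|B) P(B) / P(A)
p_b_given_a_recovered = p_a_given_b * p_b / p_a
print(p_a_given_b, p_b_given_a_recovered)
```

The round trip recovers the original $P(B|A)=0.7$, confirming that the two directions of conditioning carry the same information once $P(A)$ and $P(B)$ are known.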

You can extend the conditional probability for conditioning on multiple events, as follows:-

$\large P(A|B,C)=\frac{P(A,B,C)}{P(B,C)}$

Based on conditioning for multiple events, we can extend the Chain Rule in (Eq. $2$), for multiple events $A,B,C,D$ as follows

$P(A,B,C,D)=P(A|B,C,D)P(B|C,D)P(C|D)P(D)$

As $P(A,B,C,D)=P(D,C,B,A)$, an equally valid expression of the Chain Rule would be

$P(A,B,C,D)=P(D|C,B,A)P(C|B,A)P(B|A)P(A)$
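Both orderings of the Chain Rule can be verified against a brute-force joint table. The sketch below uses an arbitrary joint distribution over four binary events (the weights are illustrative, not from the problem):

```python
from itertools import product

# Arbitrary joint distribution over four binary events A, B, C, D,
# built from illustrative weights and normalised to sum to 1.
weights = {outcome: i + 1.0
           for i, outcome in enumerate(product([0, 1], repeat=4))}
total = sum(weights.values())
joint = {k: v / total for k, v in weights.items()}

IDX = {"a": 0, "b": 1, "c": 2, "d": 3}

def p(**fixed):
    """Marginal probability that the named events take the given values."""
    return sum(prob for outcome, prob in joint.items()
               if all(outcome[IDX[name]] == val for name, val in fixed.items()))

# Order A,B,C,D:  P(A,B,C,D) = P(A|B,C,D) P(B|C,D) P(C|D) P(D)
forward = (p(a=1, b=1, c=0, d=1) / p(b=1, c=0, d=1)) \
        * (p(b=1, c=0, d=1) / p(c=0, d=1)) \
        * (p(c=0, d=1) / p(d=1)) \
        * p(d=1)

# Order D,C,B,A:  P(A,B,C,D) = P(D|C,B,A) P(C|B,A) P(B|A) P(A)
backward = (p(d=1, c=0, b=1, a=1) / p(c=0, b=1, a=1)) \
         * (p(c=0, b=1, a=1) / p(b=1, a=1)) \
         * (p(b=1, a=1) / p(a=1)) \
         * p(a=1)

print(forward, backward, p(a=1, b=1, c=0, d=1))
```

Both products telescope back to the joint probability, which is why any ordering of the events gives a valid factorisation.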

Now, let us go back to the question you posed on Udacity (and I will adhere to your notation for the sake of consistency). Using the definition of conditional probability together with the law of total probability, we have

$\large P(X3|X1)=\frac{P(X3,X1)}{P(X1)}=\frac{P(X3,X1,A)+P(X3,X1,!A)}{P(X1)}$ (Eq. $4$)

The denominator is $P(X1)$, as we have conditioned on this event for the conditional probability. Thus we will expect a probability conditioned on the $X1$ term, i.e. $P(\cdot|X1)$, rather than $P(\cdot|A)$, which is conditioned on $A$.

Using the Chain rule for the first term in the numerator of (Eq. $4$), we have

$P(X3,X1,A)=P(X3|A,X1)P(A|X1)P(X1)$

and for the second term in the numerator, we have

$P(X3,X1,!A)=P(X3|!A,X1)P(!A|X1)P(X1)$

This will result in (Eq. $4$) simplifying to the following (note that the $P(X1)$ terms cancel out)

$P(X3|X1)=P(X3|A,X1)P(A|X1) + P(X3|!A,X1)P(!A|X1)$

Finally, since $X1$ and $X3$ are conditionally independent given $A$, we have $P(X3|A,X1)=P(X3|A)$ (and likewise given $!A$), while $P(A|X1)$ is obtained from the given $P(X1|A)$ and $P(A)$ via Bayes' theorem. That inversion is exactly why the solution works with $P(A|Xi)$ rather than $P(Xi|A)$ directly.
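Putting all the pieces together, here is a sketch of the whole calculation. The numbers for $P(A)$, $P(Xi|A)$ and $P(Xi|!A)$ are assumed for illustration; the actual values in the Udacity problem may differ, but the structure of the computation is the same:

```python
# Assumed (illustrative) inputs -- not necessarily the problem's values.
p_a = 0.5
p_x_given_a = 0.8       # P(Xi|A), identical for every i
p_x_given_not_a = 0.1   # P(Xi|!A)

# Step 1: total probability gives P(X1).
p_x1 = p_x_given_a * p_a + p_x_given_not_a * (1 - p_a)

# Step 2: Bayes' theorem gives P(A|X1) -- the inversion the question
# asks about, turning the given P(X1|A) into P(A|X1).
p_a_given_x1 = p_x_given_a * p_a / p_x1
p_not_a_given_x1 = 1 - p_a_given_x1

# Step 3: conditional independence given A means P(X3|A,X1) = P(X3|A),
# so the final formula reduces to a two-term sum.
p_x3_given_x1 = (p_x_given_a * p_a_given_x1
                 + p_x_given_not_a * p_not_a_given_x1)

# Cross-check: P(X3|X1) = P(X3,X1)/P(X1) by summing the joint over A.
p_x3_and_x1 = (p_x_given_a ** 2) * p_a + (p_x_given_not_a ** 2) * (1 - p_a)
check = p_x3_and_x1 / p_x1
print(p_x3_given_x1, check)
```

The cross-check enumerates the joint $P(X3,X1)$ directly over $A$ and arrives at the same number, confirming the expansion.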