Mutual information in Bayesian Network


I have trouble interpreting the formula for Mutual Information (MI) given in this paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.8930&rep=rep1&type=pdf

In section 3.1, using formula (3), I calculate the MI $I(T,A)$ for the case of a single parent.

$\sum P_{Pr}(A) = 0.99 + 0.01 = 1$

$\sum p(T|A) \log_2 \frac{p(T|A)}{P_{Pr}(T)} = 0.05\log_2\frac{0.05}{0.03} + 0.95\log_2\frac{0.95}{0.03} + 0.01\log_2\frac{0.01}{0.97} + 0.99\log_2\frac{0.99}{0.97} = 4.7356$

With $P_{Pr}(T) = (0.03, 0.97)$, the correct answer for $I(T,A)$ is 0.009, which is very different from my result. Could anybody provide a detailed calculation?

My next step is to do the same with formula (4) in section 3.2, where I calculate $I(O,T)$. This is the case of multiple parents. According to the paper, the correct answer is 0.602, but no matter how I calculate it, I can't get that answer.

Thank you very much and I highly appreciate your generous help.

There are 2 best solutions below


For the first part, I'm getting 0.01, which is close to the paper's value but not exact. Here are the details:

If $T$ takes on values $t_1$ and $t_2$ (and $A$, $a_1$ and $a_2$)

$$Pr(T=t_1) = Pr(T=t_1,A=a_1)+Pr(T=t_1,A=a_2) = Pr(T=t_1|A=a_1)Pr(A=a_1)+Pr(T=t_1|A=a_2)Pr(A=a_2)$$

Also, $\sum_{i}Pr(T=t_i|A=a_j) = 1$

Using the corresponding conditional probability matrix, i.e. $Pr(T=t_1|A=a_1)=0.05$, $Pr(T=t_1|A=a_2) = 0.01$, $Pr(T=t_2|A=a_1) = 0.95$, $Pr(T=t_2|A=a_2) = 0.99$, together with $Pr(T=t_1) = 0.03$: writing $p = Pr(A=a_1)$, the equation above becomes $0.05\,p + 0.01\,(1-p) = 0.03$, which gives $p = 0.5$, so $Pr(A=a_1) = Pr(A=a_2) = 0.5$.

Then using (3), we get 0.01.
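A quick numeric check of this answer, as a sketch: the prior, CPT, and marginal values below are the ones derived above, and the sum mirrors formula (3) from the paper (base-2 logs assumed).

```python
import math

# Values from the answer above: prior P(A), CPT P(T|A).
p_a = [0.5, 0.5]                      # P(A = a1), P(A = a2)
p_t_given_a = [[0.05, 0.95],          # P(T = t1 | A = a1), P(T = t2 | A = a1)
               [0.01, 0.99]]          # P(T = t1 | A = a2), P(T = t2 | A = a2)

# Marginal P(T) by the law of total probability; gives (0.03, 0.97).
p_t = [sum(p_a[j] * p_t_given_a[j][i] for j in range(2)) for i in range(2)]

# Formula (3): I(T, A) = sum_a P(a) * sum_t P(t|a) * log2(P(t|a) / P(t))
mi = sum(
    p_a[j] * sum(
        p_t_given_a[j][i] * math.log2(p_t_given_a[j][i] / p_t[i])
        for i in range(2)
    )
    for j in range(2)
)
print(round(mi, 4))  # → 0.0108
```

Note that the outer sum weights each inner sum by $P(A=a)$; that coupling is exactly what the question's calculation dropped.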


Notice that the inner sum in $(3)$ depends on $i$! You can't just separately sum up the $P(A = a)$s and the second sum. This wouldn't even make sense, since the first sum is just always going to be $1$ (sum of probabilities).

Seeing where this formula came from should help. I'll use the shorthands $p(x,y) = p(X = x, Y = y), p(y|x) = p(Y = y| X = x), p(x) = p(X = x)$ etc.

$$ I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x) p(y)} \\ = \sum_{x,y} p(x) p(y|x) \log \frac{p(y|x)}{p(y)} \\ = \sum_x p(x) \left( \sum_y p(y|x) \log \frac{p(y|x)}{p(y)} \right),$$ where I've used the definition of conditional probability to write $p(x,y) = p(x) p(y|x)$, and separated the double sum into an inner sum over only the $y$s and an outer sum over the $x$s. Note again that this inner sum gives an $x$-dependent answer (call it $D_x$), so the quantity is $\sum_x p(x) D_x$, and not $\left( \sum_x p(x)\right) \cdot \left( \sum_x D_x \right)$.
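To make the point concrete, here is a small sketch using the question's numbers (and, as an illustrative assumption, a uniform prior $p(x) = (0.5, 0.5)$): the two inner sums $D_x$ come out different, and the weighted sum differs from the product of the separate sums.

```python
import math

# Illustrative numbers only: the CPT from the question, uniform prior assumed.
p_x = [0.5, 0.5]
p_y_given_x = [[0.05, 0.95], [0.01, 0.99]]
p_y = [sum(p_x[j] * p_y_given_x[j][i] for j in range(2)) for i in range(2)]

# D_x: the inner sum (a KL divergence) -- note it differs for each x.
d = [
    sum(p_y_given_x[j][i] * math.log2(p_y_given_x[j][i] / p_y[i]) for i in range(2))
    for j in range(2)
]
print(d)  # two different values: the inner sum depends on x

correct = sum(p_x[j] * d[j] for j in range(2))  # sum_x p(x) * D_x
wrong = sum(p_x) * sum(d)                       # (sum_x p(x)) * (sum_x D_x)
print(correct, wrong)  # ~0.0108 vs ~0.0216 -- not the same quantity
```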