Normalized Mutual Information results in log(0) with non-overlapping clusters - how to deal with that?


I want to evaluate how well my flat soft clustering method works, compared to a gold standard. After some research I found that Normalized Mutual Information would most likely be a good measure, for which I found a definition (equation 16.2).

The formula contains the following term:

$$\log \dfrac{P(w_k \cap c_j)}{P(w_k)P(c_j)}$$

However, in my clusterings, some clusters have no items in common with some classes, so $P(w_k \cap c_j)=0$ for those pairs. The term then becomes $\log(0)$, which is undefined.

How to deal with this? Or am I doing something wrong (quite likely, this is not my area of expertise)? Am I asking the right question anyway?

Best answer

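In information theory, $0 \log (0) = 0$ by convention, which is justified by continuity: $x \log x \to 0$ as $x \to 0^+$. The term you quote is only part of the full mutual information sum, where each logarithm is multiplied by the joint probability:

$$I(\Omega; C) = \sum_k \sum_j P(w_k \cap c_j) \log \dfrac{P(w_k \cap c_j)}{P(w_k)P(c_j)}$$

When $P(w_k \cap c_j) = 0$, that factor in front is 0, so the whole term contributes 0 to the sum and can simply be skipped.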

This convention is stated in the standard text [Cover & Thomas 1991].
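To make the convention concrete, here is a minimal Python sketch of NMI that skips every cluster/class pair with zero overlap. It is an illustration under assumptions rather than a reference implementation: it assumes hard (one label per item) assignments, normalizes by the arithmetic mean of the two entropies, and returns 0.0 when both entropies are zero, which is an arbitrary choice. The function names `entropy`, `mutual_information`, and `nmi` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of hard labels."""
    n = len(labels)
    # Counter only holds labels that occur, so every count is > 0 here.
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(clusters, classes):
    """I(clusters; classes), using the convention 0 * log(0) = 0."""
    n = len(clusters)
    joint = Counter(zip(clusters, classes))
    cluster_counts = Counter(clusters)
    class_counts = Counter(classes)

    mi = 0.0
    for w, n_w in cluster_counts.items():
        for c, n_c in class_counts.items():
            p_wc = joint[(w, c)] / n  # 0 for non-overlapping pairs
            if p_wc == 0:
                # 0 * log(0) = 0 by convention: the term contributes nothing.
                continue
            mi += p_wc * math.log(p_wc / ((n_w / n) * (n_c / n)))
    return mi


def nmi(clusters, classes):
    """MI divided by the mean of the two entropies (assumed normalization)."""
    denom = (entropy(clusters) + entropy(classes)) / 2
    if denom == 0:
        return 0.0  # assumption: degenerate case with one cluster and one class
    return mutual_information(clusters, classes) / denom


if __name__ == "__main__":
    # Cluster 0 never overlaps class "b", and cluster 1 never overlaps class "a".
    print(nmi([0, 0, 1, 1], ["a", "a", "b", "b"]))
```

In the usage example, two of the four cluster/class pairs have zero overlap, yet the computation goes through without ever evaluating $\log(0)$. For soft assignments the counts would have to be replaced by summed membership weights, which this sketch does not attempt.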