I want to evaluate how well my flat soft clustering method works, compared to a gold standard. After some research I found that Normalized Mutual Information would most likely be a good measure, for which I found a definition (equation 16.2).
That formula contains the following term:
$$\log \dfrac{P(w_k \cap c_j)}{P(w_k)P(c_j)}$$
However, in my clusterings, some clusters have no items in common with some classes, meaning that $P(w_k \cap c_j)=0$. The term then becomes $\log(0)$, which is undefined.
How should I deal with this? Or am I doing something wrong (quite likely, since this is not my area of expertise)? Am I even asking the right question?
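In information theory, $0 \log 0 = 0$ by convention, which is justified by continuity because $x \log x \to 0$ as $x \to 0^+$. The expression you quote is only part of a larger formula: in the mutual information sum, each logarithm is multiplied by $P(w_k \cap c_j)$, so a term with $P(w_k \cap c_j) = 0$ has a factor of 0 in front and contributes nothing.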
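In code, the convention amounts to skipping the empty cells before taking the logarithm. Here is a minimal sketch with NumPy, assuming you have a cluster-by-class table of counts or soft membership weights. The function names `mutual_information`, `entropy` and `nmi` are my own, and normalizing by the average of the two entropies is an assumption about which NMI variant you are using.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) of a cluster-by-class table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()               # counts/weights -> P(w_k, c_j)
    p_w = joint.sum(axis=1, keepdims=True)    # cluster marginals P(w_k)
    p_c = joint.sum(axis=0, keepdims=True)    # class marginals P(c_j)
    nonzero = joint > 0                       # 0 log 0 = 0: skip empty cells
    ratio = joint[nonzero] / (p_w @ p_c)[nonzero]
    return float(np.sum(joint[nonzero] * np.log(ratio)))

def entropy(p):
    """Entropy (in nats) of a vector of counts or probabilities."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                              # same convention as above
    return float(-np.sum(p * np.log(p)))

def nmi(joint):
    """MI divided by the average of the two entropies (assumed variant)."""
    joint = np.asarray(joint, dtype=float)
    h_w = entropy(joint.sum(axis=1))
    h_c = entropy(joint.sum(axis=0))
    return mutual_information(joint) / ((h_w + h_c) / 2)

# Two clusters that match two classes exactly; the zero cells are skipped.
print(nmi([[5, 0], [0, 5]]))
```

A cell with positive mass always has positive marginals, so the division inside the mask is safe. The sketch does not handle the degenerate case of a single cluster and a single class, where both entropies are zero.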
Below is from the standard text [Cover & Thomas 1991].