Interpretation of mutual information


I remember having read the following interpretation of Shannon's information somewhere

"For discrete variables, I(X;Y) quantifies how well we can discriminate among the outcomes of Y by looking at the outcomes of X (and vice versa)"

This interpretation is fundamental for my paper because it stresses the "discrimination" aspect. However, I can no longer find a reference for it.

So I am wondering:

  • whether this is a well-accepted interpretation of MI, and
  • whether anyone could point me to a reference for it.

Addendum: I think that the property that MI is zero whenever one of the variables has only a single outcome is actually a fundamental property of MI. In particular, it speaks against some common interpretations of MI, for example:

  • MI quantifies the predictability of $Y$ from $X$ (or vice versa): wouldn't $I(X;Y)$ then be maximal rather than minimal when $Y$ has only one outcome, since $Y$ is perfectly predictable in that case?
  • MI quantifies the statistical relationship between $X$ and $Y$: but then I would expect $I(X;Y)$ to be maximal when both $X$ and $Y$ have only one possible outcome, since in this case the statistical relationship is fully deterministic.

I am looking for an interpretation that stresses the fact that for MI to make sense both $X$ and $Y$ must have at least two outcomes each.
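The degeneracy described above is easy to verify numerically. Below is a minimal sketch (the function name `mutual_information` and the toy joint tables are my own choices, not from any particular reference) that computes $I(X;Y)$ in bits directly from a joint probability table, and shows that it vanishes when $Y$ has only a single outcome:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits, computed from a joint probability table p(x, y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X (column vector)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y (row vector)
    mask = p_xy > 0                          # skip zero cells: 0 * log 0 := 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# Y has a single outcome: the joint table has one column.
# X is a fair coin, Y is constant, so knowing X discriminates nothing about Y.
p_degenerate = [[0.5],
                [0.5]]
print(mutual_information(p_degenerate))  # 0.0

# Contrast: X = Y, a fair bit copied exactly, gives the maximal 1 bit.
p_copy = [[0.5, 0.0],
          [0.0, 0.5]]
print(mutual_information(p_copy))  # 1.0
```

The constant-$Y$ case gives $I(X;Y)=0$ regardless of the distribution of $X$, which is exactly the property that rules out the "predictability" reading: $Y$ is perfectly predictable, yet the MI is minimal, not maximal.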


This is what I have learned while studying information theory, but I do not know a reference for a quote stressing the discrimination aspect.

I think of mutual information as measuring the size of the intersection, in the Venn-diagram sense, between the information "contained" in two sources (random variables). If the overlap is complete, then the outcome of $X$ determines the outcome of $Y$ exactly. If there is no overlap, then the outcome of $X$ says nothing about the outcome of $Y$. If there is partial overlap, the outcome of $X$ should sway your belief about the outcome of $Y$ in some direction (helping to discriminate between which outcomes are likely or unlikely) but will not guarantee the outcome of $Y$.
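The three overlap regimes above can be illustrated numerically. The sketch below (the helper `mutual_information` and the specific joint tables are illustrative choices of mine, not part of any standard reference) compares a perfect copy, an independent pair, and a noisy copy of a fair bit:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint probability table p(x, y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y
    mask = p_xy > 0                          # convention: 0 * log 0 = 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# Complete overlap: X determines Y exactly (Y is a copy of a fair bit X).
full = [[0.5, 0.0],
        [0.0, 0.5]]
# No overlap: X and Y are independent fair bits.
none = [[0.25, 0.25],
        [0.25, 0.25]]
# Partial overlap: Y agrees with X 90% of the time (a noisy copy).
part = [[0.45, 0.05],
        [0.05, 0.45]]

print(mutual_information(full))  # 1.0  -> observing X pins down Y
print(mutual_information(none))  # 0.0  -> observing X tells us nothing
print(mutual_information(part))  # ~0.53 -> observing X sways belief about Y
```

The partial-overlap value sits strictly between the two extremes, matching the intuition that a noisy copy of $X$ helps discriminate among the outcomes of $Y$ without guaranteeing any of them.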