Why is the mutual information nonzero for two independent variables


Suppose we have two independent random variables X and Y. Intuitively, the mutual information I(X,Y) between the two should be zero, as knowing one tells us nothing about the other.

The math behind this also checks out from the definition of mutual information (https://en.wikipedia.org/wiki/Mutual_information).

Now let us actually compute it. First, generate two simple random vectors of length 10 in R:

X <- sample(1:100, 10)       # 10 values drawn from 1..100 without replacement
Y <- sample(1000:10000, 10)  # 10 values drawn from 1000..10000 without replacement

I got these:

X={3, 35, 93, 13, 90, 89, 34, 97, 49, 82}
Y={7611, 5041, 2612, 4273, 6714, 4391, 1000, 6657, 8736, 2443}

The mutual information can be expressed in terms of the entropies H(X) and H(Y) and the joint entropy of X and Y, H(X,Y):

I(X,Y) = H(X) + H(Y) - H(X,Y)

Moreover

H(X) = -10*[(1/10)*log(1/10)] = log(10)

since each observation occurs only once and thus has an empirical frequency of 1/10. The maximum entropy for a sample of N distinct observations is log(N), so this calculation checks out.

Similarly

H(Y) = log(10)

The joint entropy is computed like the individual entropies, but this time we count the frequencies of the observed pairs. For example, the pair {X=3, Y=7611} occurs only once out of a total of 10 paired observations, hence it has a frequency of 1/10. Therefore:

H(X,Y) = -10*[(1/10)*log(1/10)] = log(10)

since each paired observation occurs only once.

So

I(X,Y) = log(10) + log(10) - log(10) = log(10)

which is clearly non-zero. This is also the result that various R packages (e.g. infotheo) produce.
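
To make this concrete, here is a minimal sketch of the same plug-in (empirical-frequency) calculation in R, using the sampled vectors from above; if infotheo is installed, mutinformation(X, Y) should report the same quantity.

X <- c(3, 35, 93, 13, 90, 89, 34, 97, 49, 82)
Y <- c(7611, 5041, 2612, 4273, 6714, 4391, 1000, 6657, 8736, 2443)

px  <- table(X) / length(X)                   # each value occurs once: 1/10
py  <- table(Y) / length(Y)
pxy <- table(X, Y) / length(X)                # each observed pair: 1/10

HX  <- -sum(px * log(px))                     # log(10)
HY  <- -sum(py * log(py))                     # log(10)
HXY <- -sum(pxy[pxy > 0] * log(pxy[pxy > 0])) # log(10)

HX + HY - HXY                                 # log(10) ~= 2.303, not 0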

The question is: where is the mistake in my thinking? Why is I(X,Y) not zero?


BEST ANSWER

Notice that the formula for Mutual Information uses probabilities, not frequencies. An empirical frequency is only an approximation of a probability, and with a sample this small you get very inaccurate approximations, hence the result.
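
To illustrate the point about sample size (a sketch of my own, not part of the original answer): with small alphabets, where the empirical frequencies have a chance to approximate the true probabilities, the plug-in estimate of the mutual information of two independent samples shrinks toward zero as the sample grows.

# Plug-in (empirical) mutual information from observed frequencies
emp_mi <- function(x, y) {
  pxy <- table(x, y) / length(x)   # empirical joint distribution
  px  <- rowSums(pxy)              # empirical marginal of x
  py  <- colSums(pxy)              # empirical marginal of y
  nz  <- pxy > 0                   # skip empty cells (0 * log 0)
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
}

set.seed(1)
for (n in c(10, 100, 10000)) {
  x <- sample(1:4, n, replace = TRUE)   # two independent draws
  y <- sample(1:4, n, replace = TRUE)   # from a small alphabet
  cat(n, emp_mi(x, y), "\n")            # estimates shrink toward 0
}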

To calculate the Mutual Information of a discrete random variable X uniformly distributed over {1, ..., 100} and an independent discrete random variable Y uniformly distributed over {1000, ..., 10000}, you calculate:

H(X) = -100*[(1/100)*log(1/100)] = log(100)

H(Y) = -9001*[(1/9001)*log(1/9001)] = log(9001)

H(X,Y) = -(900100)*[(1/900100)*log(1/900100)] = log(900100)

I(X,Y) = log(100) + log(9001) - log(900100) = 0
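
As a quick sanity check of the arithmetic (my addition, not part of the original answer): since 100 * 9001 = 900100, the three logarithms cancel exactly.

log(100) + log(9001) - log(900100)   # 0, up to floating-point rounding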

What you have actually calculated is the Mutual Information of two discrete random variables with the following joint probability distribution:

p(3, 7611) = 0.1

p(35, 5041) = 0.1

p(93, 2612) = 0.1

p(13, 4273) = 0.1

p(90, 6714) = 0.1

p(89, 4391) = 0.1

p(34, 1000) = 0.1

p(97, 6657) = 0.1

p(49, 8736) = 0.1

p(82, 2443) = 0.1

These variables are not independent; in fact, knowing one of the values is sufficient to find the other one. That is why their Mutual Information is not zero.

ANOTHER ANSWER

I believe you were on the correct path, but you made a small mistake when calculating the joint entropy. Under independence, each of the 10 observed values of X can pair with each of the 10 observed values of Y, so there are 100 equally likely pairs of symbols and the joint entropy is log(100) = 2*log(10). That makes the mutual information log(10) + log(10) - log(100) = 0.
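
A quick numeric check of this point (again a sketch of my own): build the joint distribution you would get under independence from the empirical marginals and compute the mutual information; it comes out to zero.

px  <- rep(1/10, 10)        # 10 equally likely X values
py  <- rep(1/10, 10)        # 10 equally likely Y values
pxy <- outer(px, py)        # 100 pairs, each with probability 1/100
HX  <- -sum(px * log(px))   # log(10)
HY  <- -sum(py * log(py))   # log(10)
HXY <- -sum(pxy * log(pxy)) # log(100)
HX + HY - HXY               # 0, up to floating-point rounding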