Understanding mutual information


Given random variables $\vec{x}, \vec{y} \in \mathbb{R}^n$, is it true that

$$I(\vec{x};\vec{y}) \geq \sum_i I(x_i;y_i)?$$

My interpretation is that several variables collectively should be able to predict another set of variables at least as well as they do individually. However, when I compute the quantities in this inequality from some data I have, I get the opposite, namely $I(\vec{x};\vec{y}) < \sum_i I(x_i;y_i)$. Can the above inequality be proved? Does my code have a bug, or is my understanding wrong?
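For discrete (or discretized) data, one way to sanity-check such computations is the plug-in estimate of MI from the empirical joint distribution. The sketch below is mine, not from the question; the helper name `mutual_information` is hypothetical:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    xs, ys: equal-length sequences of hashable outcomes; use tuples
    for vector-valued variables."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # empirical joint counts
    px = Counter(xs)             # empirical marginal counts of X
    py = Counter(ys)             # empirical marginal counts of Y
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with p's as counts / n
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

Note that the plug-in estimator is biased upward on small samples, which is one mundane way to get surprising comparisons between joint and per-coordinate MI estimates.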

Edit: Following @Mini's answer and some reading, I think I can summarize the answer to the interpretation question. There are two competing processes at play:

  • Synergy: several variables collectively share information with a target that none of them shares separately. Synergy increases multivariate MI.
  • Redundancy: the "effective dimension" of the problem may be smaller than the nominal dimension because of strong dependencies among the variables within each vector. Redundancy decreases multivariate MI.

Thus, my interpretation is correct only in the case when there is no redundancy.

Bonus Question: Is there a redundancy-corrected version of MI that would measure only the presence or absence of synergy?

Accepted answer:

The stated inequality is not true. A simple counterexample: let $\vec{X}=(X_1,X_2)$ and $\vec{Y}=(Y_1,Y_2)$ with $X_1=X_2$ and $Y_1=Y_2$. Then $$I(\vec{X};\vec{Y})=I(X_1;Y_1)=\frac{1}{2}\sum \limits_i I(X_i;Y_i)\leq \sum \limits_i I(X_i;Y_i),$$ with strict inequality whenever $I(X_1;Y_1)>0$. The reverse inequality is not true in general either. By the chain rule for mutual information, \begin{align*} I(\vec{X};\vec{Y})&=\sum \limits_i I(X_i;\vec{Y}\mid X_1^{i-1})\\ &=\sum \limits_{i,j} I(X_i;Y_j\mid X_1^{i-1},Y_1^{j-1}). \end{align*} If the pairs $(X_i,Y_i)$ are drawn independently across indices, the conditioning drops out and you get equality: $I(\vec{X};\vec{Y})=\sum_i I(X_i;Y_i)$.
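The duplicated-variable counterexample can be checked numerically by enumerating the two equally likely outcomes of a fair bit $Z$ with $X_1=X_2=Y_1=Y_2=Z$. This is a sketch of mine using a plug-in MI estimate over the exact (uniform) distribution; `mi_bits` is a hypothetical helper:

```python
import math
from collections import Counter

def mi_bits(xs, ys):
    # Plug-in MI in bits over the empirical joint distribution.
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Z uniform on {0, 1}; maximal redundancy: every coordinate equals Z.
z = [0, 1]                      # the two equally likely outcomes
X = [(v, v) for v in z]         # vector X = (X1, X2) = (Z, Z)
Y = [(v, v) for v in z]         # vector Y = (Y1, Y2) = (Z, Z)

joint = mi_bits(X, Y)           # I(X;Y) = H(Z) = 1 bit
marginal_sum = sum(mi_bits([x[i] for x in X],
                           [y[i] for y in Y]) for i in range(2))  # 2 bits
```

Here `joint` comes out to 1 bit while `marginal_sum` is 2 bits, matching the factor of $\frac{1}{2}$ in the derivation above.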

To see that the reverse inequality also fails: suppose $U$ and $V$ are independent Bernoulli$(\frac{1}{2})$ random variables, and denote the XOR operation by $\oplus$. Let $X_1=U\oplus V$, $X_2=V$, $Y_1=V$, and $Y_2=U$. Since $U\oplus V$ is independent of $V$, and $U$ is independent of $V$, $$I(X_1;Y_1)=I(U\oplus V;V)=0, \qquad I(X_2;Y_2)=I(V;U)=0.$$ On the other hand, $(U\oplus V,V)$ determines $(U,V)$, so $$I(U\oplus V,V;V,U)=I(U,V;V,U)=H(U,V)=2 \text{ bits}.$$ We therefore have \begin{align*} I(\vec{X};\vec{Y})=2 > 0=\sum \limits_i I(X_i;Y_i). \end{align*} Your interpretation is about prediction, which is closer to the concept of conditional entropy. Mutual information, by contrast, measures "what they have in common".
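The XOR construction can likewise be verified by enumerating all four equally likely $(U,V)$ outcomes. Again a sketch of mine with the hypothetical helper `mi_bits`:

```python
import math
from collections import Counter

def mi_bits(xs, ys):
    # Plug-in MI in bits over the empirical joint distribution.
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Enumerate all four equally likely (U, V) outcomes.
pairs = [(u, v) for u in (0, 1) for v in (0, 1)]
X = [(u ^ v, v) for u, v in pairs]   # X = (U xor V, V)
Y = [(v, u) for u, v in pairs]       # Y = (V, U)

joint = mi_bits(X, Y)                # I(X;Y) = H(U,V) = 2 bits
per_coord = sum(mi_bits([x[i] for x in X],
                        [y[i] for y in Y]) for i in range(2))  # 0 bits
```

Since each vector is a bijection of $(U,V)$, the joint MI is the full 2 bits, while each coordinate pair is independent, so the per-coordinate sum is 0: pure synergy.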