Mutual information between two dependent variables


Let $X$ be a discrete random variable such that \begin{equation} X = \begin{cases} 1 & \text{with probability } 1/3 \\ -1 & \text{with probability } 1/3 \\ 0 & \text{with probability } 1/3 \end{cases} \end{equation} and let $Y = X^2$, whose distribution is \begin{equation} Y = \begin{cases} 1 & \text{with probability } 2/3 \\ 0 & \text{with probability } 1/3. \end{cases} \end{equation}

Note that $Y$ is completely determined by $X$. However, $X$ cannot be completely determined by $Y$.

It follows from $$ P(X=a,Y=b) = P(Y=b|X=a)P(X=a)$$ that we have the following joint distribution of $(X,Y)$, $$P(X=1,Y=1) = P(X=-1,Y=1) = P(X=0,Y=0) = 1/3.$$ Note that $P(Y=b|X=a)$ is either 1 or 0, as $Y$ is completely determined by $X$.

We therefore compute the mutual information of $X$ and $Y$ and obtain $$ I(X;Y) = \sum_{(x,y)} p(x,y)\log \left(\frac{p(x,y)}{p(x)p(y)}\right) = \frac{1}{3}\log(3^3/4) > 0. $$ [Edited: the previous computation was wrong.]
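As a sanity check, the same value can be computed numerically; here is a minimal sketch in Python (natural logarithms, so the result is in nats):

```python
from math import log

# Joint distribution of (X, Y) with Y = X**2 and X uniform on {-1, 0, 1}.
joint = {(1, 1): 1/3, (-1, 1): 1/3, (0, 0): 1/3}

# Marginals of X and Y, obtained by summing the joint probabilities.
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

# I(X;Y) = sum over (x, y) of p(x,y) * log( p(x,y) / (p(x) * p(y)) ).
I = sum(p * log(p / (px[x] * py[y])) for (x, y), p in joint.items())

print(I)                  # ≈ 0.6365
print(log(3**3 / 4) / 3)  # the closed form (1/3) * log(27/4), same value
```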

According to Wikipedia:

"Intuitively, mutual information measures the information that X and Y share: It measures how much knowing one of these variables reduces uncertainty about the other."

Question: In this example, knowing $X$ completely removes the uncertainty about $Y$. How then should I interpret the value $I(X;Y) = \frac{1}{3}\log(3^3/4) > 0$ in order to draw the conclusion that "$Y$ is completely determined by $X$"?

My thought: I don't think the mutual information of $X$ and $Y=X^2$ makes sense here, since knowing $X$ completely determines $Y$ while knowing $Y$ does not determine $X$, yet $I(X;Y) = I(Y;X)$. That is, exchanging the roles of $X$ and $Y$ gives the same value, even though $X$ completely determines $Y$ but $Y$ does not determine $X$. This doesn't sound right to me.

I would appreciate any comments, suggestions, or other thoughts. Thanks in advance.

There are 4 answers below.

Best answer:

The intuition that "$Y$ is completely determined by $X$" is captured only by the fact that $H(Y|X)=0$, i.e., "given $X$, there is no uncertainty about $Y$". The fact that $I(X;Y)>0$ is irrelevant to it. The value of $I(X;Y)$ quantifies the information that can be transferred when the "data" is $X$ and $Y$ is the "observation", or vice versa. In this case, since $Y$ is a well-defined function of $X$, it is intuitive that the mutual information is positive (i.e., observing $Y$ gives information about $X$ and vice versa). In fact, $I(X;Y)>0$ always holds unless $X$ and $Y$ are independent random variables, in which case $I(X;Y)=0$.

Answer:

Let's take the case $ x = 1 $ and $ y = 1 $. The fraction inside the log is $ \frac{ \frac{1}{3} }{ \frac{1}{3} \frac{2}{3} } = 1.5 $.
It is the same for $ x = -1 $ and $ y = 1 $. For $ x = 0 $ and $ y = 0 $ we have $ \frac{ \frac{1}{3} }{ \frac{1}{3} \frac{1}{3} } = 3 $.

So the argument of $\log\left(\cdot\right)$ is always greater than 1, hence every term is positive and all is OK.
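Summing the three terms, each weighted by the joint probability $1/3$, gives the mutual information explicitly:
$$ I(X;Y) = \frac{1}{3}\log\frac{3}{2} + \frac{1}{3}\log\frac{3}{2} + \frac{1}{3}\log 3 > 0. $$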

Answer:

I think it is unclear how you arrive at $\log(3/4)$ in the last step; probably you computed the marginals wrongly. $p(x)=1/3$ for every $x$ at which $p(x)$ does not vanish, $P(Y=1)=2/3$ and $P(Y=0)=1/3$. So you have $$I=\frac{1}{3}\left(\log\frac{1/3}{\frac{1}{3}\cdot\frac{1}{3}}+\log\frac{1/3}{\frac{2}{3}\cdot\frac{1}{3}}+\log\frac{1/3}{\frac{2}{3}\cdot\frac{1}{3}}\right)=\frac{1}{3}\left(\log 3+ 2\log\tfrac{3}{2}\right),$$ which is positive.
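For completeness, this simplifies (using natural logarithms) to $$\frac{1}{3}\left(\log 3 + 2\log\tfrac{3}{2}\right) = \frac{1}{3}\log\frac{27}{4} \approx 0.6365\ \text{nats},$$ which matches the corrected value in the question.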

Answer:

That $Y$ is fully determined by $X$ simply means that $H(Y|X)=0$, which implies that $I(X;Y)=H(Y)-H(Y|X)=H(Y)$, which is what you have here.

The fact that $X$ is not fully determined by $Y$ means that $H(X|Y)>0$, which in turn implies that $I(X;Y) < H(X)$. That also makes sense, since $H(X) > H(Y)$: the uncertainty about $X$ is larger, which is exactly why it cannot be fully determined by $Y$ to begin with.

If you draw Venn diagrams it becomes very clear: the circle representing $H(Y)$ lies entirely inside the one representing $H(X)$.
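The entropy bookkeeping in this answer can also be checked numerically; here is a minimal sketch in Python (natural logarithms, so all values are in nats):

```python
from math import log

def entropy(dist):
    """Shannon entropy (in nats) of a distribution given as {outcome: probability}."""
    return -sum(p * log(p) for p in dist.values() if p > 0)

px = {1: 1/3, -1: 1/3, 0: 1/3}                  # distribution of X
py = {1: 2/3, 0: 1/3}                           # distribution of Y = X^2
pxy = {(1, 1): 1/3, (-1, 1): 1/3, (0, 0): 1/3}  # joint distribution of (X, Y)

HX, HY, HXY = entropy(px), entropy(py), entropy(pxy)
H_Y_given_X = HXY - HX   # H(Y|X) = H(X,Y) - H(X) = 0: Y fully determined by X
H_X_given_Y = HXY - HY   # H(X|Y) = H(X,Y) - H(Y) > 0: X not determined by Y
I = HY - H_Y_given_X     # I(X;Y) = H(Y) - H(Y|X) = H(Y)

print(H_Y_given_X, H_X_given_Y, I, HX)  # note I = H(Y) and I < H(X)
```

The printout confirms $H(Y|X)=0$, $H(X|Y)=\tfrac{2}{3}\log 2 > 0$, and $I(X;Y)=H(Y)<H(X)$.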