Can I bound the correlation of two random variables using the mutual information?


As correlation

$\rho_{X,Y} := \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$

sort of measures the linear dependence of two random variables, and mutual information

$I(X; Y) := H(X) - H(X|Y)$

measures the general dependence of two random variables, I feel like it should be possible to get an upper bound on the correlation in terms of the mutual information.

I expect that as the mutual information increases, correlation tends to 1, and as mutual information tends to 0, correlation also does.

Can anyone help me formalise this?


There are 2 best solutions below


For the mutual information, it can be useful to consider the conditional entropy instead: $$H(X|Y) = H(X,Y) - H(Y)$$
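As a quick sanity check, the identity can be verified numerically on a small discrete joint distribution (the $2\times 2$ table below is an arbitrary example, not taken from the question):

```python
import numpy as np

# Arbitrary 2x2 joint distribution p(x, y), chosen only to
# illustrate the identity H(X|Y) = H(X,Y) - H(Y).
p = np.array([[0.3, 0.2],
              [0.1, 0.4]])  # rows index x, columns index y

def entropy(q):
    """Shannon entropy (in bits) of a probability array."""
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

H_XY = entropy(p)              # joint entropy H(X, Y)
H_Y = entropy(p.sum(axis=0))   # marginal entropy H(Y)

# Direct computation of H(X|Y) = sum_y p(y) * H(X | Y = y)
H_X_given_Y = sum(
    p[:, j].sum() * entropy(p[:, j] / p[:, j].sum())
    for j in range(p.shape[1])
)

assert np.isclose(H_X_given_Y, H_XY - H_Y)
```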

However, the claim in the question is incorrect, because correlation captures only linear dependence, while mutual information captures dependence in general.

Going back one step to covariance, we can find the following example:

  • $Y = X^2$, with $X$ uniformly distributed on $[-1,1]$
  • $\Rightarrow \operatorname{Cov}(X,Y) = E(X^3) = 0$, while $\sigma_X^2 = E(X^2) \neq 0$ and $\sigma_Y^2 = E(X^4) - E(X^2)^2 \neq 0$, thus we get $\rho_{X,Y} = 0$
  • However, $H(Y|X) = 0$ (in the discrete sense), because $Y$ is completely determined by the value of $X$.
  • For the mutual information, we get $I(X;Y) = H(Y) - H(Y|X) = H(Y)$. This is far from $0$ (in fact, since $Y$ is a deterministic function of the continuous variable $X$, $I(X;Y)$ is infinite), even though the correlation is exactly $0$.
A second answer:

If you have an additional hypothesis, such as that $Z = XY$ is $\sigma$-subgaussian with respect to the product law $\mathbb P_X\otimes \mathbb P_Y$, then it is indeed possible: writing $C(X;Y) := |\operatorname{Cov}(X,Y)|$, you have the following bound

$$C(X; Y) \leq \sigma\sqrt{2I(X;Y)}\,.$$

This is a particular case of a known information-theoretic inequality; see for instance Lemma 1 in "Information-theoretic analysis of generalization capability of learning algorithms" (Xu & Raginsky, 2017), applied with $f(X, Y) = XY$.

$\sigma$-subgaussianity means that $$\log\mathbb E_{\mathbb P_X\otimes\mathbb P_Y}[e^{\lambda (XY-\mathbb E[X]\mathbb E[Y])}]\leq \lambda^2\sigma^2/2$$ for every real $\lambda$.
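As a numerical illustration of this definition (a sketch using an arbitrary Bernoulli example), one can check the moment-generating-function bound over a grid of $\lambda$ values, with the value $\sigma = (b-a)/2$ that Hoeffding's lemma guarantees for a variable supported on $[a,b]$:

```python
import numpy as np

# Check the subgaussian MGF bound for Z supported on {0, 1}, so
# [a, b] = [0, 1] and Hoeffding's lemma gives sigma = (b - a)/2 = 1/2.
# The Bernoulli(0.3) distribution is an arbitrary example.
pz = np.array([0.7, 0.3])
z = np.array([0.0, 1.0])
mean = pz @ z
sigma = 0.5  # (b - a) / 2 for [a, b] = [0, 1]

# log E[e^{lambda (Z - E Z)}] <= lambda^2 sigma^2 / 2 for every lambda
for lam in np.linspace(-10.0, 10.0, 201):
    log_mgf = np.log(pz @ np.exp(lam * (z - mean)))
    assert log_mgf <= lam**2 * sigma**2 / 2 + 1e-12
```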

In particular, if $XY$ is bounded then it is subgaussian. More explicitly, if $XY\in[a, b]$, then $XY$ is $(b-a)/2$-subgaussian by Hoeffding's lemma, so $$C(X;Y)\leq (b-a)\sqrt{I(X;Y)/2}\,.$$

This result is related to the one from user18862: since $XY\in [-\|X\|_\infty\|Y\|_\infty, \|X\|_\infty\|Y\|_\infty]$, the same reasoning yields $$C(X;Y)\leq \|X\|_\infty\|Y\|_\infty\sqrt{2I(X;Y)}\,.$$

However, subgaussianity can hold in more general settings, even for unbounded random variables.
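To see the bounded-case bound in action, here is a small exact check on a discrete joint distribution (the $2\times 2$ table is an arbitrary choice; mutual information is computed in nats, matching the $e^{\lambda}$ in the subgaussianity definition):

```python
import numpy as np

# Exact check of Cov(X, Y) <= (b - a) * sqrt(I(X; Y) / 2) on a small
# discrete example. The joint distribution below is arbitrary.
p = np.array([[0.4, 0.1],   # rows: X in {0, 1}
              [0.1, 0.4]])  # cols: Y in {0, 1}
vals = np.array([0.0, 1.0])

px, py = p.sum(axis=1), p.sum(axis=0)
EX, EY = px @ vals, py @ vals
EXY = vals @ p @ vals
cov = EXY - EX * EY

# Mutual information in nats.
mask = p > 0
I = np.sum(p[mask] * np.log(p[mask] / np.outer(px, py)[mask]))

a, b = 0.0, 1.0   # XY takes values in [0, 1]
bound = (b - a) * np.sqrt(I / 2)
print(f"Cov = {cov:.3f}, bound = {bound:.3f}")
assert cov <= bound
```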