Let $X$ and $Y$ be random variables on $n$ and $m$ points respectively, with $m>n$, and with joint probability distribution $p(x,y)$. The mutual information is $$ I(X ;Y) = H(X) + H(Y) - H(X,Y) $$ where $H(X)$ denotes the Shannon entropy of the marginal of $p$ over $X$ and $H(X,Y)$ is the Shannon entropy of the joint distribution $p$.
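For concreteness, the quantity in question can be computed directly from a joint probability table. Here is a minimal sketch (the example joint distribution is my own, chosen so the answer is easy to verify by hand):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table joint[x, y]."""
    hx = entropy(joint.sum(axis=1))   # marginal over X
    hy = entropy(joint.sum(axis=0))   # marginal over Y
    hxy = entropy(joint.ravel())      # joint entropy
    return hx + hy - hxy

# X uniform on {0,1} and Y = X (perfect correlation): I(X;Y) = 1 bit.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
print(mutual_information(joint))  # → 1.0
```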
Is it possible to compress $Y$ to the size of $X$ whilst preserving mutual information? That is, does there exist a stochastic matrix $T: \mathbb{R}^m \rightarrow \mathbb{R}^n$, sending $p$ to $(I_n\otimes T)p$, such that the compressed variable $Y^\prime = T(Y)$ satisfies $$ I(X;Y) = I(X;Y^\prime)? $$
Intuitively this makes sense: the maximal amount of information the two variables can share should depend only on the smaller of the two dimensions. However, I couldn't find any result along these lines.
Here's my attempt at answering this interesting question.
The random variables $X,Y,T(Y)$ form a Markov chain $X\rightarrow Y \rightarrow T(Y)$. By the data processing inequality, $$ I(X;T(Y))\leq I(X;Y), $$ with equality if and only if $X\rightarrow T(Y) \rightarrow Y$ is also a Markov chain.
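The data processing inequality is easy to check numerically for discrete distributions. The following sketch (not part of the original argument; the random joint distribution and channel are arbitrary choices) draws a random joint $p(x,y)$ and a random stochastic map $T$, and verifies that compression never increases mutual information:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Shannon entropy (in bits) of a probability vector, ignoring zeros."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table joint[x, y]."""
    return (entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0))
            - entropy(joint.ravel()))

n, m = 3, 5

# Random joint distribution p(x, y) on n*m points.
joint = rng.random((n, m))
joint /= joint.sum()

# Random column-stochastic map T: column T[:, y] is the distribution p(y'|y).
T = rng.random((n, m))
T /= T.sum(axis=0)

# Joint of (X, Y') where Y' = T(Y): p(x, y') = sum_y T[y', y] p(x, y).
joint_compressed = joint @ T.T

i_xy = mutual_information(joint)
i_xyp = mutual_information(joint_compressed)
print(i_xyp, "<=", i_xy)
assert i_xyp <= i_xy + 1e-12  # data processing inequality
```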
The latter condition is what defines a so-called sufficient statistic in estimation theory [Cover&Thomas, Ch. 2]. Therefore, your question may be equivalently posed as follows: is it always possible to find a sufficient statistic $T(Y)$ of dimension no greater than the dimension of the "parameter" $X$?
It turns out that this is not always possible. Consider the following example (taken from these slides). $X\in \mathbb{R}$ is a one-dimensional random variable (with some arbitrary distribution) and $Y\in \mathbb{R}^m$ is an $m$-dimensional random variable whose components are i.i.d. uniformly distributed over the interval $[X,X+1]$. It can be shown that the minimal sufficient statistic in this case is the two-dimensional vector $(\min_i\{Y_i\},\max_i\{Y_i\})$: the likelihood of $X$ given $Y$ is nonzero exactly when $\max_i\{Y_i\}-1 \leq X \leq \min_i\{Y_i\}$, so it depends on $Y$ only through these two values. Therefore, although "compression" of the observation is possible, the dimension of the minimal sufficient statistic is greater than that of $X$. Since the transform $T$ is nonlinear in this case, restricting our attention to linear transforms can only increase the dimension of the sufficient statistic.
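The sufficiency of $(\min_i Y_i, \max_i Y_i)$ in this example can be checked numerically: the likelihood of $X$ given $Y$ computed from all $m$ components agrees with the one computed from the two extreme order statistics alone. A small sketch (the sample size and grid of candidate $x$ values are arbitrary choices of mine):

```python
import numpy as np

def likelihood(y, x):
    """Likelihood of x given i.i.d. Uniform[x, x+1] samples y,
    i.e. the product of indicators 1[x <= y_i <= x+1]."""
    return float(np.all((x <= y) & (y <= x + 1)))

def likelihood_via_stat(y, x):
    """The same likelihood, expressed only through (min(y), max(y))."""
    lo, hi = y.min(), y.max()
    return float(hi - 1 <= x <= lo)

rng = np.random.default_rng(1)
y = rng.uniform(0.3, 1.3, size=10)  # samples generated with true X = 0.3

# The two expressions agree for every candidate x, so the likelihood
# depends on y only through (min(y), max(y)).
for x in np.linspace(-1, 2, 61):
    assert likelihood(y, x) == likelihood_via_stat(y, x)
```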