Invariance of mutual information under smooth injective mapping


Let $X\colon \Omega\to \mathbb R^m$ and $Y\colon \Omega \to\mathbb R^n$ be random variables.

The mutual information is defined as $$I(X; Y) = \int_{\mathbb R^m\times \mathbb R^n} \log \left( \frac{ \mathrm{d}P_{X,Y} }{ \mathrm{d}(P_X\otimes P_Y) } \right)\, \mathrm{d}P_{X,Y},$$

where the argument of the $\log$ is the Radon–Nikodym derivative of $P_{X,Y}$ with respect to the product measure $P_X\otimes P_Y$ (with the usual convention $I(X;Y)=\infty$ when $P_{X,Y}\not\ll P_X\otimes P_Y$).
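As a sanity check on this definition, here is a standard closed-form example (my addition, not part of the original question): for a jointly Gaussian pair the Radon–Nikodym derivative is simply the ratio of the joint density to the product of the marginal densities, and the integral evaluates explicitly.

```latex
% Standard bivariate Gaussian with correlation \rho, |\rho| < 1:
% here P_{X,Y} \ll P_X \otimes P_Y, and the derivative is a density ratio.
\frac{\mathrm{d}P_{X,Y}}{\mathrm{d}(P_X \otimes P_Y)}(x,y)
  = \frac{p_{X,Y}(x,y)}{p_X(x)\, p_Y(y)}
  = \frac{1}{\sqrt{1-\rho^2}}
    \exp\!\left( -\frac{\rho^2 x^2 - 2\rho x y + \rho^2 y^2}{2(1-\rho^2)} \right),
\qquad\text{so}\qquad
I(X;Y) = -\tfrac{1}{2}\log\!\left(1-\rho^2\right),
% since E[\rho^2 X^2 - 2\rho XY + \rho^2 Y^2] = \rho^2 - 2\rho^2 + \rho^2 = 0,
% leaving only the constant prefactor inside the expectation of the log.
```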

Now let $f\colon \mathbb R^m\to \mathbb R^k$ and $g\colon \mathbb R^n\to \mathbb R^l$ be two smooth functions. I am interested in the mutual information between $f(X)$ and $g(Y)$:

\begin{align*}I\big(f(X); g(Y)\big) &= \int_{\mathbb R^k\times \mathbb R^l} \log \left( \frac{ \mathrm{d} \big( (f\times g)_\sharp P_{X,Y} \big) }{ \mathrm{d}(f_\sharp P_X\otimes g_\sharp P_Y) } \right)\, \mathrm{d}\big( (f\times g)_\sharp P_{X,Y}\big) \\ &= \int_{\mathbb R^m\times \mathbb R^n} \log \left( \frac{ \mathrm{d} \big( (f\times g)_\sharp P_{X,Y} \big) }{ \mathrm{d}(f_\sharp P_X\otimes g_\sharp P_Y) } \right) \circ(f\times g) \,\mathrm{d}P_{X,Y}. \end{align*}

I know that if $P_{X,Y}$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb R^m\times \mathbb R^n$ and $f$ and $g$ are diffeomorphisms, then $$I(X; Y) = I\big(f(X); g(Y)\big).$$
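For what it is worth, here is a hedged sketch (my own reasoning, relying on the Lusin–Souslin theorem) suggesting that injectivity alone, with no smoothness, surjectivity, or absolute continuity with respect to Lebesgue measure, should already give the equality:

```latex
% Sketch: write T = f \times g. If f and g are injective and Borel, then by
% the Lusin--Souslin theorem the image T(\mathbb{R}^m \times \mathbb{R}^n)
% is Borel and T^{-1} is Borel measurable on it, so T is a Borel isomorphism
% onto its image. For a Borel isomorphism T and measures \mu \ll \nu,
% one has T_\sharp \mu \ll T_\sharp \nu with
\frac{\mathrm{d}(T_\sharp \mu)}{\mathrm{d}(T_\sharp \nu)} \circ T
  = \frac{\mathrm{d}\mu}{\mathrm{d}\nu} \quad \mu\text{-a.e.}
% Applying this with \mu = P_{X,Y}, \nu = P_X \otimes P_Y, and noting that
% T_\sharp(P_X \otimes P_Y) = f_\sharp P_X \otimes g_\sharp P_Y, the
% integrand in the second integral above coincides P_{X,Y}-a.e. with the
% integrand defining I(X;Y), so the two integrals are equal. (If
% P_{X,Y} \not\ll P_X \otimes P_Y, then the same isomorphism argument shows
% both mutual informations are +\infty by convention.)
```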

I have two questions:

Does the above equality still hold when the assumption that $f$ and $g$ are diffeomorphisms is weakened to their being merely smooth and injective?

and

Is it possible that $I(X; Y)$ is finite and $I(f(X); g(Y)) > I(X; Y)$ for some functions $f$ and $g$?

Both questions seem quite natural to ask, and for discrete random variables the situation is simpler. However, I don't really know enough about singular measures to confidently work with them in this continuous setting.

Any book reference would be very welcome – I checked Cohn's and Bogachev's books and did not find a discussion of this, but I do not know this subject well.

Edit: I am not sure if it helps, but in the book of Cover and Thomas there is the following statement:

If $\mathcal P = \{A_1, \dotsc, A_N\}$ is a finite partition of $\mathbb R^m$, we can define a discrete random variable $X_\mathcal P$ such that $\mathrm{Pr}(X_\mathcal P = i) = P_X(A_i)$, and similarly $Y_\mathcal Q$ for a finite partition $\mathcal Q$ of $\mathbb R^n$. Then $I(X; Y) = \sup_{\mathcal P, \mathcal Q} I( X_\mathcal P; Y_\mathcal Q)$, where the supremum is taken over all pairs of finite partitions.
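To make the partition characterization concrete, here is a small numerical sketch (my own illustration, not from Cover–Thomas). It estimates $I(X_\mathcal P; Y_\mathcal Q)$ for empirical quantile partitions of a correlated Gaussian pair, and checks that the estimate is unchanged under $f(x)=x^3$ (smooth and injective, but not a diffeomorphism, since $f'(0)=0$) and $g(y)=e^y$ (smooth, injective, not surjective): strictly increasing maps carry quantile bins to quantile bins, so the binned counts are identical.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantile_bin(x, k):
    """Discretize a sample into k equal-count bins via empirical quantiles."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(x, edges)  # labels 0..k-1

def plugin_mi(a, b):
    """Plug-in estimate (in nats) of I between two discrete samples."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px * py)[mask])))

# Correlated Gaussian pair; closed form: I(X;Y) = -0.5 * log(1 - rho^2)
rho, n = 0.9, 200_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

i_xy = plugin_mi(quantile_bin(x, 4), quantile_bin(y, 4))
# f(x) = x^3: smooth, injective, not a diffeomorphism (f'(0) = 0);
# g(y) = exp(y): smooth, injective, not surjective onto R.
i_fg = plugin_mi(quantile_bin(x**3, 4), quantile_bin(np.exp(y), 4))

print(i_xy, i_fg)  # equal: monotone maps only relabel the quantile bins
```

The two estimates agree to machine precision, and both sit below the closed-form value $-\tfrac12\log(1-\rho^2)\approx 0.830$ nats, consistent with the supremum characterization above.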



It seems that under suitable assumptions this invariance property is true, as described here.

On the other hand, if we merely assume that $f$ and $g$ are measurable (so that it makes sense to speak about the random variables $f(X)$ and $g(Y)$), then the data processing inequality $I\big(f(X); g(Y)\big) \le I(X; Y)$ gives a negative answer to:

Is it possible that $I(X;Y)$ is finite and $I(f(X);g(Y))>I(X;Y)$ for some functions $f$ and $g$?