Let $X\colon \Omega\to \mathbb R^m$ and $Y\colon \Omega \to\mathbb R^n$ be random variables.
The mutual information is defined as $$I(X; Y) = \int_{\mathbb R^m\times \mathbb R^n} \log \left( \frac{ \mathrm{d}P_{X,Y} }{ \mathrm{d}(P_X\otimes P_Y) } \right)\, \mathrm{d}P_{X,Y},$$
where the argument of the logarithm is the Radon–Nikodym derivative of the joint law $P_{X,Y}$ with respect to the product of the marginals $P_X\otimes P_Y$.
Now let $f\colon \mathbb R^m\to \mathbb R^k$ and $g\colon \mathbb R^n\to \mathbb R^l$ be two smooth functions. I am interested in the mutual information between $f(X)$ and $g(Y)$:
\begin{align*}I\big(f(X); g(Y)\big) &= \int_{\mathbb R^k\times \mathbb R^l} \log \left( \frac{ \mathrm{d} \big( (f\times g)_\sharp P_{X,Y} \big) }{ \mathrm{d}(f_\sharp P_X\otimes g_\sharp P_Y) } \right)\, \mathrm{d}\big( (f\times g)_\sharp P_{X,Y}\big) \\ &= \int_{\mathbb R^m\times \mathbb R^n} \log \left( \frac{ \mathrm{d} \big( (f\times g)_\sharp P_{X,Y} \big) }{ \mathrm{d}(f_\sharp P_X\otimes g_\sharp P_Y) } \right) \circ(f\times g) \,\mathrm{d}P_{X,Y}. \end{align*}
I know that if $P_{X,Y}$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb R^m\times \mathbb R^n$ and $f$ and $g$ are diffeomorphisms, then $$I(X; Y) = I\big(f(X); g(Y)\big).$$
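A minimal sketch of why the diffeomorphism case works (assuming $k=m$, $l=n$, writing $p_{X,Y}$, $p_X$, $p_Y$ for the Lebesgue densities and $\lambda$ for the Lebesgue measure): the change-of-variables formula gives
$$\frac{\mathrm d\big((f\times g)_\sharp P_{X,Y}\big)}{\mathrm d\lambda}(u,v) = p_{X,Y}\big(f^{-1}(u), g^{-1}(v)\big)\,\big|\det Df^{-1}(u)\big|\,\big|\det Dg^{-1}(v)\big|,$$
and similarly for the pushforward marginals, so the Jacobian factors cancel in the Radon–Nikodym derivative:
$$\frac{\mathrm d\big((f\times g)_\sharp P_{X,Y}\big)}{\mathrm d\big(f_\sharp P_X\otimes g_\sharp P_Y\big)}(u,v) = \frac{p_{X,Y}\big(f^{-1}(u),g^{-1}(v)\big)}{p_X\big(f^{-1}(u)\big)\,p_Y\big(g^{-1}(v)\big)} = \frac{\mathrm dP_{X,Y}}{\mathrm d(P_X\otimes P_Y)}\big(f^{-1}(u),g^{-1}(v)\big).$$
Substituting this into the second integral above yields $I\big(f(X); g(Y)\big) = I(X;Y)$.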
I have two questions:
Does the above equality hold when the assumption that $f$ and $g$ are diffeomorphisms is replaced by being just smooth injective functions?
and
Is it possible that $I(X; Y)$ is finite and $I(f(X); g(Y)) > I(X; Y)$ for some functions $f$ and $g$?
Both questions seem quite natural to ask, and for discrete random variables the situation is simpler. However, I don't know enough about singular measures to work with them confidently in this continuous setting.
Any book reference would be very welcome – I checked Cohn's and Bogachev's books and did not find a discussion of this question, but I do not know this subject well.
Edit: I am not sure if it helps, but in the book of Cover and Thomas there is the following statement:
If $\mathcal P = \{A_1, \dotsc, A_N\}$ is a finite partition of $\mathbb R^m$, we can define a discrete random variable $X_\mathcal P$ such that $\mathrm{Pr}(X_\mathcal P = i) = P_X(A_i)$. Then $I(X; Y) = \sup_{\mathcal P, \mathcal Q} I( X_\mathcal P; Y_\mathcal Q)$, where the supremum is taken over all finite partitions $\mathcal P$ of $\mathbb R^m$ and $\mathcal Q$ of $\mathbb R^n$.
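In the discrete setting the two phenomena are easy to see numerically. The following sketch (the joint distribution is a made-up example, chosen only for illustration) checks that an injective map of the values of $X$ leaves the mutual information unchanged, while a non-injective merge of values can only decrease it:

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution,
    given as a matrix joint[i, j] = Pr(X = i, Y = j)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of X
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

# A made-up joint distribution, for illustration only.
joint = np.array([[0.25, 0.05],
                  [0.05, 0.25],
                  [0.10, 0.30]])
base = mutual_information(joint)

# An injective f only relabels the values of X (here: a permutation of
# the rows), so the mutual information is unchanged.
relabel = mutual_information(joint[[2, 0, 1]])

# A non-injective f merges values of X (here: the first two rows); by the
# data processing inequality the mutual information can only decrease.
merged = mutual_information(np.vstack([joint[0] + joint[1], joint[2]]))

assert np.isclose(base, relabel)
assert merged <= base
```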
It seems that under suitable assumptions this invariance property is true, as described here.
On the other hand, if we assume only that $f$ and $g$ are measurable (so that it makes sense to speak about the random variables $f(X)$ and $g(Y)$), then a variant of the data processing inequality gives a negative answer to the second question: $I\big(f(X); g(Y)\big) \le I(X; Y)$ always holds.