I am having trouble understanding a step in the proof of Lemma 2.1 of [1]. It is presumably not very hard, since it is not justified in the paper, but I am not familiar with the topic and got quite confused. I'll try to summarise here the point that is not clear to me.
Consider two measurable spaces $E$ and $F$ and a measurable surjective map $\Psi:E\to F$. Let $\mu$ and $\nu$ be probability measures on $E$ such that $\mu\ll\nu$. Then we can define the Kullback-Leibler divergence $\mathrm{KL}(\mu\|\nu)=\mathbb E_\mu\big[\log\frac{d\mu}{d\nu}\big]$.
Now consider the push-forward measures $\tilde\mu=\mu\circ\Psi^{-1}$ and $\tilde\nu=\nu\circ\Psi^{-1}$.
The following equality is given in [1], without justification,
$$\mathrm{KL}(\mu\|\nu)=\int_E\log\tfrac{d\mu}{d\nu}(x)d\mu(x) = \int_E\log\tfrac{d\tilde\mu}{d\tilde\nu}(\Psi(x))d\mu(x) + \int_F\left(\int_E\log\frac{d\mu_y}{d\nu_y}(x)d\mu_y(x)\right)d\tilde\mu(y)\,,$$ where $\mu_y$ and $\nu_y$ are the regular conditional distributions of $\mu$ and $\nu$, knowing that $\Psi=y$.
I understand that the last term somehow accounts for the case in which $\Psi$ is not a bijection, as otherwise I guess we would have $\tfrac{d\tilde\mu}{d\tilde\nu}(\Psi(x))=\tfrac{d\mu}{d\nu}(x)$. However, I am quite lost as to how to show that the above formula is correct.
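To convince myself that the identity is at least plausible, I checked it numerically on a toy finite example (the spaces, map and weights below are my own arbitrary choices, not from [1]):

```python
import math

# Toy finite setting: E = {0,1,2,3}, F = {0,1}, Psi(x) = x mod 2 (surjective).
Psi = lambda x: x % 2
E, F = range(4), range(2)

# Two arbitrary fully supported probability measures on E (so mu << nu).
mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
nu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

# Push-forwards mu_tilde = mu o Psi^{-1} and nu_tilde = nu o Psi^{-1}.
mu_t = {y: sum(mu[x] for x in E if Psi(x) == y) for y in F}
nu_t = {y: sum(nu[x] for x in E if Psi(x) == y) for y in F}

def kl(p, q):
    """KL divergence between two discrete distributions on the same support."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p)

lhs = kl(mu, nu)

# First term: integral of log d(mu_tilde)/d(nu_tilde)(Psi(x)) against mu.
term1 = sum(mu[x] * math.log(mu_t[Psi(x)] / nu_t[Psi(x)]) for x in E)

# Second term: mu_tilde-average of KL between the conditionals mu_y, nu_y,
# where mu_y(x) = mu(x)/mu_t[y] on the fibre {Psi = y}.
term2 = 0.0
for y in F:
    mu_y = {x: mu[x] / mu_t[y] for x in E if Psi(x) == y}
    nu_y = {x: nu[x] / nu_t[y] for x in E if Psi(x) == y}
    term2 += mu_t[y] * kl(mu_y, nu_y)

assert abs(lhs - (term1 + term2)) < 1e-12  # chain-rule identity holds
```

The assertion passes, so at least in this discrete case the identity checks out; my question is how to prove it in general.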
What I've got so far is that for any bounded measurable function $\phi$ on $E$ we should have
$$\int_E\phi(x)d\mu(x) = \int_F\left(\int_E\phi(x)d\mu_y(x)\right)d\tilde\mu(y) = \int_F\left(\int_E\phi(x)\color{blue}{\tfrac{d\mu_y}{d\nu_y}(x)\tfrac{d\tilde\mu}{d\tilde\nu}(y)}d\nu_y(x)\right)d\tilde\nu(y)$$ and $$\int_E\phi(x)d\mu(x) = \int_E\phi(x)\tfrac{d\mu}{d\nu}(x)d\nu(x) = \int_F\left(\int_E\color{blue}{\tfrac{d\mu}{d\nu}(x)}\phi(x)d\nu_y(x)\right)d\tilde\nu(y)\,.$$ So somehow I see that $\tfrac{d\mu}{d\nu}(x)$ and $\tfrac{d\mu_y}{d\nu_y}(x)\tfrac{d\tilde\mu}{d\tilde\nu}(y)$ are related, but I still cannot see how to get from here to the desired formula.
[1] Transportation Cost-Information Inequalities and Applications to Random Dynamical Systems and Diffusions, Djellout, Guillin, and Wu, 2004.
To prove this result we need to justify that $\mu$-almost surely, it holds that $$ \frac{d\mu}{d\nu}(x) = \frac{d\tilde{\mu}}{d\tilde{\nu}}(y) \frac{d\mu_y}{d\nu_y}(x), \quad \text{where } y=\Psi(x). \tag{*}\label{*} $$ If this is the case then \begin{align} \mathrm{KL}(\mu\|\nu) & =\int_E \log \big [\tfrac{d\tilde{\mu}}{d\tilde{\nu}}(\Psi(x)) \tfrac{d\mu_{\Psi(x)}}{d\nu_{\Psi(x)}}(x) \big ]d\mu(x) \\ &=\int_E \log \big [\tfrac{d\tilde{\mu}}{d\tilde{\nu}}(\Psi(x)) \big ]d\mu(x) + \int_E \log \big [\tfrac{d\mu_{\Psi(x)}}{d\nu_{\Psi(x)}}(x) \big ]d\mu(x) \\ & = \int_E\log\tfrac{d\tilde\mu}{d\tilde\nu}(\Psi(x))d\mu(x) + \int_F\left(\int_E\log\frac{d\mu_y}{d\nu_y}(x) \, d\mu_y(x)\right)d\tilde\mu(y), \end{align} where splitting the logarithm is legitimate as long as the right-hand side is not of the form $\infty-\infty$, and the last line follows from the disintegration $\int_E f\,d\mu=\int_F\big(\int_E f\,d\mu_y\big)d\tilde\mu(y)$ (the first identity you wrote down), together with the fact that $\mu_y$ is concentrated on $\{\Psi=y\}$.
By the ($\nu$-a.e.) uniqueness of the Radon-Nikodym derivative, to prove $\eqref{*}$ it is sufficient to show that for any measurable set $A \subseteq E$, $$ \mu(A) = \int_A \frac{d\tilde{\mu}}{d\tilde{\nu}}(\Psi(x)) \frac{d\mu_{\Psi(x)}}{d\nu_{\Psi(x)}}(x) \, d\nu(x). $$ Disintegrating $\nu$, and using that $\Psi(x)=y$ for $\nu_y$-almost every $x$ (since $\nu_y$ is concentrated on the fibre $\{\Psi=y\}$), we obtain \begin{align} \int_A \frac{d\tilde{\mu}}{d\tilde{\nu}}(\Psi(x)) \frac{d\mu_{\Psi(x)}}{d\nu_{\Psi(x)}}(x) \, d\nu(x) & = \int_{F}\left(\int_{E} \mathbf{1}_A(x)\frac{d\mu_y}{d\nu_y}(x) \, d\nu_y(x)\right) \frac{d\tilde{\mu}}{d\tilde{\nu}}(y) \, d\tilde\nu(y)\\ & = \int_{F} \mu_y(A)\, d\tilde\mu(y)\\ & = \mu(A), \end{align} which completes the proof. (Note that we integrate over all of $F$ with the indicator $\mathbf 1_A$ inside, rather than over $\Psi(A)$, which need not be measurable.)
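As a sanity check, in the finite case the factorisation $\eqref{*}$ reduces to elementary algebra; here is a toy verification (the spaces, map and weights are arbitrary choices of mine):

```python
# Toy finite check of (*):
#   d mu / d nu (x) = d mu_t / d nu_t (Psi(x)) * d mu_y / d nu_y (x),  y = Psi(x).
Psi = lambda x: x % 2
E, F = range(4), range(2)

# Two arbitrary fully supported probability measures on E.
mu = {0: 0.05, 1: 0.15, 2: 0.35, 3: 0.45}
nu = {0: 0.4, 1: 0.1, 2: 0.2, 3: 0.3}

# Push-forwards under Psi.
mu_t = {y: sum(mu[x] for x in E if Psi(x) == y) for y in F}
nu_t = {y: sum(nu[x] for x in E if Psi(x) == y) for y in F}

for x in E:
    y = Psi(x)
    lhs = mu[x] / nu[x]                             # d mu / d nu at x
    cond = (mu[x] / mu_t[y]) / (nu[x] / nu_t[y])    # d mu_y / d nu_y at x
    rhs = (mu_t[y] / nu_t[y]) * cond                # factorised density
    assert abs(lhs - rhs) < 1e-12
```

In the discrete case the identity is immediate from cancelling $\mu_t[y]$ and $\nu_t[y]$; the proof above is exactly the measure-theoretic version of this cancellation.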