Since Goodfellow's original paper derives the cost function while omitting some steps, I am trying to derive it step by step. We have that:
$$ V(\mathcal{G},\mathcal{D}) = \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}(\mathbf{x})}\left[\log(\mathcal{D}(\mathbf{x}))\right]+\mathbb{E}_{\mathbf{z}\sim p_{z}(\mathbf{z})}\left[\log(1-\mathcal{D}(\mathcal{G}(\mathbf{z})))\right] \\ =\int_{\mathbf{x}}p_{\text{data}}(\mathbf{x})\log(\mathcal{D}(\mathbf{x}))\,d\mathbf{x} + \int_{\mathbf{z}}p_{z}(\mathbf{z})\log(1-\mathcal{D}(\mathcal{G}(\mathbf{z})))\,d\mathbf{z} \\ =\int_{\mathbf{x}} \left( p_{\text{data}}(\mathbf{x}) \log(\mathcal{D}(\mathbf{x})) + p_{g}(\mathbf{x}) \log(1-\mathcal{D}(\mathbf{x})) \right) d\mathbf{x} $$
Now, I've understood that the last equality comes from the so-called Law of the Unconscious Statistician (LOTUS).
So I'd like to prove that $$ \int_{\mathbf{z}}p_{z}(\mathbf{z})\log(1-\mathcal{D}(\mathcal{G}(\mathbf{z})))d\mathbf{z} = \int_{\mathbf{x}}p_{g}(\mathbf{x}) \log(1-\mathcal{D}(\mathbf{x})) d\mathbf{x} $$
For a continuous random variable $\mathbf{z}$, let $\mathbf{x} = \mathcal{G}(\mathbf{z})$, and suppose that $\mathcal{G}$ is differentiable, that its inverse $\mathcal{G}^{-1}$ exists (but are we sure that an MLP always has an inverse?), and that $\mathcal{G}$ is monotonic (are we sure about that for an MLP, too?). By the formula for differentiating inverse functions we have that
$$ \frac{d\mathbf{z}}{d \mathbf{x}}\cdot \frac{d \mathbf{x}}{d \mathbf{z}} = 1 $$ $$ \frac{d\mathbf{z}}{d \mathbf{x}} = \frac{1}{\frac{d \mathbf{x}}{d \mathbf{z}}} $$ $$ \frac {d} {d\mathbf{x}} (\mathcal{G}^{-1}(\mathbf{x})) = \frac {1} {\frac{d\mathcal{G}(\mathcal{G}^{-1}(\mathbf{x}))}{d(\mathcal{G}^{-1}(\mathbf{x}))}} $$ $$ \frac {d} {d\mathbf{x}} (\mathcal{G}^{-1}(\mathbf{x})) = \frac {1} {\mathcal{G}'(\mathcal{G}^{-1}(\mathbf{x}))} $$ because $\mathbf{z}=\mathcal{G}^{-1}(\mathbf{x})$, so $$ d\mathbf{z} = \frac{1}{\mathcal{G}'(\mathcal{G}^{-1}(\mathbf{x}))}d\mathbf{x}, $$ and by a change of variables we can rewrite everything as a function of $\mathbf{x}$:
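The inverse-function derivative identity above is easy to sanity-check numerically. Here is a minimal sketch using a hypothetical stand-in for $\mathcal{G}$ (the scalar map $g(z)=e^z$, chosen only because it is monotone and differentiable, not because an MLP looks like this):

```python
import math

# Finite-difference check of (g^{-1})'(x) = 1 / g'(g^{-1}(x))
# for a monotone, differentiable toy map (hypothetical choice, not an MLP).
g = math.exp          # x = g(z)
g_inv = math.log      # z = g^{-1}(x)
g_prime = math.exp    # g'(z)

x, h = 2.0, 1e-6
numeric = (g_inv(x + h) - g_inv(x - h)) / (2 * h)   # central difference
analytic = 1.0 / g_prime(g_inv(x))                  # = 1 / x = 0.5
print(numeric, analytic)
```

Both values should agree to many decimal places, confirming $d\mathbf{z}/d\mathbf{x} = 1/\mathcal{G}'(\mathcal{G}^{-1}(\mathbf{x}))$ for this toy case.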
$$ \int_{-\infty}^{\infty} p_{z}(\mathbf{z}) \log \left( 1-\mathcal{D}(\mathcal{G}(\mathbf{z})) \right) d\mathbf{z} = \int_{-\infty}^{\infty} \color{red}{p_{z}(\mathcal{G}^{-1}(\mathbf{x})) \frac{1}{\mathcal{G}'(\mathcal{G}^{-1}(\mathbf{x}))}} \log \left( 1-\mathcal{D}(\mathbf{x}) \right) d\mathbf{x} $$
Now, how can we prove that $\color{red}{p_{z}(\mathcal{G}^{-1}(\mathbf{x}))} \color{red}{ \frac{1}{\mathcal{G}^{'}(\mathcal{G}^{-1}(\mathbf{x}))}} = p_{g}(\mathbf{x})$?
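For intuition, the claimed identity can be checked empirically in one dimension. The sketch below assumes a hypothetical generator $g(z)=e^z$ with a standard normal latent $z$ (so $x=g(z)$ is log-normal), and compares $p_z(g^{-1}(x))/g'(g^{-1}(x))$ against a Monte Carlo estimate of the density of $x$:

```python
import math, random

random.seed(0)

# Toy 1-D "generator": strictly increasing and differentiable,
# hence invertible (hypothetical choice, unlike a general MLP).
def g(z):            # x = g(z) = exp(z)
    return math.exp(z)

def g_inv(x):        # z = g^{-1}(x) = log(x)
    return math.log(x)

def g_prime(z):      # g'(z) = exp(z)
    return math.exp(z)

def p_z(z):          # standard normal latent density
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def p_g_claimed(x):  # the quantity in red: p_z(g^{-1}(x)) / g'(g^{-1}(x))
    z = g_inv(x)
    return p_z(z) / g_prime(z)

# Monte Carlo estimate of the push-forward density of x = g(z)
# on a small interval (a, b): P(a < x < b) / (b - a).
n = 200_000
a, b = 0.9, 1.1
hits = sum(1 for _ in range(n) if a < g(random.gauss(0, 1)) < b)
mc_density = hits / n / (b - a)
analytic = p_g_claimed(1.0)   # log-normal density at x = 1
print(mc_density, analytic)   # the two should agree closely
```

This does not prove the identity, of course, but it shows that the red factor is exactly the density $p_g$ induced on $\mathbf{x}$ by pushing $p_z$ through the generator.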
Going from the second line of the displayed equation to the third involves a change of variables, i.e., a mapping of the latent variables by some deterministic function $G(\cdot):\mathbb{R}^{n_z}\rightarrow \mathbb{R}^{n_x}$, which must be invertible and differentiable. Consider the expectation of a function of a scalar random variable $z$ with PDF ${\rm p}_z(z)$:
${\rm E}_z[f(z)] = \int_{-\infty}^{\infty} f(z)\,{\rm p}_z(z)\,dz$
and apply an invertible change of variable by letting $x=g(z)$ where $x$ is a scalar random variable with PDF ${\rm p}_x(x)$, with the two PDFs satisfying ${\rm p}_x(x)\,dx={\rm p}_z(z)\,dz$. It follows that the expectation can be rewritten as
${\rm E}_z[f(z)]=\int_{-\infty}^{\infty} f(g^{-1}(x))\,{\rm p}_x(x)\,dx={\rm E}_x[f(g^{-1}(x))]$
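As a quick Monte Carlo illustration of this equality of expectations, the sketch below uses the hypothetical choices $f(z)=z^2$, $g(z)=e^z$, and $z\sim\mathcal{N}(0,1)$, so both sides should estimate the second moment of a standard normal, which is $1$:

```python
import math, random

random.seed(1)

f = lambda z: z * z   # any integrable test function (hypothetical choice)
g = math.exp          # invertible, monotone change of variable
g_inv = math.log

n = 100_000
# Left side: E_z[f(z)] with z ~ N(0, 1).
lhs = sum(f(random.gauss(0, 1)) for _ in range(n)) / n
# Right side: E_x[f(g^{-1}(x))] with x = g(z) sampled by pushing z through g.
rhs = sum(f(g_inv(g(random.gauss(0, 1)))) for _ in range(n)) / n
print(lhs, rhs)  # both estimates should be close to 1
```

Sampling $x$ by pushing $z$ through $g$ is precisely how a GAN generator produces samples from $p_g$ without ever writing down its density.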
The result generalises to (i) many-to-one scalar transformations; (ii) invertible vector transformations $G(\cdot)$ with ${\rm dim}(x)={\rm dim}(z)$; and (iii) vector transformations where ${\rm dim}(x)<{\rm dim}(z)$ (see e.g., Papoulis: Probability, Random Variables, and Stochastic Processes). In case (i), the PDF is composed of a sum of terms, each of which corresponds to a root of the equation $g(z)=x$. In case (ii), the two PDFs are linked via the determinant of the Jacobian. In the case where ${\rm dim}(x)>{\rm dim}(z)$, the PDF of $x$ is non-unique and degenerate (containing delta functions). For a multilayer perceptron (MLP) neural network, $G(z)$ would normally be differentiable, but it may not be invertible. Further details may be found in my paper 1, which I am paraphrasing here and which contains some other interesting results concerning Proposition 1 from the 2014 GAN paper by Goodfellow et al.
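Case (ii) can also be verified numerically. The sketch below assumes a hypothetical invertible linear "generator" $x = Az$ in two dimensions, for which the change-of-variables formula reads ${\rm p}_x(x) = {\rm p}_z(A^{-1}x)/|\det A|$, and checks it against a Monte Carlo histogram:

```python
import math, random

random.seed(2)

# Hypothetical invertible 2-D linear "generator" x = A z.
A = [[2.0, 1.0],
     [0.0, 3.0]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # = 6

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

# Explicit inverse of the 2x2 matrix A.
A_inv = [[ A[1][1] / det_A, -A[0][1] / det_A],
         [-A[1][0] / det_A,  A[0][0] / det_A]]

def p_z(v):  # standard 2-D normal latent density
    return math.exp(-(v[0] ** 2 + v[1] ** 2) / 2) / (2 * math.pi)

def p_x(v):  # change of variables via the Jacobian determinant
    return p_z(apply(A_inv, v)) / abs(det_A)

# Monte Carlo check: fraction of samples x = A z landing in a small
# square around x0, divided by the square's area.
x0, h, n = (1.0, 1.0), 0.2, 400_000
hits = 0
for _ in range(n):
    x = apply(A, (random.gauss(0, 1), random.gauss(0, 1)))
    if abs(x[0] - x0[0]) < h / 2 and abs(x[1] - x0[1]) < h / 2:
        hits += 1
mc = hits / n / (h * h)
print(mc, p_x(x0))  # the two estimates should agree closely
```

The same formula with a non-square or non-invertible $A$ breaks down, which is exactly why the argument does not carry over directly to a general MLP generator.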