Since Goodfellow's original paper derives the cost function while omitting some steps, I am trying to derive it step by step. We have that:
$$ V(\mathcal{G},\mathcal{D}) = \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}(\mathbf{x})}\left[\log(\mathcal{D}(\mathbf{x}))\right]+\mathbb{E}_{\mathbf{z}\sim p_{z}(\mathbf{z})}\left[\log(1-\mathcal{D}(\mathcal{G}(\mathbf{z})))\right] \\ =\int_{\mathbf{x}}p_{\text{data}}(\mathbf{x})\log(\mathcal{D}(\mathbf{x}))\,d\mathbf{x} + \int_{\mathbf{z}}p_{z}(\mathbf{z})\log(1-\mathcal{D}(\mathcal{G}(\mathbf{z})))\,d\mathbf{z} \\ =\int_{\mathbf{x}} \left( p_{\text{data}}(\mathbf{x}) \log(\mathcal{D}(\mathbf{x})) + p_{g}(\mathbf{x}) \log(1-\mathcal{D}(\mathbf{x})) \right) d\mathbf{x} $$
Now, I've understood that the last equality comes from the so-called Law of the Unconscious Statistician (LOTUS).
So I'd like to prove that $$ \int_{\mathbf{z}}p_{z}(\mathbf{z})\log(1-\mathcal{D}(\mathcal{G}(\mathbf{z})))d\mathbf{z} = \int_{\mathbf{x}}p_{g}(\mathbf{x}) \log(1-\mathcal{D}(\mathbf{x})) d\mathbf{x} $$
For a continuous random variable $\mathbf{z}$, let $\mathbf{x} = \mathcal{G}(\mathbf{z})$, and suppose that $\mathcal{G}$ is differentiable, that its inverse $\mathcal{G}^{-1}$ exists (but are we sure that an MLP always has an inverse?), and that $\mathcal{G}$ is monotonic (are we sure about that for an MLP, too?). By the formula for differentiating inverse functions we have that
$$ \frac{d\mathbf{z}}{d \mathbf{x}}\cdot \frac{d \mathbf{x}}{d \mathbf{z}} = 1 $$ $$ \frac{d\mathbf{z}}{d \mathbf{x}} = \frac{1}{\frac{d \mathbf{x}}{d \mathbf{z}}} $$ $$ \frac {d} {d\mathbf{x}} (\mathcal{G}^{-1}(\mathbf{x})) = \frac {1} {\frac{d\mathcal{G}(\mathcal{G}^{-1}(\mathbf{x}))}{d(\mathcal{G}^{-1}(\mathbf{x}))}} $$ $$ \frac {d} {d\mathbf{x}} (\mathcal{G}^{-1}(\mathbf{x})) = \frac {1} {\mathcal{G}'(\mathcal{G}^{-1}(\mathbf{x}))} $$ because $\mathbf{z}=\mathcal{G}^{-1}(\mathbf{x})$, so $$ d\mathbf{z} = \frac{1}{\mathcal{G}'(\mathcal{G}^{-1}(\mathbf{x}))}d\mathbf{x}, $$ and by a change of variables we can rewrite everything as a function of $\mathbf{x}$:
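The inverse-function derivative identity above is easy to sanity-check numerically. Here is a minimal sketch using a hypothetical stand-in for $\mathcal{G}$ (the scalar map $g(z)=e^z$, chosen only because it is monotone and differentiable, not because an MLP looks like this):

```python
import math

# Finite-difference check of (g^{-1})'(x) = 1 / g'(g^{-1}(x))
# for a monotone, differentiable toy map (hypothetical choice, not an MLP).
g = math.exp          # x = g(z)
g_inv = math.log      # z = g^{-1}(x)
g_prime = math.exp    # g'(z)

x, h = 2.0, 1e-6
numeric = (g_inv(x + h) - g_inv(x - h)) / (2 * h)   # central difference
analytic = 1.0 / g_prime(g_inv(x))                  # = 1 / x = 0.5
print(numeric, analytic)
```

Both values should agree to many decimal places, confirming $d\mathbf{z}/d\mathbf{x} = 1/\mathcal{G}'(\mathcal{G}^{-1}(\mathbf{x}))$ for this toy case.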
$$ \int_{-\infty}^{\infty} p_{z}(\mathbf{z}) \log \left( 1-\mathcal{D}(\mathcal{G}(\mathbf{z})) \right) d\mathbf{z} = \int_{-\infty}^{\infty} \color{red}{p_{z}(\mathcal{G}^{-1}(\mathbf{x})) \frac{1}{\mathcal{G}'(\mathcal{G}^{-1}(\mathbf{x}))}} \log \left( 1-\mathcal{D}(\mathbf{x}) \right) d\mathbf{x} $$
Now, how can we prove that $\color{red}{p_{z}(\mathcal{G}^{-1}(\mathbf{x}))} \color{red}{ \frac{1}{\mathcal{G}^{'}(\mathcal{G}^{-1}(\mathbf{x}))}} = p_{g}(\mathbf{x})$?
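For intuition, the claimed identity can be checked empirically in one dimension. The sketch below assumes a hypothetical generator $g(z)=e^z$ with a standard normal latent $z$ (so $x=g(z)$ is log-normal), and compares $p_z(g^{-1}(x))/g'(g^{-1}(x))$ against a Monte Carlo estimate of the density of $x$:

```python
import math, random

random.seed(0)

# Toy 1-D "generator": strictly increasing and differentiable,
# hence invertible (hypothetical choice, unlike a general MLP).
def g(z):            # x = g(z) = exp(z)
    return math.exp(z)

def g_inv(x):        # z = g^{-1}(x) = log(x)
    return math.log(x)

def g_prime(z):      # g'(z) = exp(z)
    return math.exp(z)

def p_z(z):          # standard normal latent density
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def p_g_claimed(x):  # the quantity in red: p_z(g^{-1}(x)) / g'(g^{-1}(x))
    z = g_inv(x)
    return p_z(z) / g_prime(z)

# Monte Carlo estimate of the push-forward density of x = g(z)
# on a small interval (a, b): P(a < x < b) / (b - a).
n = 200_000
a, b = 0.9, 1.1
hits = sum(1 for _ in range(n) if a < g(random.gauss(0, 1)) < b)
mc_density = hits / n / (b - a)
analytic = p_g_claimed(1.0)   # log-normal density at x = 1
print(mc_density, analytic)   # the two should agree closely
```

This does not prove the identity, of course, but it shows that the red factor is exactly the density $p_g$ induced on $\mathbf{x}$ by pushing $p_z$ through the generator.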
Going from the second line of the displayed equation to the third involves a change of variables, i.e., a mapping of the latent variables by some deterministic function $G(\cdot):\mathbb{R}^{n_z}\rightarrow \mathbb{R}^{n_x}$, which must be invertible and differentiable. Consider the expectation of a function of a scalar random variable $z$ with PDF ${\rm p}_z(z)$:
${\rm E}_z[f(z)] = \int_{-\infty}^{\infty} f(z)\,{\rm p}_z(z)\,dz$
and apply an invertible change of variable by letting $x=g(z)$ where $x$ is a scalar random variable with PDF ${\rm p}_x(x)$, with the two PDFs satisfying ${\rm p}_x(x)\,dx={\rm p}_z(z)\,dz$. It follows that the expectation can be rewritten as
${\rm E}_z[f(z)]=\int_{-\infty}^{\infty} f(g^{-1}(x))\,{\rm p}_x(x)\,dx={\rm E}_x[f(g^{-1}(x))]$
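As a quick Monte Carlo illustration of this equality of expectations, the sketch below uses the hypothetical choices $f(z)=z^2$, $g(z)=e^z$, and $z\sim\mathcal{N}(0,1)$, so both sides should estimate the second moment of a standard normal, which is $1$:

```python
import math, random

random.seed(1)

f = lambda z: z * z   # any integrable test function (hypothetical choice)
g = math.exp          # invertible, monotone change of variable
g_inv = math.log

n = 100_000
# Left side: E_z[f(z)] with z ~ N(0, 1).
lhs = sum(f(random.gauss(0, 1)) for _ in range(n)) / n
# Right side: E_x[f(g^{-1}(x))] with x = g(z) sampled by pushing z through g.
rhs = sum(f(g_inv(g(random.gauss(0, 1)))) for _ in range(n)) / n
print(lhs, rhs)  # both estimates should be close to 1
```

Sampling $x$ by pushing $z$ through $g$ is precisely how a GAN generator produces samples from $p_g$ without ever writing down its density.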
The result generalises to (i) many-to-one scalar transformations; (ii) invertible vector transformations $G(\cdot)$ with ${\rm dim}(x)={\rm dim}(z)$; and (iii) vector transformations where ${\rm dim}(x)<{\rm dim}(z)$ (see e.g., Papoulis: Probability, Random Variables, and Stochastic Processes). In case (i), the PDF is composed of a sum of terms, each of which corresponds to a root of the equation $g(z)=x$. In case (ii), the two PDFs are linked via the determinant of the Jacobian. In the case where ${\rm dim}(x)>{\rm dim}(z)$, the PDF of $x$ is non-unique and degenerate (containing delta functions). For a multilayer perceptron (MLP) neural network, $G(z)$ would normally be differentiable, but it may not be invertible. Further details may be found in my paper 1, which I am paraphrasing here and which contains some other interesting results concerning Proposition 1 from the 2014 GAN paper by Goodfellow et al.
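Case (ii) can also be verified numerically. The sketch below assumes a hypothetical invertible linear "generator" $x = Az$ in two dimensions, for which the change-of-variables formula reads ${\rm p}_x(x) = {\rm p}_z(A^{-1}x)/|\det A|$, and checks it against a Monte Carlo histogram:

```python
import math, random

random.seed(2)

# Hypothetical invertible 2-D linear "generator" x = A z.
A = [[2.0, 1.0],
     [0.0, 3.0]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # = 6

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

# Explicit inverse of the 2x2 matrix A.
A_inv = [[ A[1][1] / det_A, -A[0][1] / det_A],
         [-A[1][0] / det_A,  A[0][0] / det_A]]

def p_z(v):  # standard 2-D normal latent density
    return math.exp(-(v[0] ** 2 + v[1] ** 2) / 2) / (2 * math.pi)

def p_x(v):  # change of variables via the Jacobian determinant
    return p_z(apply(A_inv, v)) / abs(det_A)

# Monte Carlo check: fraction of samples x = A z landing in a small
# square around x0, divided by the square's area.
x0, h, n = (1.0, 1.0), 0.2, 400_000
hits = 0
for _ in range(n):
    x = apply(A, (random.gauss(0, 1), random.gauss(0, 1)))
    if abs(x[0] - x0[0]) < h / 2 and abs(x[1] - x0[1]) < h / 2:
        hits += 1
mc = hits / n / (h * h)
print(mc, p_x(x0))  # the two estimates should agree closely
```

The same formula with a non-square or non-invertible $A$ breaks down, which is exactly why the argument does not carry over directly to a general MLP generator.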