I'm reading the Wasserstein GAN paper (https://arxiv.org/abs/1701.07875) by Martin Arjovsky et al. My question is about the proof of statement 2 of Theorem 1 in the paper; see Appendix C. The proof goes as follows. The part I don't understand is $\color{red}{\text{colored in red}}$.
(...)
Now let $g$ be locally Lipschitz. Then, for a given pair $(\theta, z)$ there is a constant $L(\theta, z)$ and an open set $U$ such that $(\theta, z) \in U$, such that for every $(\theta', z') \in U$ we have $$ \text{(Equation 1)} \quad \quad \quad \|g_\theta(z) - g_{\theta'}(z')\| \ \le \ L(\theta, z) \ (\|\theta - \theta'\| + \|z - z'\|). $$ $\color{red}{\text{By taking expectations and } z' = z \text{ we have}}$ $$ \text{(Equation 2)} \quad \quad \quad \mathbb E_z [\|g_\theta(z) - g_{\theta'}(z)\|] \ \le \ \|\theta - \theta'\| \ \mathbb E_z [L(\theta, z)] $$ whenever $(\theta', z) \in U$.
(...)
Why I don't understand the red part:
According to the proof, Equation 1 holds for $(\theta',\ z') \in U$, where the set $U$ depends on the choice of $z$.
But we have to integrate Equation 1 over $\mathcal Z$ to get Equation 2.
This means $z$ varies, and so does $U$.
Hence, during the integration, we cannot keep $\theta'$ fixed, since its admissible range varies with $z$.
So my question is whether I am wrong. Don't we need an additional assumption here?
If not, then could someone explain more about the red part?
I think that their argument is correct, but I totally understand your confusion about the sentence in red. It would have been clearer if they had instead written "By setting $z' = z$ and taking expectations on both sides we have...".
Indeed, let $(\theta,z)$ be a fixed pair. By local Lipschitzness of $g$, we know that there exists an open neighborhood of $(\theta,z)$, which we denote $U\equiv U(\theta,z)$, and a local Lipschitz constant $L\equiv L(\theta,z)$ such that $$ \|g_\theta(z) - g_{\theta'}(z')\| \ \le \ L(\theta, z) \ (\|\theta - \theta'\| + \|z - z'\|)\ \text{ for all }(\theta',z')\in U(\theta,z)\tag{Eq. 1}$$
Because $(\text{Eq. 1})$ is true for all $(\theta',z')\in U(\theta,z)$, it is in particular true for the pair $(\theta',z)$, where $\theta'$ is such that $(\theta',z')\in U(\theta,z)$ for some $z'$. Indeed, shrinking $U(\theta,z)$ if necessary, we may assume it is a product of an open neighborhood of $\theta$ and an open neighborhood of $z$; then $(\theta',z')\in U(\theta,z)$ implies $(\theta',z)\in U(\theta,z)$ as well.
From this it follows $$ \|g_\theta(z) - g_{\theta'}(z)\| \ \le \ L(\theta, z) \ \|\theta - \theta'\| \ \text{ for all }\theta'\in U_z(\theta)\tag{Eq. 1'}$$ where $U_z(\theta) := \{\theta' : (\theta',z')\in U(\theta,z) \text{ for some } z'\}$. Now the result would follow by taking expectations with respect to $z$ in $(\text{Eq. 1'})$, but the issue is that $U_z(\theta)$ depends on $z$. The only way around it is to restrict $\theta'$ to the neighborhood $U(\theta)$ defined as $U(\theta):=\{\theta' : (\theta',z)\in U(\theta,z)\text{ for all } z\}$, so that we have $$ \|g_\theta(z) - g_{\theta'}(z)\| \ \le \ L(\theta, z) \ \|\theta - \theta'\| \ \text{ for all }\theta'\in U(\theta)\tag{Eq. 1''}$$ This is indeed the approach taken by the authors in their paper. Since $(\text{Eq. 1''})$ is valid for all values of $z$, you can take expectations on both sides and conclude.
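If it helps to see the expectation step numerically, here is a minimal sketch. It is not taken from the paper: the toy generator $g_\theta(z) = \sin(\theta z)$ and the prior $z \sim \mathcal N(0,1)$ are my own assumptions, chosen because $|\sin(\theta z) - \sin(\theta' z)| \le |z|\,|\theta - \theta'|$, so $L(\theta, z) = |z|$ works. In this toy case the bound holds for every $\theta'$, so the neighborhood issue does not arise; the point is only that a constant depending on $z$ survives the expectation as $\mathbb E_z[L(\theta, z)]$.

```python
import numpy as np

def g(theta, z):
    # Hypothetical toy generator, assumed for illustration only.
    return np.sin(theta * z)

def lipschitz_in_theta(z):
    # |sin(theta*z) - sin(theta'*z)| <= |z| * |theta - theta'|,
    # so L(theta, z) = |z| is a valid constant that depends on z.
    return np.abs(z)

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)      # samples from the assumed prior N(0, 1)
theta, theta_prime = 0.7, 0.9         # theta' is held fixed while z varies

# Pointwise bound with z' = z (the analogue of Eq. 1'').
pointwise_lhs = np.abs(g(theta, z) - g(theta_prime, z))
pointwise_rhs = lipschitz_in_theta(z) * abs(theta - theta_prime)

# Monte Carlo estimates of both sides after taking expectations over z.
lhs = pointwise_lhs.mean()
rhs = abs(theta - theta_prime) * lipschitz_in_theta(z).mean()
print(lhs, rhs)
```

Because the bound holds sample by sample for the same fixed $\theta'$, averaging over $z$ preserves it, which is exactly why $\theta'$ has to lie in a set that does not depend on $z$.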