Expected Riemannian Metric of a Random Generator Function


I'm currently working through the paper Latent Space Oddity: On the Curvature of Deep Generative Models. After trying to prove Theorem 1 myself, I am unsure whether my proof works, since it uses a much weaker assumption ($\mu,\sigma\in\mathcal C^1$) than the one given in the paper. The proof in the paper seems to be essentially the same, which doesn't help me see why the stronger assumption ($\mu,\sigma\in\mathcal C^2$) is necessary.

My question is: Is my proof correct? If not, where does it go wrong? And why is the assumption $\mu,\sigma\in\mathcal C^2$ necessary?

I will now describe the terminology involved, state the Theorem and give my proof:

Setup

Throughout, let $\mathcal Z\subseteq \mathbb R^d$ and $\mathcal X\subseteq\mathbb R^D$, usually with $D\gg d$ (though I don't think this is important here). For a continuously differentiable function $f:\mathcal Z\rightarrow\mathcal X$, define a Riemannian metric on $\mathcal Z$ via $$M_z:=\nabla f(z)^T\cdot\nabla f(z)\in\mathbb R^{d\times d},$$ where $\nabla f(z)\in\mathbb R^{D\times d}$ denotes the Jacobian of $f$ at $z$. Here we assume that the Jacobian has full rank everywhere.

Theorem

Now we come to Theorem 1 in the paper: Let $\epsilon\sim\mathcal N(0,\mathbb I_D)$ be a $D$-dimensional standard Gaussian random vector. Let $\mu,\sigma:\mathcal Z\rightarrow\mathcal X$ be at least twice differentiable. Define the stochastic generator function $$f:\mathcal Z\rightarrow \mathcal X;\quad f(z)=\mu(z)+\sigma(z)\odot\epsilon,$$ where $\odot$ denotes componentwise multiplication. Then, defining the Riemannian metric as before, the expected Riemannian metric of this stochastic generator is $$\mathbb E[M_z]=\nabla\mu(z)^T\nabla\mu(z)+\nabla\sigma(z)^T\nabla\sigma(z).$$
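As a sanity check on the statement, the identity can be verified by Monte Carlo simulation. The toy maps $\mu$ and $\sigma$ below (a tanh and an exponential of random linear maps) are arbitrary smooth choices for illustration, not taken from the paper:

```python
import numpy as np

# Monte Carlo check of E[J_f(z)^T J_f(z)] = J_mu^T J_mu + J_sigma^T J_sigma
# for f(z) = mu(z) + sigma(z) * eps, eps ~ N(0, I_D).
rng = np.random.default_rng(0)
d, D = 2, 5
A_mu = rng.standard_normal((D, d))
A_sig = rng.standard_normal((D, d))
z = rng.standard_normal(d)

def J_mu(z):
    # Jacobian of mu(z) = tanh(A_mu @ z)
    return (1.0 - np.tanh(A_mu @ z) ** 2)[:, None] * A_mu

def J_sigma(z):
    # Jacobian of sigma(z) = exp(0.1 * A_sig @ z) (kept positive, as a std-dev should be)
    s = np.exp(0.1 * (A_sig @ z))
    return s[:, None] * (0.1 * A_sig)

Jm, Js = J_mu(z), J_sigma(z)
expected = Jm.T @ Jm + Js.T @ Js

# For fixed eps, J_f(z) = J_mu(z) + diag(eps) @ J_sigma(z); average M_z over many eps.
n = 200_000
eps = rng.standard_normal((n, D))
Jf = Jm[None, :, :] + eps[:, :, None] * Js[None, :, :]  # broadcasted diag(eps) @ Js
M = np.einsum('nki,nkj->ij', Jf, Jf) / n                # mean of J_f^T J_f

print(np.max(np.abs(M - expected)))  # small Monte Carlo error
```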

Proof

First, define the diagonal matrix $S^{(i)}\in\mathbb R^{D\times D}$ with $S^{(i)}_{j,j}=\partial\sigma_j(z)/\partial z_i$. Then we simply calculate $\nabla f$ as $$\nabla f(z)=\nabla\mu(z)+\underbrace{[S^{(1)}\epsilon,\dots,S^{(d)}\epsilon]}_{=:B}.$$

Then clearly $$M_z=\nabla\mu(z)^T\nabla\mu(z)+\nabla\mu(z)^TB+B^T\nabla\mu(z)+B^TB.$$ Since $B$ is linear in $\epsilon$ and $\mathbb E[\epsilon]=0$, the cross terms vanish in expectation, leaving $$\mathbb E[M_z]=\nabla\mu(z)^T\nabla\mu(z)+\mathbb E[B^TB]=\nabla\mu(z)^T\nabla\mu(z)+\mathbb E\bigg[\big(\epsilon^T S^{(i)}S^{(j)}\epsilon\big)_{i,j=1,\dots,d}\bigg].$$ We are left to calculate the remaining expected value. Since $\epsilon$ is a standard Gaussian, we have $\mathbb E[\epsilon_i\epsilon_j]=0$ for $i\neq j$ and $\mathbb E[\epsilon_i^2]=1$, and hence $$\mathbb E\big[\epsilon^TS^{(i)}S^{(j)}\epsilon\big]=\sum_{k=1}^D\partial \sigma_k(z)/\partial z_i\cdot \partial\sigma_k(z)/\partial z_j,$$ which is exactly $(\nabla\sigma(z)^T\nabla\sigma(z))_{i,j}$.
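The quadratic-form expectation used in this step, $\mathbb E[\epsilon^TS^{(i)}S^{(j)}\epsilon]=\sum_k \partial\sigma_k/\partial z_i\cdot\partial\sigma_k/\partial z_j$, can itself be checked numerically; the array `dS` below is an arbitrary stand-in for the partial derivatives of $\sigma$ at a fixed $z$:

```python
import numpy as np

# Check E[eps^T S^(i) S^(j) eps] = sum_k (dsigma_k/dz_i)(dsigma_k/dz_j)
# for a standard Gaussian eps. dS[i] plays the role of the diagonal of S^(i).
rng = np.random.default_rng(1)
D, d = 4, 3
dS = rng.standard_normal((d, D))  # rows index i, columns index k

i, j = 0, 2
analytic = np.dot(dS[i], dS[j])  # sum_k dS[i,k] * dS[j,k]

n = 500_000
eps = rng.standard_normal((n, D))
# eps^T S^(i) S^(j) eps = sum_k dS[i,k] dS[j,k] eps_k^2, averaged over samples
mc = np.mean((eps ** 2 * (dS[i] * dS[j])).sum(axis=1))

print(analytic, mc)
```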

It follows that $$\mathbb E[M_z]=\nabla \mu(z)^T\nabla\mu(z)+\nabla\sigma(z)^T\nabla\sigma(z).$$

Accepted Answer

The theorem does hold under the assumption that $\mu, \sigma \in C^1$. We can prove it without any indices as follows. Write $$f(z) = \mu(z) + A\sigma(z),$$ where $$A = \text{diag}(\varepsilon).$$ Since $A$ does not depend on $z$, differentiating gives $$Df(z) = D\mu(z) + AD\sigma(z),$$ so, using that $A$ is symmetric, \begin{align} Df(z)^TDf(z) &= D\mu(z)^TD\mu(z) + D\mu(z)^TAD\sigma(z) + D\sigma(z)^TAD\mu(z) + D\sigma(z)^TA^2D\sigma(z). \\ \end{align} Since $E(A) = 0$ and $E(A^2) = I$, we get $$E(Df(z)^TDf(z)) = D\mu(z)^TD\mu(z) + D\sigma(z)^T D\sigma(z).$$
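The two facts this index-free proof relies on, $E(A)=0$ (which kills the cross terms) and $E(A^2)=I$, can likewise be checked by sampling; `Dm` and `Ds` below are arbitrary stand-ins for $D\mu(z)$ and $D\sigma(z)$:

```python
import numpy as np

# For A = diag(eps) with eps ~ N(0, I_D), verify that the cross term
# Dmu^T A Dsigma has mean zero and that Dsigma^T A^2 Dsigma has mean
# Dsigma^T Dsigma.
rng = np.random.default_rng(2)
D, d = 5, 2
Dm = rng.standard_normal((D, d))
Ds = rng.standard_normal((D, d))

n = 300_000
eps = rng.standard_normal((n, D))

# Sample mean of Dm^T diag(eps) Ds (should vanish).
cross = np.einsum('ki,nk,kj->ij', Dm, eps, Ds) / n
# Sample mean of Ds^T diag(eps)^2 Ds (should converge to Ds^T Ds).
quad = np.einsum('ki,nk,kj->ij', Ds, eps ** 2, Ds) / n

print(np.max(np.abs(cross)), np.max(np.abs(quad - Ds.T @ Ds)))
```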

My guess for why they included the assumption that $\mu, \sigma$ be $C^2$ is that they want the metric tensor to be $C^1$ or better: forming Christoffel symbols, writing down the geodesic equation, and other standard manipulations require taking derivatives of the metric tensor.