Estimating the Jensen-Shannon divergence per batch while training a GAN


While training a GAN on images, I would like to graph an estimate of the Jensen-Shannon divergence (JSD) per epoch or per iteration, but I am a bit unsure of the math.

During a single training iteration, my two networks (generator and discriminator) have two losses:

  • The discriminator loss, which is the binary cross entropy of the discriminator's output on a batch of both real and fake images, the fake images being produced by the generator. This loss is computed over a batch of $2N$ images, where $N$ are labelled real (label $1$) and $N$ are fake/generated (label $0$). I should add that this step is further complicated when this update is repeated more than once (for the discriminator only, as in Goodfellow et al.'s paper).

  • The generator loss, which maximizes the log probability of the discriminator misclassifying the generated examples as real (i.e. the generated images are given the true label), using the above $N$ fake/generated images.
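For concreteness, the two losses can be sketched numerically. This is a minimal illustration with made-up discriminator outputs (the values of `d_real` and `d_fake` are arbitrary assumptions, not actual network outputs), using the non-saturating form of the generator loss:

```python
import math

def bce(preds, labels):
    # Binary cross entropy, averaged over the batch; eps avoids log(0).
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(preds, labels)) / len(preds)

# Hypothetical discriminator outputs D(x) for N = 3 real and N = 3 fake images
d_real = [0.9, 0.8, 0.7]   # D(x) on real images, labels 1
d_fake = [0.2, 0.3, 0.1]   # D(G(z)) on generated images, labels 0

# Discriminator loss: BCE over the combined batch of 2N images
d_loss = bce(d_real + d_fake, [1, 1, 1] + [0, 0, 0])

# Generator loss (non-saturating form): BCE of D(G(z)) against label 1,
# i.e. the generator is rewarded when the discriminator calls fakes real
g_loss = bce(d_fake, [1, 1, 1])

print(d_loss, g_loss)  # d_loss ≈ 0.228, g_loss ≈ 1.705
```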

Now from what I understand, $p_d(x)$ is the probability of the input being a real image, while $p_g(x)$ is some density on the space of sample images, which is defined implicitly by $p_g(x) = \int \delta (x - G(z)) \mathcal{N}(z; 0, I)dz$ where $G$ is a generator function mapping from a latent space to the set of images of a certain resolution.

Then the JSD is defined as $$\mathrm{JSD}(p_g, p_d) = \frac{1}{2} \sum_{x \in X} p_g(x) \log \frac{p_g(x)}{(p_g(x)+p_d(x))/2} + \frac{1}{2} \sum_{x \in X} p_d(x) \log \frac{p_d(x)}{(p_g(x)+p_d(x))/2}$$

I think that from each batch $\{x_i\}$ one can obtain a Monte Carlo estimate of the JSD, asymmetric in the number of samples per term (the $2N$ generated images seen across the two losses versus the $N$ real images): $$\mathrm{JSD}(p_g, p_d) \approx \frac{1}{2} \left( \frac{1}{2N} \sum_{i=1}^{2N} \left[ \log p_g(x_i) - \log \frac{p_d(x_i) + p_g(x_i)}{2} \right] + \frac{1}{N} \sum_{i=1}^{N} \left[ \log p_d(x_i) - \log \frac{p_d(x_i) + p_g(x_i)}{2} \right] \right)$$ where the first sum runs over generated samples and the second over real samples.
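If one did have access to both densities, a Monte Carlo estimator of this kind could be checked numerically. Below is a minimal sketch on two 1-D Gaussians, where $p_g$ and $p_d$ are known in closed form (in a real GAN, $p_g$ is exactly what is unavailable); the sample counts and distribution parameters are illustrative assumptions:

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    # Closed-form 1-D Gaussian density
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def jsd_mc(samples_g, samples_d, p_g, p_d):
    # Monte Carlo estimate of JSD(p_g, p_d) from samples of each density.
    # The two sums may use different numbers of samples (e.g. 2N vs N).
    def term(samples, p, q):
        # Estimates KL(p || m) with m = (p + q) / 2, averaging over samples from p
        return sum(math.log(p(x)) - math.log((p(x) + q(x)) / 2)
                   for x in samples) / len(samples)
    return 0.5 * (term(samples_g, p_g, p_d) + term(samples_d, p_d, p_g))

random.seed(0)
p_g = lambda x: gauss_pdf(x, 0.0, 1.0)
p_d = lambda x: gauss_pdf(x, 1.0, 1.0)
xs_g = [random.gauss(0.0, 1.0) for _ in range(20000)]  # samples from p_g
xs_d = [random.gauss(1.0, 1.0) for _ in range(10000)]  # samples from p_d

est = jsd_mc(xs_g, xs_d, p_g, p_d)
print(est)  # a small positive value, necessarily below log 2 ≈ 0.693 nats
```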

Do I have this right? I'm a bit unsure about the asymmetry and about the Monte Carlo formulation. I'm also not clear on how to obtain $p_g(x)$: is it simply my second loss?


The Jensen-Shannon divergence is just the symmetric form of the Kullback-Leibler divergence. It is written in Goodfellow et al.'s 2014 GAN paper in the particular way you cite above in order to obtain the optimum value of the GAN cost function. The problem is that the validity of equation (5) (your JSD) requires the optimal discriminator (Proposition 1 = equation (2) of the 2014 GAN paper). This proposition has a hidden assumption of dim(z) $\geq$ dim(x) for its validity. When dim(z) < dim(x), which is the case in practice, the result is false. The reasons for this are given in section 2 of the paper https://www.researchgate.net/publication/356815736_Convergence_and_Optimality_Analysis_of_Low-Dimensional_Generative_Adversarial_Networks_using_Error_Function_Integrals
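To make the symmetrized-KL relationship concrete, here is a minimal check on two discrete distributions (the probability values are illustrative, not taken from the paper):

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) for discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    # Jensen-Shannon divergence: average of the two KLs against the mixture m
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.4, 0.4, 0.2]
q = [0.1, 0.3, 0.6]
print(jsd(p, q))   # symmetric: jsd(p, q) == jsd(q, p)
print(jsd(p, p))   # 0.0 for identical distributions
# Unlike KL, the JSD is always finite: bounded by log 2 ≈ 0.693 nats
```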

So in fact you should not assume that there is any connection to the Jensen-Shannon divergence in practical GAN training, where dim(x) > dim(z) in general. This has recently been confirmed on CIFAR-10 by C. Qin et al. (2020).

To your specific question about $p_g(x)$: there does not seem to be a simple way to obtain it, since although $z$ may have a simple multivariate Gaussian PDF, $x = G(z)$ is the output of a complicated neural network. You should also know that when dim(x) > dim(z), the generator output PDF is non-unique and degenerate, i.e. it contains delta functions. It is the presence of the latter that invalidates the variational-calculus argument in the proof of Proposition 1.