Use of the Kullback-Leibler divergence in variational Bayes and deep learning


I am trying to grasp the asymmetry of the KL divergence from the point of view of variational approximation and deep learning. Deep learning seeks $q$ by minimizing $KL(p||q)$, whereas variational Bayes seeks $q$ by minimizing $KL(q||p)$. In deep learning, $p$ is the distribution underlying the training data and $q$ is the approximation of $p$ modeled by the neural network. In variational Bayes, $p$ is the true posterior and $q$ its approximation.

Now, $KL(p||q) = \int p \log \frac{p}{q}$ and $KL(q||p) = \int q \log \frac{q}{p}$, so we can say that deep learning seeks a $q$ that respects all the training data (if it used $KL(q||p)$ instead, data falling in regions where $q$ is zero would simply be ignored). But what does this say about variational Bayes? If we choose a "bad" form of $q$, it may ignore some regions of the true posterior, so $KL(q||p)$ would seem to be the "wrong" version of the KL divergence here. The only explanation that comes to my mind is that $KL(q||p)$ is used simply because it can be optimized (it only requires expectations under $q$, which we control), while $KL(p||q)$ cannot, since it requires expectations under the unknown $p$. Is there some other advantage of, or explanation for, $KL(q||p)$?
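To make the asymmetry concrete, here is a minimal numerical sketch (the distributions are made up purely for illustration): a bimodal target $p$ is compared against a broad $q$ that covers both modes and a narrow $q$ locked onto one mode. $KL(p||q)$ blows up when $q$ misses part of $p$'s support (mass-covering pressure), while $KL(q||p)$ stays finite for the mode-locked $q$ (zero-forcing / mode-seeking behavior):

```python
import numpy as np

# Hypothetical bimodal target p over 10 discrete states (sums to 1).
p = np.array([0.05, 0.4, 0.05, 0.0, 0.0, 0.0, 0.05, 0.4, 0.05, 0.0])

def kl(a, b):
    """KL(a||b) for discrete distributions; infinite if b=0 where a>0."""
    mask = a > 0
    if np.any(b[mask] == 0):
        return np.inf
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

# q1: broad approximation covering both modes of p.
q1 = np.ones(10) / 10
# q2: narrow approximation concentrated on the first mode only.
q2 = np.array([0.1, 0.7, 0.2, 0, 0, 0, 0, 0, 0, 0], dtype=float)

print("KL(p||q1) =", kl(p, q1))  # finite: q1 covers all of p's support
print("KL(p||q2) =", kl(p, q2))  # infinite: q2 assigns zero mass to p's second mode
print("KL(q2||p) =", kl(q2, p))  # finite: reverse KL tolerates the missed mode
```

So under $KL(q||p)$ a unimodal $q$ fitted to a bimodal posterior will happily collapse onto one mode, which is exactly the "ignoring some of the posterior" behavior the question describes.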