Confused by Kullback-Leibler on conditional probability distributions


I understand the Kullback-Leibler divergence well enough when it comes to a probability distribution over a single variable. However, I'm currently trying to teach myself variational methods and the use of the KL divergence in conditional probabilities is catching me out. The source I'm working from is here.

Specifically, the author represents the KL divergence as follows:

$$KL(Q_\phi(Z\mid X)\,\|\,P(Z\mid X)) = \sum_{z\in Z} q_\phi(z\mid x) \log\frac{q_\phi(z\mid x)}{p(z\mid x)}$$

Where the confusion arises is on the summation across $Z$. Given that $z \in Z$ and $x \in X$, I would have expected (by analogy with conditional entropy) a double sum here of the form:

$$KL(Q_\phi(Z\mid X)\,\|\,P(Z\mid X)) = \sum_{z\in Z} \sum_{x\in X} q_\phi(z\mid x) \log\frac{q_\phi(z\mid x)}{p(z\mid x)}$$

Otherwise, it seems to me that KL is only being calculated for one sample from $X$. Am I missing something basic here? And if my intuitions are off, any tips on getting them back on track would be useful––I'm teaching myself this stuff, so I don't have the benefit of formal instruction.


Accepted answer:

It depends on whether you are conditioning on a random variable or an event.

Given a random variable $x$,

$$ \operatorname{KL}[p(y \mid x) \,\|\, q(y \mid x)] \doteq \iint p(\bar{x},\bar{y}) \ln\frac{p(\bar{y} \mid \bar{x})}{q(\bar{y} \mid \bar{x})} \mathrm{d}\bar{x} \mathrm{d}\bar{y} \quad\text{or}\quad \sum_{\bar{x}}\sum_{\bar{y}} p(\bar{x},\bar{y}) \ln\frac{p(\bar{y} \mid \bar{x})}{q(\bar{y} \mid \bar{x})}. $$

Given an event $\bar{x}$,

$$ \operatorname{KL}[p(y \mid \bar{x}) \,\|\, q(y \mid \bar{x})] \doteq \int p(\bar{y}|\bar{x}) \ln\frac{p(\bar{y} \mid \bar{x})}{q(\bar{y} \mid \bar{x})} \mathrm{d}\bar{y} \quad\text{or}\quad \sum_{\bar{y}} p(\bar{y}|\bar{x}) \ln\frac{p(\bar{y} \mid \bar{x})}{q(\bar{y} \mid \bar{x})}. $$

Note how conditioning on an event is equivalent to changing the probability distribution over its variable to a point mass. This is what turns the joint into a conditional above,

$$ p'(x,y) \doteq p(y|x)\delta_{\bar{x}}(x)=p(y|\bar{x}). $$

To be more explicit: instead of the KL conditioned on a random variable, you can equivalently take an expectation, over events, of the KL conditioned on each event,

$$ \operatorname{KL}[p(y \mid x) \,\|\, q(y \mid x)] =\operatorname{E}_{\bar{x}\sim p(x)}\big[ \operatorname{KL}[p(y \mid \bar{x}) \,\|\, q(y \mid \bar{x})] \big]. $$
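This identity is easy to verify numerically on a small discrete example. The following is a minimal NumPy sketch (the distributions here are arbitrary, generated just for illustration): it computes the double sum weighted by the joint $p(x,y)$ and compares it with the expectation over $x$ of the per-event KL.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete example: joint p(x, y) over 3 x-values and 4 y-values.
p_xy = rng.random((3, 4))
p_xy /= p_xy.sum()
p_x = p_xy.sum(axis=1)                 # marginal p(x)
p_y_given_x = p_xy / p_x[:, None]      # conditional p(y | x), rows sum to 1

# An arbitrary second conditional q(y | x).
q_y_given_x = rng.random((3, 4))
q_y_given_x /= q_y_given_x.sum(axis=1, keepdims=True)

# KL conditioned on the random variable x: double sum over x and y,
# weighted by the joint p(x, y).
kl_joint = np.sum(p_xy * np.log(p_y_given_x / q_y_given_x))

# Same quantity as E_{x ~ p(x)} [ KL[p(y | x) || q(y | x)] ].
kl_per_event = np.sum(p_y_given_x * np.log(p_y_given_x / q_y_given_x), axis=1)
kl_expect = np.sum(p_x * kl_per_event)

assert np.isclose(kl_joint, kl_expect)
```

The two computations agree because weighting the per-event KL by $p(x)$ reconstructs exactly the joint weight $p(x,y) = p(x)\,p(y\mid x)$ in the double sum.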

Mixing up random variables and events is quite common, but it's usually easy to tell from context which is meant.

Another answer:

I don't quite see what confuses you. Think about how we compute, for example, a conditional expectation: $E(Z \mid X)=\sum_{z} z\, P(z \mid X)$. We sum only over $Z$, and the result is a function of the conditioning variable $X$. (Put another way, for each value of $X$, $P(Z \mid X=x)$ is a different probability distribution, and hence for each value of $X$ we get different values of the expectation, variance, etc., conditioned on $X=x$.) The same happens here: the conditional KL divergence is not a number, but a function of $X$.
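To make the "function of $X$" point concrete, here is a minimal NumPy sketch (the distributions are made up for illustration). Summing over $y$ alone leaves one KL value per value of $x$, i.e. an array indexed by $x$ rather than a single scalar:

```python
import numpy as np

# Hypothetical conditionals over y, one row per value of x.
p = np.array([[0.7, 0.3],
              [0.5, 0.5],
              [0.1, 0.9]])   # p(y | x)
q = np.array([[0.6, 0.4],
              [0.5, 0.5],
              [0.2, 0.8]])   # q(y | x)

# Sum over y only: one divergence per value of x.
kl_of_x = np.sum(p * np.log(p / q), axis=1)   # shape (3,)
```

Note that `kl_of_x[1]` is exactly zero, since the second rows of `p` and `q` coincide; the conditional KL vanishes for that particular value of $x$ while remaining positive for the others.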