In the derivation of the KL divergence, where does the expectation come from?


I'm trying to understand the following part of the derivation of the KL divergence in variational inference.

\begin{align} D_\text{KL}[Q(z\mid X) \parallel P(z\mid X)] &= \sum_z Q(z\mid X) \log \frac{Q(z\mid X)}{P(z\mid X)} \tag 1 \\[8pt] &= \operatorname E\left[\log\frac{Q(z\mid X)}{P(z\mid X)}\right] \tag 2 \\[8pt] &= \operatorname E[\log Q(z\mid X) - \log P(z\mid X)] \tag 3 \end{align}

I don't understand how to go from step $(1)$ to step $(2)$. How did the summation in $(1)$ turn into the expectation in $(2)$?

Best answer:

$Q(z \mid X)$ is a PMF.

In general, for a discrete random variable $Z$ with PMF $p(z)$, the expectation of a function $g(Z)$ is $$E[g(Z)] = \sum_z g(z)\, p(z).$$ In your case, $p(z) = Q(z \mid X)$ and $g(z) = \log \frac{Q(z \mid X)}{P(z \mid X)}$, so the sum in $(1)$ is exactly this expectation, taken with respect to $Q(z \mid X)$. That is why $(2)$ is sometimes written more explicitly as $E_{z \sim Q(z \mid X)}[\cdot]$.
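The identity is easy to check numerically. The sketch below uses two made-up discrete distributions over three values of $z$ (the probabilities are purely illustrative) and confirms that the explicit sum in $(1)$ and the expectation-under-$Q$ form in $(2)$ give the same number.

```python
import math

# Hypothetical discrete distributions over z in {0, 1, 2} (illustrative
# numbers only): Q plays the role of Q(z|X), P the role of P(z|X).
Q = [0.5, 0.3, 0.2]
P = [0.4, 0.4, 0.2]

# Step (1): the KL divergence written as an explicit sum over z.
kl_sum = sum(q * math.log(q / p) for q, p in zip(Q, P))

# Step (2): the same quantity as an expectation under Q of
# g(z) = log(Q(z|X) / P(z|X)), via E[g(Z)] = sum_z g(z) Q(z|X).
g = [math.log(q / p) for q, p in zip(Q, P)]
kl_expectation = sum(gz * q for gz, q in zip(g, Q))

print(kl_sum, kl_expectation)  # identical up to floating-point rounding
```

Nothing changes between the two computations except the bookkeeping: the weight $Q(z \mid X)$ that multiplies each term of the sum is precisely what makes it an expectation.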