The definition of mutual information between two random variables $X \sim p_{X}$ and $Y \sim p_{Y}$ is given as follows: $$I(X; Y) := D_{\text{KL}}(p_{X, Y} \ \vert\vert \ p_{X}\otimes p_{Y})$$ I would like to prove the identity $$I(X; Y) = H(Y) - H(Y \mid X).$$
Proof attempt:
Using the definition of the Kullback-Leibler divergence, the mutual information can be rewritten as \begin{align} I(X; Y) &= \mathbb E_{(x, y) \sim p_{X, Y}}\left[ \log\frac{p_{X, Y}(x, y)}{p_{X}(x)p_{Y}(y)}\right] \\[4pt] &= \mathbb E_{(x, y)\sim p_{X, Y}}\left[ \log\frac{p_{X, Y}(x, y)}{p_{X}(x)}\right] - \mathbb E_{(x, y)\sim p_{X, Y}}\left[\log p_{Y}(y)\right] \\[10pt] &= -H(Y \mid X) - \mathbb E_{(x, y)\sim p_{X, Y}}\left[\log p_{Y}(y)\right] \end{align} However, it seems to me that the remaining term $-\mathbb E_{(x, y)\sim p_{X, Y}}\left[\log p_{Y}(y)\right]$ does not evaluate to $H(Y)$, since the expectation is taken over the joint PDF rather than over $p_{Y}$.
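As a sanity check, here is a small numeric sketch on a toy $2\times 2$ joint PMF (the specific numbers are arbitrary, chosen only for illustration), comparing the KL-based definition of $I(X; Y)$ with $H(Y) - H(Y \mid X)$, and comparing $\mathbb E_{(x,y)\sim p_{X,Y}}[\log p_Y(y)]$ with $-H(Y)$:

```python
import numpy as np

# Toy joint PMF p_{X,Y}: rows index x, columns index y (arbitrary valid example)
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p_X, shape (2, 1)
p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p_Y, shape (1, 2)

# I(X;Y) from the KL definition: E over the joint of log p_{X,Y} / (p_X p_Y)
I = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))

# H(Y) under the marginal, and H(Y|X) = -E over the joint of log p_{Y|X}
H_y = -np.sum(p_y * np.log(p_y))
H_y_given_x = -np.sum(p_xy * np.log(p_xy / p_x))

# Expectation of log p_Y(y) taken over the JOINT distribution
E_log_py_joint = np.sum(p_xy * np.log(p_y))

print(I, H_y - H_y_given_x)
print(-H_y, E_log_py_joint)
```

Numerically the two pairs of printed values coincide, which makes me suspect my objection about the joint expectation is wrong, but I do not see the step that justifies it.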
Is there a mistake in my reasoning? Thanks a lot!