It is stated in my information theory textbook that the conditional mutual information of (discrete) random variables $X, Y$ given $Z$ is defined as $I(X; Y|Z) = H(X|Z) - H(X|Y, Z)$, which is equal to $\mathbb{E}_{(x,y,z) \sim P(x,y,z)} \log \frac{P(X, Y|Z)}{P(X|Z)P(Y|Z)}$. I'm not sure why this equality holds.
Evaluating the LHS I see
$$ H(X|Z) - H(X|Y, Z)$$ $$ = -\sum_{z \in Z} P(z) \sum_{x \in X} P(x|z) \log P(x|z) + \sum_{(y,z) \in (Y,Z)} P(y,z) \sum_{x \in X} P(x|y,z) \log P(x|y,z) $$
Then on the RHS I see
$$\mathbb{E}_{(x,y,z) \sim p(x,y,z)} \log \left(\frac{P(X, Y|Z)}{P(X|Z)P(Y|Z)}\right) = \sum_{(x,y,z) \in (X,Y,Z)} P(x,y,z) \log \left(\frac{P(x,y|z)}{P(x|z)P(y|z)}\right)$$
It's not immediately obvious to me why these are equal. Any hints appreciated.
Instead of notation such as $P(x,y|z)$, I use the safer notation $p_{X}(x)$, $p_{X,Y}(x,y)$, $p_{X|Y}(x|y)$, and so on; it makes it much easier to stay careful (and correct).
It follows from $p_{X|Y,Z}(x|y,z)=\frac{p_{X,Y|Z}(x,y|z)}{p_{Y|Z}(y|z)}$, the identity $\log\left( \frac a b \right)=\log(a)-\log(b)$, and the marginalization $\sum_{y} p_{X,Y,Z}(x,y,z)=p_{X,Z}(x,z)$. The derivation looks like this: \begin{align*} &\mathbb E\left[ \log\left( \frac{p_{X,Y|Z}(X,Y|Z)}{p_{X|Z}(X|Z)\,p_{Y|Z}(Y|Z)} \right) \right] \\=& \sum_{x,y,z} p_{X,Y,Z}(x,y,z)\log\left( \frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)\,p_{Y|Z}(y|z)} \right)\\ =& \sum_{x,y,z} p_{X,Y,Z}(x,y,z)\log\left( \frac{1}{p_{X|Z}(x|z)}\cdot\frac{p_{X,Y|Z}(x,y|z)}{p_{Y|Z}(y|z)} \right)\\ =&\sum_{x,y,z} p_{X,Y,Z}(x,y,z)\log\left( \frac{1}{p_{X|Z}(x|z)}\cdot p_{X|Y,Z}(x|y,z) \right)\\ =&\sum_{x,y,z} p_{X,Y,Z}(x,y,z)\log(p_{X|Y,Z}(x|y,z))\\&-\sum_{x,y,z} p_{X,Y,Z}(x,y,z)\log(p_{X|Z}(x|z))\\ =&\sum_{x,y,z} p_{X,Y,Z}(x,y,z)\log(p_{X|Y,Z}(x|y,z))\\&-\sum_{x,z} p_{X,Z}(x,z)\log(p_{X|Z}(x|z))\\ =&-H(X|Y,Z)+H(X|Z) \end{align*}
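If it helps to see the identity concretely, here is a quick numerical sanity check with NumPy (a sketch; the joint pmf over binary $X, Y, Z$ is an arbitrary distribution I made up) that computes both sides and confirms they agree:

```python
import numpy as np

# Arbitrary joint pmf p(x, y, z) over binary X, Y, Z (axes: x, y, z).
# Values are made up; they only need to be nonnegative and sum to 1.
p = np.array([[[0.10, 0.05], [0.08, 0.12]],
              [[0.07, 0.20], [0.18, 0.20]]])
assert np.isclose(p.sum(), 1.0)

# Marginals
p_z = p.sum(axis=(0, 1))   # p(z)
p_xz = p.sum(axis=1)       # p(x, z), axes (x, z)
p_yz = p.sum(axis=0)       # p(y, z), axes (y, z)

# H(X|Z) = -sum_{x,z} p(x,z) log p(x|z), with p(x|z) = p(x,z)/p(z)
H_X_given_Z = -np.sum(p_xz * np.log(p_xz / p_z))

# H(X|Y,Z) = -sum_{x,y,z} p(x,y,z) log p(x|y,z), with p(x|y,z) = p(x,y,z)/p(y,z)
H_X_given_YZ = -np.sum(p * np.log(p / p_yz))

lhs = H_X_given_Z - H_X_given_YZ

# RHS: E[ log( p(x,y|z) / (p(x|z) p(y|z)) ) ] under p(x,y,z)
p_xy_given_z = p / p_z            # broadcasts over the z axis
p_x_given_z = p_xz / p_z          # axes (x, z)
p_y_given_z = p_yz / p_z          # axes (y, z)
ratio = p_xy_given_z / (p_x_given_z[:, None, :] * p_y_given_z[None, :, :])
rhs = np.sum(p * np.log(ratio))

print(lhs, rhs)  # the two values agree up to floating-point error
```

The broadcasting mirrors the algebra above: dividing by `p_z` forms the conditional pmfs, and summing over `y` in `p_xz` is exactly the marginalization step $\sum_y p_{X,Y,Z}(x,y,z)=p_{X,Z}(x,z)$ used in the derivation.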