Writing $p(\cdot)$ for a discrete probability mass function and $\mu(\cdot)$ for a continuous density function, why does the following identity hold: $$-\sum_x p(x) \log p(x) + \sum_x \int \mu(x,y) \log \mu(x,y) dy = \sum_x \int \mu(x,y) \log \mu(y \mid x) dy$$
[Added for clarity]: $X$ is a discrete variable and $Y$ is a continuous variable, jointly distributed with mixed density $\mu(x,y)$. In particular, the marginals are $p(x)=\int\mu(x,y)dy$ and $\mu(y)=\sum_x\mu(x,y)$.
I'm working through an information theory paper and am stuck at this step. Thank you for your help, and please let me know if there is anything I can do to clarify.
By definition of the conditional density, $\mu(y \mid x) = \mu(x,y)/p(x)$, so $\log \mu(y \mid x) = \log \mu(x,y) - \log p(x)$. Then
\begin{align*} \sum_x \int \mu(x,y) \log \mu(y \mid x) dy &= \sum_x \int \mu(x,y) ( \log \mu(x, y) -\log p(x) ) dy \\ &= \sum_x \int \mu(x,y) \log \mu(x, y) dy - \sum_x \int \mu(x,y) \log p(x) dy \\ &= \sum_x \int \mu(x,y) \log \mu(x, y) dy - \sum_x \log p(x) \int \mu(x,y) dy \\ &= \sum_x \int \mu(x,y) \log \mu(x, y) dy - \sum_x p(x)\log p(x), \\ \end{align*}
where the last step uses the marginal identity $\int \mu(x,y) dy = p(x)$. The final line is exactly the left-hand side of the identity you stated.
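If it helps to see the identity hold concretely, here is a minimal numerical sanity check. The model is an arbitrary toy choice of mine (not from the paper): $X \in \{0,1\}$ with mass $p(x)$, and $Y \mid X = x$ a Gaussian, so $\mu(x,y) = p(x)\,\mathcal{N}(y;\ \text{mean}_x, \text{sd}_x)$. The integrals are approximated by a Riemann sum on a fine grid.

```python
import numpy as np

# Hypothetical toy model: X in {0, 1}, Y | X = x ~ Normal(means[x], sds[x]).
# The mixed joint density is mu(x, y) = p(x) * N(y; means[x], sds[x]).
p = np.array([0.3, 0.7])
means = np.array([-1.0, 2.0])
sds = np.array([0.5, 1.5])

# Fine grid wide enough that the Gaussian tails are negligible.
y = np.linspace(-12.0, 12.0, 200001)
dy = y[1] - y[0]

def normal_pdf(y, m, s):
    return np.exp(-0.5 * ((y - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# mu[i] holds mu(x = i, y) evaluated on the grid.
mu = np.array([p[i] * normal_pdf(y, means[i], sds[i]) for i in range(2)])

# Left-hand side: -sum_x p(x) log p(x) + sum_x int mu(x,y) log mu(x,y) dy
H_X = -np.sum(p * np.log(p))
joint_term = sum(np.sum(mu[i] * np.log(mu[i])) * dy for i in range(2))
lhs = H_X + joint_term

# Right-hand side: sum_x int mu(x,y) log mu(y|x) dy, with mu(y|x) = mu(x,y)/p(x)
rhs = sum(np.sum(mu[i] * np.log(mu[i] / p[i])) * dy for i in range(2))

print(f"lhs = {lhs:.6f}, rhs = {rhs:.6f}, difference = {abs(lhs - rhs):.2e}")
```

The two sides agree to numerical precision, and you can also check that summing each `mu[i]` over the grid recovers `p[i]`, which is the marginal identity the last step of the derivation relies on.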