Problem
I'm trying to compute the functional derivative of the mutual information $$I(X;Z) = \iint dx dz p(x,z) \log \frac{p(x,z)}{p(x)p(z)} = \iint dx dz p(z|x) p(x) \log \frac{p(z | x)}{p(z)} $$ w.r.t. one of its conditionals, e.g., $p(z | x)$, that is $ \frac{\partial I(X; Z)}{\partial p(z | x)} $.
Since $p(z | x)$ is a function, this requires the functional derivative rather than an ordinary partial derivative.
My problem is how to apply the chain rule, given that $$ p(z) = \int dx p(z | x) p(x) $$ also depends on $p(z | x)$.
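To make the objects concrete, here is a minimal Python sketch. It assumes that $x$ and $z$ take finitely many values, so the integrals become sums. The array `q[x, z]` stands for $p(z|x)$ and `p_x` for $p(x)$ (both names are illustrative). $p(z)$ is computed by marginalization, which is exactly where its dependence on $p(z|x)$ enters. The two forms of $I(X;Z)$ above should agree.

```python
import numpy as np

def marginal_z(q, p_x):
    """p(z) = sum_x p(z|x) p(x); q[x, z] holds p(z|x)."""
    return p_x @ q

def mutual_information(q, p_x):
    """Conditional form: sum_{x,z} p(z|x) p(x) log(p(z|x) / p(z)), in nats."""
    p_z = marginal_z(q, p_x)
    return float(np.sum(p_x[:, None] * q * np.log(q / p_z[None, :])))

def mutual_information_joint(q, p_x):
    """Joint form: sum_{x,z} p(x,z) log(p(x,z) / (p(x) p(z)))."""
    p_xz = p_x[:, None] * q          # joint p(x, z)
    p_z = p_xz.sum(axis=0)           # marginal p(z)
    return float(np.sum(p_xz * np.log(p_xz / np.outer(p_x, p_z))))

rng = np.random.default_rng(0)
p_x = rng.dirichlet(np.ones(3))            # p(x) over 3 values
q = rng.dirichlet(np.ones(4), size=3)      # row x is p(z|x) over 4 values
print(mutual_information(q, p_x), mutual_information_joint(q, p_x))
```

If every row of `q` is the same distribution, $Z$ is independent of $X$ and both functions return (numerically) zero.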
Partial solution
I understand that I need to apply the definition of the functional derivative and simplify the result until the derivative can be read off. I can do this for the term that doesn't depend on $p(z)$, i.e., $p(z | x) p(x) \log p(z | x)$.
Let's define the functional $$ F[p(z | x)] = \iint dx dz p(z | x) p(x) \log p(z | x); $$ then I can use the definition of the functional derivative w.r.t. the conditional, for an arbitrary test function $\phi(x,z)$: $$ \begin{aligned} \iint dx dz \frac{\partial F}{\partial p(z | x)} \phi(x, z) &= \left[ \frac{d}{d \varepsilon} F[p(z|x) + \varepsilon \phi(x,z)] \right]_{\varepsilon=0} \\ &= \left[ \frac{d}{d \varepsilon} \iint dx dz [p(z | x) + \varepsilon \phi(x,z)] p(x) \log [p(z | x) + \varepsilon \phi(x,z)] \right]_{\varepsilon=0} \\ &= \iint dx dz \left[ \phi(x,z) p(x) \log [p(z | x) + \varepsilon \phi(x,z)] + \frac{p(z|x)p(x)}{p(z|x)+\varepsilon \phi(x,z)} \phi(x,z) \right]_{\varepsilon=0} \\ &= \iint dx dz \left[ \phi(x,z) p(x) \log p(z | x) + p(x) \phi(x,z) \right] \\ &= \iint dx dz \left[ p(x) \left( \log p(z | x) + 1 \right) \right] \phi(x,z) \end{aligned} $$ Thus, $\frac{\partial F}{\partial p(z | x)} = p(x) \left( \log p(z | x) + 1 \right)$.
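A finite-difference check can confirm this result in a discretized setting (assumption: finite alphabets for $x$ and $z$, sums in place of integrals). Perturbing a single entry `q[x, z]` plays the role of a test function $\phi$ concentrated at one point, so the functional derivative reduces to an ordinary partial derivative. The helper names are hypothetical, and $q$ is perturbed without renormalizing, matching the arbitrary $\phi(x,z)$ in the derivation.

```python
import numpy as np

def F(q, p_x):
    """F[q] = sum_{x,z} q(x,z) p(x) log q(x,z), with q[x, z] standing for p(z|x)."""
    return float(np.sum(q * p_x[:, None] * np.log(q)))

def dF_analytic(q, p_x):
    """Derived functional derivative: p(x) (log p(z|x) + 1)."""
    return p_x[:, None] * (np.log(q) + 1.0)

def numerical_gradient(func, q, h=1e-6):
    """Central differences, perturbing one entry q[x, z] at a time (the discrete phi)."""
    grad = np.zeros_like(q)
    for idx in np.ndindex(q.shape):
        bump = np.zeros_like(q)
        bump[idx] = h
        grad[idx] = (func(q + bump) - func(q - bump)) / (2 * h)
    return grad

rng = np.random.default_rng(1)
p_x = rng.dirichlet(5 * np.ones(3))          # p(x), entries kept away from 0
q = rng.dirichlet(5 * np.ones(4), size=3)    # row x is p(z|x)
num = numerical_gradient(lambda qq: F(qq, p_x), q)
print(np.max(np.abs(num - dF_analytic(q, p_x))))
```

The printed maximum deviation should be at the level of finite-difference error, far below the size of the gradient entries.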
Failing chain rule?
However, when I try to do the same with the second part that depends on $p(z)$, I don't understand how to correctly apply the chain rule.
Let's define the functional $$ G[p(z | x)] = \iint dx dz p(z | x) p(x) \log p(z), $$ then, since $p(z)$ is itself a functional of $p(z | x)$ (is it?), I also need to compute its derivative.
The chain rule given on Wikipedia for two functionals $J$ and $K$ (I use these letters since $F$ and $G$ are taken above) is $$ \frac{\partial J[ K[\rho] ]}{\partial \rho(y)} = \int dx \left. \frac{\partial J[K]}{\partial K(x)} \right|_{K=K[\rho]} \cdot \frac{\partial K[\rho](x)}{\partial \rho (y)}. $$ My understanding is that $K(x, [\rho(y)])$ is a function that depends on $x$ and on the whole function $\rho(y)$. So the derivative requires me to take the partial derivative with respect to every value $K(x)$ and integrate over all $x$. Is this correct?
In my case, the functional that requires the chain rule, $p(z)$, depends on $z$, so I should integrate over $z$, right? But what about the other factor, $p(z|x)p(x)$? Can I use the ordinary product rule even though one factor is a functional and the other isn't? Wikipedia states the product rule for two functionals, but not for this mixed case.
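One possible way to see the chain rule at work, sketched under the same discrete assumption: because $\int dx\, p(z|x)p(x) = p(z)$, the functional $G$ can be rewritten as $G = \int dz\, p(z) \log p(z)$, i.e., $G = J[K[p(z|x)]]$ with $J[K] = \int dz\, K(z)\log K(z)$ and $K[p(z|x)](z) = \int dx\, p(z|x)p(x)$. Then $\frac{\partial J}{\partial K(z')} = \log K(z') + 1$ and $\frac{\partial K(z')}{\partial p(z|x)} = p(x)\,\delta(z - z')$, so the chain-rule integral over $z'$ collapses to $p(x)(\log p(z) + 1)$. The code below evaluates that sum explicitly, with a Kronecker delta in place of the Dirac delta, and compares it with finite differences of $G$ in its original form. `J`, `K`, and the other function names are illustrative.

```python
import numpy as np

def K(q, p_x):
    """Inner functional K[q](z) = sum_x q(x,z) p(x), i.e. p(z)."""
    return p_x @ q

def J(k):
    """Outer functional J[K] = sum_z K(z) log K(z)."""
    return float(np.sum(k * np.log(k)))

def G(q, p_x):
    """G[q] = sum_{x,z} q(x,z) p(x) log K[q](z), written as in the question."""
    return float(np.sum(q * p_x[:, None] * np.log(K(q, p_x))[None, :]))

def dG_chain_rule(q, p_x):
    """Chain rule: sum over z' of dJ/dK(z') * dK(z')/dq(x,z).

    dJ/dK(z') = log K(z') + 1 and dK(z')/dq(x,z) = p(x) * [z == z'].
    """
    nz = q.shape[1]
    dJ_dK = np.log(K(q, p_x)) + 1.0                        # indexed by z'
    dK_dq = p_x[:, None, None] * np.eye(nz)[None, :, :]    # [x, z, z'] = p(x) [z == z']
    return np.einsum("xzw,w->xz", dK_dq, dJ_dK)            # sum over z' (here w)

def numerical_gradient(func, q, h=1e-6):
    """Central differences, one entry q[x, z] at a time."""
    grad = np.zeros_like(q)
    for idx in np.ndindex(q.shape):
        bump = np.zeros_like(q)
        bump[idx] = h
        grad[idx] = (func(q + bump) - func(q - bump)) / (2 * h)
    return grad

rng = np.random.default_rng(2)
p_x = rng.dirichlet(5 * np.ones(3))
q = rng.dirichlet(5 * np.ones(4), size=3)
print(np.isclose(G(q, p_x), J(K(q, p_x))))   # G equals J composed with K
num = numerical_gradient(lambda qq: G(qq, p_x), q)
print(np.max(np.abs(num - dG_chain_rule(q, p_x))))
```

In this rewriting the factor $p(z|x)p(x)$ is absorbed into $p(z)$ by the $x$-integral, so no separate product rule between a functional and a non-functional factor is needed.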
Naive approach
My attempt is to naively expand $p(z)$ within the definition, as follows: $$ \begin{aligned} \iint dx dz \frac{\partial G}{\partial p(z | x)} \phi(x, z) &= \left[ \frac{d}{d \varepsilon} G[p(z|x) + \varepsilon \phi(x,z)] \right]_{\varepsilon=0} \\ &= \left[ \frac{d}{d \varepsilon} \iint dx dz [p(z | x) + \varepsilon \phi(x,z)] p(x) \log \left[ \int dx' [p(z | x') + \varepsilon \phi(x,z)] p(x') \right] \right]_{\varepsilon=0} \\ &= \iint dx dz \left[ \phi(x,z) p(x) \log \left[ \int dx' [p(z | x') + \varepsilon \phi(x,z)] p(x') \right] + \frac{[p(z | x) + \varepsilon \phi(x,z)] p(x)}{\int dx' [p(z | x') + \varepsilon \phi(x,z)] p(x')} \int dx' \phi(x,z) p(x') \right]_{\varepsilon=0} \\ &= \iint dx dz \left[ \phi(x,z) p(x) \log \left[ \int dx' p(z | x') p(x') \right] + \frac{p(z | x) p(x)}{\int dx' p(z | x') p(x')} \phi(x,z) \int dx' p(x') \right] \\ &= \iint dx dz \left[ p(x) \left( \log p(z) +\frac{p(z | x)}{p(z)} \right) \right] \phi(x,z) \end{aligned} $$ Thus, $$ \frac{\partial G}{\partial p(z | x)} = p(x) \left( \log p(z) +\frac{p(z | x)}{p(z)} \right). $$
I'm not sure this approach is correct, since I'm not using the chain rule and I am using the same variation $\phi(x,z)$ in both places.
Questions
- Is this functional derivative correct?
- How is the chain rule applied in this case?
Complete solution from @lidiia
$$ \begin{aligned} \iint dx dz \frac{\partial G}{\partial p(z | x)} \phi(x, z) &= \left[ \frac{d}{d \varepsilon} G[p(z|x) + \varepsilon \phi(x,z)] \right]_{\varepsilon=0} \\ &= \left[ \frac{d}{d \varepsilon} \iint dx dz [p(z | x) + \varepsilon \phi(x,z)] p(x) \log \left[ \int dx' [p(z | x') + \varepsilon \phi(x',z)] p(x') \right] \right]_{\varepsilon=0} \\ &= \iint dx dz \left[ \phi(x,z) p(x) \log \left[ \int dx' [p(z | x') + \varepsilon \phi(x',z)] p(x') \right] + \frac{[p(z | x) + \varepsilon \phi(x,z)] p(x)}{\int dx' [p(z | x') + \varepsilon \phi(x',z)] p(x')} \int dx' \phi(x',z) p(x') \right]_{\varepsilon=0} \\ &= \iint dx dz \phi(x,z) p(x) \log \left[ \int dx' p(z | x') p(x') \right] + \iint dx dz \frac{p(z | x) p(x)}{\int dx' p(z | x') p(x')} \int dx' \phi(x',z) p(x') \\ &= \iint dx dz \phi(x,z) p(x) \log p(z) + \iint dx'dz \phi(x',z) p(x') \int dx \frac{p(x, z)}{p(z)} \\ &= \iint dx dz \phi(x,z) p(x) \log p(z) + \iint dx dz \phi(x,z) p(x) \\ &= \iint dx dz \left[ p(x) \left( \log p(z) + 1 \right) \right] \phi(x,z) \end{aligned} $$ Thus, $$ \frac{\partial G}{\partial p(z | x)} = p(x) \left( \log p(z) + 1 \right). $$
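Under the same discrete assumption, a finite-difference check can distinguish this result from the one in the naive approach. The sketch below (function names hypothetical) compares both closed forms against a numerical gradient of $G$. The naive formula $p(x)(\log p(z) + p(z|x)/p(z))$ coincides with $p(x)(\log p(z)+1)$ only when $p(z|x) = p(z)$ for every $x$, i.e., when $X$ and $Z$ are independent.

```python
import numpy as np

def G(q, p_x):
    """G[q] = sum_{x,z} q(x,z) p(x) log p(z), with p(z) = sum_x' q(x',z) p(x')."""
    p_z = p_x @ q
    return float(np.sum(q * p_x[:, None] * np.log(p_z)[None, :]))

def dG_solution(q, p_x):
    """Complete solution: p(x) (log p(z) + 1)."""
    p_z = p_x @ q
    return p_x[:, None] * (np.log(p_z)[None, :] + 1.0)

def dG_naive(q, p_x):
    """Naive approach: p(x) (log p(z) + p(z|x) / p(z))."""
    p_z = p_x @ q
    return p_x[:, None] * (np.log(p_z)[None, :] + q / p_z[None, :])

def numerical_gradient(func, q, h=1e-6):
    """Central differences, one entry q[x, z] at a time."""
    grad = np.zeros_like(q)
    for idx in np.ndindex(q.shape):
        bump = np.zeros_like(q)
        bump[idx] = h
        grad[idx] = (func(q + bump) - func(q - bump)) / (2 * h)
    return grad

rng = np.random.default_rng(3)
p_x = rng.dirichlet(5 * np.ones(3))
q = rng.dirichlet(5 * np.ones(4), size=3)
num = numerical_gradient(lambda qq: G(qq, p_x), q)
err_solution = np.max(np.abs(num - dG_solution(q, p_x)))
err_naive = np.max(np.abs(num - dG_naive(q, p_x)))
print(err_solution, err_naive)
```

The first error should be at finite-difference level, while the second stays clearly nonzero for a generic $p(z|x)$.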
I think you are almost right in your "Naive approach" section. The correct way, I think, is to expand $p(z)$ and vary $p(z|x)$. However, since you are varying a function, the variation of $p(z|x')$ should be $p(z|x') + \varepsilon \phi(x', z)$, not $p(z|x') + \varepsilon \phi(x, z)$. Then $$ \left.\frac{d}{d\varepsilon}G[p(z|x) + \varepsilon \phi(x, z)]\right|_{\varepsilon = 0} = \left.\frac{d}{d\varepsilon}\iint dx dz [p(z|x) + \varepsilon \phi(x, z)]p(x)\log\int dx'[p(z|x') + \varepsilon \phi(x', z)]p(x')\right|_{\varepsilon = 0}$$ Using the fact that $\int dx' p(z|x')p(x') = p(z)$ and relabeling the integration variables, you should get the functional derivative $$ \frac{\delta G[p(z|x)]}{\delta p(z|x)} = p(x)(\log p(z) +1),$$ and the full functional derivative of the mutual information $I(X;Z)$ is $$ \frac{\delta I(X;Z)}{\delta p(z|x)} = p(x)\log \frac{p(z|x)}{p(z)}.$$
On page 4 of this paper there is a similar problem (equations 6, 7, and 10).
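Since $I(X;Z) = F - G$ with $F$ and $G$ as defined in the question, the two pieces combine as $p(x)(\log p(z|x) + 1) - p(x)(\log p(z) + 1) = p(x)\log\frac{p(z|x)}{p(z)}$. A minimal numerical check under the discrete assumption (names illustrative) is sketched below. It perturbs $p(z|x)$ without a normalization constraint; if $p(z|x)$ must remain normalized, that constraint would have to be handled separately, e.g., with a Lagrange multiplier.

```python
import numpy as np

def mutual_information(q, p_x):
    """I[q] = sum_{x,z} q(x,z) p(x) log(q(x,z) / p(z)), with p(z) = sum_x q(x,z) p(x)."""
    p_z = p_x @ q
    return float(np.sum(p_x[:, None] * q * np.log(q / p_z[None, :])))

def dI_analytic(q, p_x):
    """Closed form: p(x) log(p(z|x) / p(z))."""
    p_z = p_x @ q
    return p_x[:, None] * np.log(q / p_z[None, :])

def numerical_gradient(func, q, h=1e-6):
    """Central differences, one entry q[x, z] at a time (no renormalization)."""
    grad = np.zeros_like(q)
    for idx in np.ndindex(q.shape):
        bump = np.zeros_like(q)
        bump[idx] = h
        grad[idx] = (func(q + bump) - func(q - bump)) / (2 * h)
    return grad

rng = np.random.default_rng(4)
p_x = rng.dirichlet(5 * np.ones(3))
q = rng.dirichlet(5 * np.ones(4), size=3)
num = numerical_gradient(lambda qq: mutual_information(qq, p_x), q)
print(np.max(np.abs(num - dI_analytic(q, p_x))))
```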