How to apply the chain rule when computing the functional derivative of the mutual information w.r.t. the conditional distribution


Problem

I'm trying to compute the functional derivative of the mutual information $$I(X;Z) = \iint dx dz p(x,z) \log \frac{p(x,z)}{p(x)p(z)} = \iint dx dz p(z|x) p(x) \log \frac{p(z | x)}{p(z)} $$ w.r.t. one of its conditionals, e.g., $p(z | x)$, that is $ \frac{\partial I(X; Z)}{\partial p(z | x)} $.
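For quick numerical sanity checks, here is a minimal Python sketch of the discrete analogue of $I(X;Z)$. It assumes finite alphabets for $X$ and $Z$ (the question is about densities, so the integrals become sums). The function name `mutual_information` and the layout `p_z_given_x[i][j]` $= p(z_j|x_i)$ are illustrative choices, not part of the question.

```python
import math


def mutual_information(p_x, p_z_given_x):
    """Discrete I(X;Z) = sum_x sum_z p(z|x) p(x) log(p(z|x) / p(z)).

    p_x[i] is p(x_i) and p_z_given_x[i][j] is p(z_j | x_i).
    """
    n_x, n_z = len(p_x), len(p_z_given_x[0])
    # Marginal p(z_j) = sum_i p(z_j | x_i) p(x_i)
    p_z = [sum(p_z_given_x[i][j] * p_x[i] for i in range(n_x)) for j in range(n_z)]
    total = 0.0
    for i in range(n_x):
        for j in range(n_z):
            q = p_z_given_x[i][j]
            if q > 0.0:  # convention: 0 * log 0 = 0
                total += q * p_x[i] * math.log(q / p_z[j])
    return total


# Output ignores the input, so I = 0
print(mutual_information([0.5, 0.5], [[0.3, 0.7], [0.3, 0.7]]))
# Noiseless binary channel with uniform input, so I = log 2
print(mutual_information([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
```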

I understand that this requires me to compute the functional derivative.

My problem is how to actually apply the chain rule in the case where $$ p(z) = \int dx p(z | x) p(x) $$ since $p(z)$ depends on $p(z | x)$ as well.
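The dependence of $p(z)$ on $p(z|x)$ can be made concrete in a discrete setting: $p(z_k) = \sum_i p(z_k|x_i)p(x_i)$ is linear in the conditional, so $\partial p(z_k)/\partial p(z_j|x_i) = p(x_i)\,\delta_{jk}$, the discrete counterpart of $\delta p(z')/\delta p(z|x) = p(x)\,\delta(z'-z)$. The sketch below assumes finite alphabets and uses hypothetical helper names; it perturbs one entry of the conditional while holding all others fixed.

```python
def marginal(p_x, q):
    """p(z_j) = sum_i q[i][j] * p(x_i), where q[i][j] = p(z_j | x_i)."""
    return [sum(q[i][j] * p_x[i] for i in range(len(p_x))) for j in range(len(q[0]))]


def marginal_derivative_fd(p_x, q, i, j, eps=1e-6):
    """Forward difference of every p(z_k) w.r.t. the single entry q[i][j]."""
    q_plus = [row[:] for row in q]  # copy so the original table is untouched
    q_plus[i][j] += eps
    base, moved = marginal(p_x, q), marginal(p_x, q_plus)
    return [(b - a) / eps for a, b in zip(base, moved)]


p_x = [0.2, 0.8]
q = [[0.1, 0.9], [0.6, 0.4]]
# Only p(z_0) responds to a change in p(z_0 | x_1), with slope p(x_1) = 0.8
print(marginal_derivative_fd(p_x, q, i=1, j=0))
```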

Partial solution

I understand that I need to apply the definition of the functional derivative and reduce it to obtain the term that corresponds to the derivative. I can follow that for the term that does not depend on $p(z)$, i.e., $p(z | x) p(x) \log p(z | x)$.

Let's define the functional $$ F[p(z | x)] = \iint dx dz p(z | x) p(x) \log p(z | x), $$ then I can use the definition of the functional derivative w.r.t. the conditional: $$ \begin{aligned} \iint dx dz \frac{\partial F}{\partial p(z | x)} \phi(x, z) &= \left[ \frac{d}{d \varepsilon} F[p(z|x) + \varepsilon \phi(x,z)] \right]_{\varepsilon=0} \\ &= \left[ \frac{d}{d \varepsilon} \iint dx dz [p(z | x) + \varepsilon \phi(x,z)] p(x) \log [p(z | x) + \varepsilon \phi(x,z)] \right]_{\varepsilon=0} \\ &= \iint dx dz \left[ \phi(x,z) p(x) \log [p(z | x) + \varepsilon \phi(x,z)] + \frac{[p(z|x) + \varepsilon \phi(x,z)] p(x)}{p(z|x)+\varepsilon \phi(x,z)} \phi(x,z) \right]_{\varepsilon=0} \\ &= \iint dx dz \left[ \phi(x,z) p(x) \log p(z | x) + p(x) \phi(x,z) \right] \\ &= \iint dx dz \left[ p(x) \left( \log p(z | x) + 1 \right) \right] \phi(x,z) \end{aligned} $$ Thus, $\frac{\partial F}{\partial p(z | x)} = p(x) \left( \log p(z | x) + 1 \right)$.
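One way to double-check the result for $F$ is to compare it with a finite-difference derivative in a discrete version of the problem. The sketch below assumes finite alphabets, treats every entry $p(z_j|x_i)$ as an independent variable (as the unconstrained variation $\varepsilon\phi(x,z)$ above does, so normalization is not enforced), and uses hypothetical helper names.

```python
import math


def F(p_x, q):
    """Discrete F = sum_i sum_j q[i][j] p(x_i) log q[i][j], with q[i][j] = p(z_j | x_i)."""
    return sum(q[i][j] * p_x[i] * math.log(q[i][j])
               for i in range(len(p_x)) for j in range(len(q[0])))


def grad_F_analytic(p_x, q, i, j):
    """The derived result p(x) (log p(z|x) + 1) evaluated at (x_i, z_j)."""
    return p_x[i] * (math.log(q[i][j]) + 1.0)


def grad_fd(func, p_x, q, i, j, eps=1e-6):
    """Central difference of func w.r.t. q[i][j]; every other entry is held fixed."""
    q_plus = [row[:] for row in q]
    q_minus = [row[:] for row in q]
    q_plus[i][j] += eps
    q_minus[i][j] -= eps
    return (func(p_x, q_plus) - func(p_x, q_minus)) / (2 * eps)


p_x = [0.3, 0.7]
q = [[0.2, 0.8], [0.5, 0.5]]
for i in range(2):
    for j in range(2):
        print(i, j, grad_fd(F, p_x, q, i, j), grad_F_analytic(p_x, q, i, j))
```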

Failing chain rule?

However, when I try to do the same with the second part that depends on $p(z)$, I don't understand how to correctly apply the chain rule.

Let's define the functional $$ G[p(z | x)] = \iint dx dz p(z | x) p(x) \log p(z). $$ Then, since $p(z)$ is itself a functional of $p(z | x)$ (is it?), I also need to compute its derivative.

The Wikipedia definition of the chain rule, for two functionals $J$ and $K$ (I use these letters because $F$ and $G$ are taken above), is $$ \frac{\partial J[ K[\rho] ]}{\partial \rho(y)} = \int dx \left( \frac{\partial J[K]}{\partial K(x)} \right)_{K=K[\rho]} \frac{\partial K[\rho](x)}{\partial \rho (y)}. $$ My understanding is that $K(x, [\rho(y)])$ is a function that depends on both $x$ and the function $\rho(y)$. So the derivative requires me to take the partial derivatives with respect to the values of $K$ at every point $x$ and integrate over $x$. Is this correct?

In my case, the functional that requires the chain rule, $p(z)$, depends on $z$, so I should integrate over $z$, right? And what about the other factor, $p(z|x)p(x)$: can I apply the ordinary product rule even though one factor is a functional and the other is not? The properties listed on Wikipedia give the product rule for two functionals, but not for such a mix.
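A worked sketch of how the two rules could combine for $G$, under the same unconstrained variation used above. Write $K[p](z) = \int dx'\, p(z|x')p(x')$ for the marginal (the names $K$, $x''$, $z''$ are introduced here for illustration only). The variation of a product obeys the ordinary product rule whether or not a factor is itself a functional; the first term below comes from varying the explicit factor $p(z|x)p(x)$, and the chain rule is needed only for $\log K[p](z)$, with the integral running over the argument $z''$ of $K$. The inner derivative is $\delta K(z'')/\delta p(z|x) = p(x)\,\delta(z''-z)$, and the outer one treats $K$ as an independent function:

$$ \begin{aligned} \frac{\delta G}{\delta p(z|x)} &= p(x)\log K[p](z) + \int dz'' \left( \frac{\delta G}{\delta K(z'')} \right)_{K=K[p]} \frac{\delta K[p](z'')}{\delta p(z|x)} \\ &= p(x)\log p(z) + \int dz'' \left[ \int dx''\, \frac{p(z''|x'')p(x'')}{p(z'')} \right] p(x)\,\delta(z''-z) \\ &= p(x)\log p(z) + p(x)\,\frac{\int dx''\, p(z|x'')p(x'')}{p(z)} \\ &= p(x)\left(\log p(z) + 1\right). \end{aligned} $$

In this reading the chain-rule term reduces to $p(x)$, because the integral over $x''$ reproduces $p(z)$ in the numerator.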

Naive approach

My attempt is to naively expand $p(z)$ within the definition, as follows: $$ \begin{aligned} \iint dx dz \frac{\partial G}{\partial p(z | x)} \phi(x, z) &= \left[ \frac{d}{d \varepsilon} G[p(z|x) + \varepsilon \phi(x,z)] \right]_{\varepsilon=0} \\ &= \left[ \frac{d}{d \varepsilon} \iint dx dz [p(z | x) + \varepsilon \phi(x,z)] p(x) \log \left[ \int dx' [p(z | x') + \varepsilon \phi(x,z)] p(x') \right] \right]_{\varepsilon=0} \\ &= \iint dx dz \left[ \phi(x,z) p(x) \log \left[ \int dx' [p(z | x') + \varepsilon \phi(x,z)] p(x') \right] + \frac{[p(z | x) + \varepsilon \phi(x,z)] p(x)}{\int dx' [p(z | x') + \varepsilon \phi(x,z)] p(x')} \int dx' \phi(x,z) p(x') \right]_{\varepsilon=0} \\ &= \iint dx dz \left[ \phi(x,z) p(x) \log \left[ \int dx' p(z | x') p(x') \right] + \frac{p(z | x) p(x)}{\int dx' p(z | x') p(x')} \phi(x,z) \int dx' p(x') \right] \\ &= \iint dx dz \left[ p(x) \left( \log p(z) +\frac{p(z | x)}{p(z)} \right) \right] \phi(x,z) \end{aligned} $$ Thus, $$ \frac{\partial G}{\partial p(z | x)} = p(x) \left( \log p(z) +\frac{p(z | x)}{p(z)} \right). $$

I'm not sure this is the correct approach, since I'm not using the chain rule and I am using the same variation $\varepsilon \phi(x,z)$ in both places where $p(z|x)$ appears.

Questions

  • Is this functional derivative correct?
  • How is the chain rule applied in this case?

Complete solution from @lidiia

$$ \begin{aligned} \iint dx dz \frac{\partial G}{\partial p(z | x)} \phi(x, z) &= \left[ \frac{d}{d \varepsilon} G[p(z|x) + \varepsilon \phi(x,z)] \right]_{\varepsilon=0} \\ &= \left[ \frac{d}{d \varepsilon} \iint dx dz [p(z | x) + \varepsilon \phi(x,z)] p(x) \log \left[ \int dx' [p(z | x') + \varepsilon \phi(x',z)] p(x') \right] \right]_{\varepsilon=0} \\ &= \iint dx dz \left[ \phi(x,z) p(x) \log \left[ \int dx' [p(z | x') + \varepsilon \phi(x',z)] p(x') \right] + \frac{[p(z | x) + \varepsilon \phi(x,z)] p(x)}{\int dx' [p(z | x') + \varepsilon \phi(x',z)] p(x')} \int dx' \phi(x',z) p(x') \right]_{\varepsilon=0} \\ &= \iint dx dz \phi(x,z) p(x) \log \left[ \int dx' p(z | x') p(x') \right] + \iint dx dz \frac{p(z | x) p(x)}{\int dx' p(z | x') p(x')} \int dx' \phi(x',z) p(x') \\ &= \iint dx dz \phi(x,z) p(x) \log p(z) + \iint dx'dz \phi(x',z) p(x') \int dx \frac{p(x, z)}{p(z)} \\ &= \iint dx dz \phi(x,z) p(x) \log p(z) + \iint dx dz \phi(x,z) p(x) \\ &= \iint dx dz \left[ p(x) \left( \log p(z) + 1 \right) \right] \phi(x,z) \end{aligned} $$ Thus, $$ \frac{\partial G}{\partial p(z | x)} = p(x) \left( \log p(z) + 1 \right). $$
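A discrete finite-difference check can distinguish the two candidate answers for $\partial G/\partial p(z|x)$: the "naive" $p(x)\left(\log p(z) + p(z|x)/p(z)\right)$ and the corrected $p(x)(\log p(z) + 1)$. This sketch assumes finite alphabets, perturbs one entry of the conditional at a time while recomputing $p(z)$ so that it moves with it, and uses hypothetical helper names.

```python
import math


def marginal(p_x, q):
    """p(z_j) = sum_i q[i][j] * p(x_i), where q[i][j] = p(z_j | x_i)."""
    return [sum(q[i][j] * p_x[i] for i in range(len(p_x))) for j in range(len(q[0]))]


def G(p_x, q):
    """Discrete G = sum_i sum_j q[i][j] p(x_i) log p(z_j); p(z) is recomputed from q."""
    p_z = marginal(p_x, q)
    return sum(q[i][j] * p_x[i] * math.log(p_z[j])
               for i in range(len(p_x)) for j in range(len(q[0])))


def grad_fd(func, p_x, q, i, j, eps=1e-6):
    """Central difference of func w.r.t. q[i][j]; every other entry is held fixed."""
    q_plus = [row[:] for row in q]
    q_minus = [row[:] for row in q]
    q_plus[i][j] += eps
    q_minus[i][j] -= eps
    return (func(p_x, q_plus) - func(p_x, q_minus)) / (2 * eps)


def grad_G_naive(p_x, q, i, j):
    """Candidate from the naive approach: p(x) (log p(z) + p(z|x) / p(z))."""
    p_z = marginal(p_x, q)
    return p_x[i] * (math.log(p_z[j]) + q[i][j] / p_z[j])


def grad_G_corrected(p_x, q, i, j):
    """Candidate from the complete solution: p(x) (log p(z) + 1)."""
    p_z = marginal(p_x, q)
    return p_x[i] * (math.log(p_z[j]) + 1.0)


p_x = [0.3, 0.7]
q = [[0.2, 0.8], [0.5, 0.5]]
for i in range(2):
    for j in range(2):
        fd = grad_fd(G, p_x, q, i, j)
        print(i, j, fd, grad_G_naive(p_x, q, i, j), grad_G_corrected(p_x, q, i, j))
```

With these numbers the corrected formula tracks the finite difference, while the naive one is off wherever $p(z|x) \neq p(z)$.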

Best answer

I think you are almost right in your "Naive approach" section. The correct way, I think, is indeed to expand $p(z)$ and vary $p(z|x)$; however, since you are varying a function, the variation of $p(z|x')$ inside the integral should be $p(z|x') + \varepsilon \phi(x', z)$, not $p(z|x') + \varepsilon \phi(x, z)$. Then $$ \left.\frac{d}{d\varepsilon}G[p(z|x) + \varepsilon \phi(x, z)]\right|_{\varepsilon = 0} = \left.\frac{d}{d\varepsilon}\iint dx dz [p(z|x) + \varepsilon \phi(x, z)]p(x)\log\int dx'[p(z|x') + \varepsilon \phi(x', z)]p(x')\right|_{\varepsilon = 0}$$

Using the fact that $\int dx' p(z|x')p(x') = p(z)$ and swapping the integration variables $x \leftrightarrow x'$ in the second term, you should get the functional derivative $$ \frac{\delta G[p(z|x)]}{\delta p(z|x)} = p(x)(\log p(z) +1).$$ Combined with your result for $F$, the full functional derivative of the mutual information $I(X;Z) = F - G$ is $$ \frac{\delta I(X;Z)}{\delta p(z|x)} = p(x)\log \frac{p(z|x)}{p(z)}.$$
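The combined result can be checked the same way on a discrete example. This minimal sketch assumes finite alphabets and strictly positive conditional entries (so every logarithm is defined), uses hypothetical helper names, and compares a central finite difference of $I(X;Z)$ in each entry $p(z_j|x_i)$ with $p(x_i)\log\left(p(z_j|x_i)/p(z_j)\right)$; the $+1$ terms from $F$ and $G$ cancel.

```python
import math


def marginal(p_x, q):
    """p(z_j) = sum_i q[i][j] * p(x_i), where q[i][j] = p(z_j | x_i)."""
    return [sum(q[i][j] * p_x[i] for i in range(len(p_x))) for j in range(len(q[0]))]


def mutual_information(p_x, q):
    """Discrete I(X;Z) = sum_i sum_j q[i][j] p(x_i) log(q[i][j] / p(z_j)); p(z) recomputed from q."""
    p_z = marginal(p_x, q)
    return sum(q[i][j] * p_x[i] * math.log(q[i][j] / p_z[j])
               for i in range(len(p_x)) for j in range(len(q[0])))


def grad_I_analytic(p_x, q, i, j):
    """The stated result p(x) log(p(z|x) / p(z)) evaluated at (x_i, z_j)."""
    p_z = marginal(p_x, q)
    return p_x[i] * math.log(q[i][j] / p_z[j])


def grad_fd(func, p_x, q, i, j, eps=1e-6):
    """Central difference of func w.r.t. q[i][j]; every other entry is held fixed."""
    q_plus = [row[:] for row in q]
    q_minus = [row[:] for row in q]
    q_plus[i][j] += eps
    q_minus[i][j] -= eps
    return (func(p_x, q_plus) - func(p_x, q_minus)) / (2 * eps)


p_x = [0.3, 0.7]
q = [[0.2, 0.8], [0.5, 0.5]]
for i in range(2):
    for j in range(2):
        print(i, j, grad_fd(mutual_information, p_x, q, i, j), grad_I_analytic(p_x, q, i, j))
```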

A similar problem appears in this paper on page 4 (equations 6, 7, and 10).