In the Information Bottleneck paper, we aim to minimize the loss:
\begin{equation} \begin{aligned} \mathcal{L} &= I(X; \tilde{X}) - \beta I(\tilde{X}; Y) - \sum_{\tilde{x}, x} \lambda(x) p(\tilde{x}| x) \\ &= \sum_{\tilde{x}, x} p(\tilde{x}|x) p(x) \log \left[\frac{p(\tilde{x}|x)}{p(\tilde{x})} \right] - \beta \sum_{\tilde{x}, y} p(\tilde{x}, y) \log \left[\frac{p(\tilde{x}|y)}{p(\tilde{x})} \right] - \sum_{\tilde{x}, x} \lambda(x) p(\tilde{x}|x). \end{aligned} \end{equation}
Taking the derivative with respect to $p(\tilde{x}|x)$ for given $x$ and $\tilde{x}$, one should obtain Eq. 25 in the paper: \begin{equation} \begin{aligned} \frac{\partial \mathcal{L}}{\partial p(\tilde{x}|x)} &= p(x) [1 + \log p(\tilde{x}|x)] - \frac{\partial p(\tilde{x})}{\partial p(\tilde{x}|x)}[1 + \log p(\tilde{x})] \\ &- \beta \sum_{y} \frac{\partial p(\tilde{x}|y)}{\partial p(\tilde{x}|x)} p(y)[1 + \log p(\tilde{x}|y)] + \beta \frac{\partial p(\tilde{x})}{\partial p(\tilde{x}|x)}[1 + \log p(\tilde{x})] - \lambda (x) \end{aligned} \end{equation}
I don't quite understand how to get the term $p(x) [1 + \log p(\tilde{x}|x)] - \frac{\partial p(\tilde{x})}{\partial p(\tilde{x}|x)}[1 + \log p(\tilde{x})]$. It would be much appreciated if someone could show me the detailed derivation.
This term comes from the derivative of the first term in $\mathcal{L}$ only (one way to see this is that there is no $\beta$ multiplying it). Below, I'll assume that the $\log$s are natural (which seems to be the convention adopted in the paper too), and I'll try to use what I think is the approach taken in the paper (although there are other natural ways of taking derivatives, which give cosmetically different terms).
Let me write the first term of $\mathcal{L}$ as $$ \mathcal{L}_1 = \sum_{x', \tilde{x}'} p(\tilde{x}'|x') p(x') \log\frac{p(\tilde{x}'|x')}{p(\tilde{x}')} =: \sum_{x',\tilde{x}'} \ell_{x', \tilde{x}'},$$ where I've introduced the $'$s to distinguish the dummy variables from the variables we're differentiating with respect to. The important part is to note that $p(\tilde{x}') = \sum_{x''} p(\tilde{x}'|x'') p(x'')$ is a weighted sum over the conditionals, and so has a nontrivial derivative with respect to $p(\tilde{x}|x)$ if $\tilde{x}' = \tilde{x}.$
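To make that dependence explicit: only the $x'' = x$ term of the marginal involves $p(\tilde{x}|x)$, so $$\frac{\partial p(\tilde{x})}{\partial p(\tilde{x}|x)} = \frac{\partial}{\partial p(\tilde{x}|x)} \sum_{x''} p(\tilde{x}|x'') p(x'') = p(x).$$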
Now, by linearity of differentiation, we need to nail down the derivatives of $\ell_{x',\tilde{x}'}$ with respect to $p(\tilde{x}|x).$ Note that if $\tilde x' \neq \tilde x,$ then this derivative is $0$, since nothing in $\ell_{x', \tilde{x}'}$ depends on $p(\tilde{x}|x)$. So, it suffices to consider the derivative of \begin{align} \tilde{\mathcal{L}}_1 &= \sum_{x'} \ell_{x', \tilde{x}} \\ &= \sum_{x'} p(\tilde{x}|x')p(x') \log p(\tilde{x}|x') - \sum_{x'} p(\tilde{x}|x')p(x') \log p(\tilde{x}) \\ &= \sum_{x'} p(\tilde{x}|x')p(x') \log p(\tilde{x}|x') - p(\tilde{x}) \log p(\tilde{x}),\end{align} where the second equality uses the expansion of $p(\tilde{x})$ above.
Now we can apply the chain rule. The derivative of $u \mapsto u \log u$ is $1 + \log u,$ and so, $$\frac{\partial \mathcal{L}_1}{\partial p(\tilde{x}|x)} = \frac{\partial \tilde{\mathcal{L}}_1}{\partial p(\tilde{x}|x)} = \sum_{x'} p(x')[1 + \log p(\tilde{x}|x')] \frac{\partial p(\tilde{x}|x')}{\partial p(\tilde{x}|x)} - [1 + \log p(\tilde{x})] \frac{\partial p(\tilde{x})}{\partial p(\tilde{x}|x)},$$ and we're done upon observing that $\frac{\partial p(\tilde{x}|x')}{\partial p(\tilde{x}|x)} = 0$ for $x' \neq x.$
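As a quick sanity check (not in the paper), the closed form can be verified numerically with finite differences, treating each entry of $p(\tilde{x}|x)$ as a free variable and using $\frac{\partial p(\tilde{x})}{\partial p(\tilde{x}|x)} = p(x)$. The distributions below are arbitrary random choices, just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nxt = 3, 4
px = rng.dirichlet(np.ones(nx))               # p(x)
ptx = rng.dirichlet(np.ones(nxt), size=nx).T  # p(x~|x): columns sum to 1

def L1(ptx):
    """First term of the loss: I(X; X~) = sum p(x~|x) p(x) log[p(x~|x)/p(x~)]."""
    pt = ptx @ px                             # marginal p(x~)
    return np.sum(ptx * px * np.log(ptx / pt[:, None]))

# Closed form derived above: p(x)[1 + log p(x~|x)] - p(x)[1 + log p(x~)]
pt = ptx @ px
analytic = px * (1 + np.log(ptx)) - px * (1 + np.log(pt))[:, None]

# Central finite difference at a single entry (x~ = 1, x = 2)
eps = 1e-6
Pp, Pm = ptx.copy(), ptx.copy()
Pp[1, 2] += eps
Pm[1, 2] -= eps
numeric = (L1(Pp) - L1(Pm)) / (2 * eps)
print(abs(numeric - analytic[1, 2]))          # agrees up to finite-difference error
```

Note that the perturbed columns no longer sum exactly to one; that's fine here, since the derivative above is an unconstrained partial derivative (the normalization constraint is handled separately by the $\lambda(x)$ multiplier).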
The second set of terms in the derivative (i.e. the terms multiplied by $\beta$) follows from exactly the same idea, but now, instead of the convenient $\frac{\partial p(\tilde{x}|x')}{\partial p(\tilde{x}|x)} = 0,$ the derivatives $\frac{\partial p(\tilde{x}|y)}{\partial p(\tilde{x}|x)}$ are potentially non-zero for every $y$, which is why the sum over $y$ survives.
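Concretely, under the Markov assumption $\tilde{X} \leftrightarrow X \leftrightarrow Y$ made in the paper, $p(\tilde{x}|y) = \sum_{x'} p(\tilde{x}|x') p(x'|y),$ so $$\frac{\partial p(\tilde{x}|y)}{\partial p(\tilde{x}|x)} = p(x|y),$$ which is generically non-zero for every $y$ with $p(x|y) > 0.$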