Minimizing mutual information using Lagrange multipliers


I'm trying to follow a minimization of mutual information using Lagrange multipliers in the highly cited paper *The Information Bottleneck Method* (Tishby, Pereira and Bialek, 1999), page 4:

$$R(D) = \min_{p(\tilde{x}|x): E[d(x,\tilde{x})]\leq D} I(X;\tilde{X})$$

where $X$ is a given random variable and $\tilde{X}$ is a compressed representation of it. The constraint is just an expectation of a distortion function $d(\cdot,\cdot)$ that measures the difference between two realizations of the r.v.s. The mutual information is defined as $I(X;\tilde{X}) =\sum_{x\in X}\sum_{\tilde{x}\in\tilde{X}} p(x,\tilde{x})\log\frac{p(\tilde{x}|x)}{p(\tilde{x})}$, and the expected distortion is $E[d(x,\tilde{x})] = \sum_x\sum_{\tilde{x}} p(x,\tilde{x})d(x,\tilde{x})$.
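To fix the notation for myself, here is a tiny numerical sketch of these two quantities (the alphabet sizes and all the numbers are made up by me, not from the paper):

```python
import numpy as np

# Made-up toy problem: |X| = 3 source symbols, |X~| = 2 compressed symbols
px = np.array([0.5, 0.3, 0.2])            # p(x)
q = np.array([[0.9, 0.1],
              [0.4, 0.6],
              [0.2, 0.8]])                # p(x~|x), each row sums to 1
d = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 0.5]])                # distortion d(x, x~)

pxt = px @ q                              # marginal p(x~) = sum_x p(x) p(x~|x)
joint = px[:, None] * q                   # joint p(x, x~) = p(x) p(x~|x)

I = np.sum(joint * np.log(q / pxt))       # I(X; X~), in nats
Ed = np.sum(joint * d)                    # E[d(x, x~)]
print(I, Ed)
```

(Using $p(x,\tilde{x})\log\frac{p(\tilde{x}|x)}{p(\tilde{x})}$ or the symmetric form $p(x,\tilde{x})\log\frac{p(x,\tilde{x})}{p(x)p(\tilde{x})}$ gives the same number, which is a handy sanity check.)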

The functional to minimize is written as: $$\mathcal{F}[p(\tilde{x}|x)]=I(X;\tilde{X}) +\beta E[d(x,\tilde{x})],$$ where $\beta$ is a Lagrange multiplier.
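To get some intuition for what $\beta$ does before worrying about the derivative, I evaluated $\mathcal{F}$ for two hand-picked channels (again my own toy numbers, not the paper's): a coarse one that ignores $X$ entirely, and a nearly deterministic one. Small $\beta$ favors compression (low $I$), large $\beta$ favors low distortion:

```python
import numpy as np

# Toy numbers (mine, not the paper's)
px = np.array([0.5, 0.3, 0.2])                   # p(x)
d = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 0.5]])                       # d(x, x~)

def I_and_Ed(q):
    """Mutual information (nats) and expected distortion for channel q = p(x~|x)."""
    pxt = px @ q                                 # marginal p(x~)
    joint = px[:, None] * q                      # joint p(x, x~)
    return np.sum(joint * np.log(q / pxt)), np.sum(joint * d)

q_coarse = np.full((3, 2), 0.5)                  # ignores x: I = 0, high distortion
q_fine = np.array([[0.95, 0.05],
                   [0.05, 0.95],
                   [0.05, 0.95]])                # nearly deterministic: high I, low distortion

for beta in (0.1, 10.0):
    F_coarse = sum(w * v for w, v in zip((1.0, beta), I_and_Ed(q_coarse)))
    F_fine = sum(w * v for w, v in zip((1.0, beta), I_and_Ed(q_fine)))
    # small beta: the coarse channel wins; large beta: the fine channel wins
    print(beta, F_coarse, F_fine)
```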

The paper continues by setting the derivative $\frac{\partial \mathcal{F}}{\partial p(\tilde{x}|x)}=0$, and that's the step I don't understand how to follow. It says: $$\frac{\partial \mathcal{F}}{\partial p(\tilde{x}|x)}=p(x)\left[\log\frac{p(\tilde{x}|x)}{p(\tilde{x})}+1-\frac{1}{p(\tilde{x})}\sum_{x^\prime}p(x^\prime)p(\tilde{x}|x^\prime) + \beta d(x,\tilde{x})+\frac{\lambda(x)}{p(x)}\right]$$
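One piece I did manage to check numerically: since $p(\tilde{x})=\sum_{x^\prime}p(x^\prime)p(\tilde{x}|x^\prime)$, the fraction term equals $1$ and cancels the $+1$, so without the $\lambda(x)$ term (which only enforces $\sum_{\tilde{x}}p(\tilde{x}|x)=1$) the bracket should reduce to $\log\frac{p(\tilde{x}|x)}{p(\tilde{x})}+\beta d(x,\tilde{x})$. A finite-difference sketch of $\mathcal{F}=I+\beta E[d]$, treating the entries of $p(\tilde{x}|x)$ as free variables (all toy numbers of my own), agrees with that reduced derivative:

```python
import numpy as np

# Same kind of made-up toy setup as above (not from the paper)
px = np.array([0.5, 0.3, 0.2])                   # p(x)
q = np.array([[0.9, 0.1],
              [0.4, 0.6],
              [0.2, 0.8]])                       # p(x~|x)
d = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 0.5]])                       # d(x, x~)
beta = 2.0

def F(qc):
    """F = I(X;X~) + beta E[d], with the entries of qc unconstrained."""
    pxt = px @ qc
    joint = px[:, None] * qc
    return np.sum(joint * np.log(qc / pxt)) + beta * np.sum(joint * d)

# Claimed derivative after the cancellation: p(x) [ log p(x~|x)/p(x~) + beta d ]
# (no lambda(x) term, because F above has no normalization constraint built in)
pxt = px @ q
analytic = px[:, None] * (np.log(q / pxt) + beta * d)

# Central finite differences, entry by entry
eps = 1e-6
numeric = np.zeros_like(q)
for i in range(q.shape[0]):
    for j in range(q.shape[1]):
        qp, qm = q.copy(), q.copy()
        qp[i, j] += eps
        qm[i, j] -= eps
        numeric[i, j] = (F(qp) - F(qm)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))        # close to 0 (finite-difference noise)
```

So the formula itself checks out numerically; what I am missing is the derivation and the role of $\lambda(x)$.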

It says "$\lambda(x)$ are the normalization Lagrange multipliers for each $x$".

Questions: what is a normalization Lagrange multiplier? And I don't know where that sum comes from; the paper only says afterwards "since the marginal distribution satisfies $p(\tilde{x})=\sum_{x^\prime}p(x^\prime)p(\tilde{x}|x^\prime)$" (a condition that appeared before, but I don't see what it is doing here).

PS: I tried to follow the process in this presentation, slides 19-20, but again, I don't understand how to derive the expression...

Thanks a lot beforehand.