Derivative of Discrete Mutual Information


Given a fixed channel $p(y|x)$, and denoting the discrete input pmf $p(x)$, and the corresponding output pmf by $p(y)$, prove that: $$ \frac{\partial I(X;Y)}{\partial p(x)} = I(X=x;Y) - \log{e} $$ where, $$ I(X=x;Y) = \sum_{y} p(y|x) \log{\left(\frac{p(y|x)}{p(y)}\right)} $$ Note$^*$: the $\log$ is base 2 here.

Solution (thanks to @stochasticboy321):

We can write $$ I(X;Y) = \sum_{i}p(x_i)I(X=x_i;Y) \Longrightarrow \frac{\partial I(X;Y)}{\partial p(x_j)} = I(X=x_j;Y) + \sum_{i} p(x_i)\color{red}{\frac{\partial I(X=x_i;Y)}{\partial p(x_j)}}. $$ But, $$ \begin{aligned} \color{red}{\frac{\partial I(X=x_i;Y)}{\partial p(x_j)}} &= \frac{\partial }{\partial p(x_j)} \sum_{k} p(y_k|x_i) \log{\left(\frac{p(y_k|x_i)}{p(y_k)}\right)}\\ &= \frac{\partial }{\partial p(x_j)}\left(\sum_{k} p(y_k|x_i) \log{(p(y_k|x_i)) - \sum_{k} p(y_k|x_i)\log [p(y_k)]}\right) \\ &= -\frac{\partial }{\partial p(x_j)} \sum_{k} p(y_k|x_i)\log [p(y_k)] \\ &= - \sum_{k} p(y_k|x_i)\color{blue}{\frac{\partial \log [p(y_k)]}{\partial p(x_j)}} \end{aligned} $$ Where $p(y_k) =\sum_i p(y_k|x_i)p(x_i)$. Let's continue,

$$ \begin{aligned} \color{blue}{\frac{\partial \log [p(y_k)]}{\partial p(x_j)}} &= \frac{\log e}{p(y_k)} \frac{\partial }{\partial p(x_j)} \sum_i p(y_k|x_i)p(x_i)\\ &= \frac{\log e}{p(y_k)} p(y_k|x_j) \end{aligned} $$

Hence,

$$ \begin{aligned} \color{red}{\frac{\partial I(X=x_i;Y)}{\partial p(x_j)}} &= -\log e \left[\sum_k \color{purple}{\frac{p(y_k|x_i)p(y_k|x_j)}{p(y_k)}} \right]\\ &= -\log e \left[\sum_k \color{purple}{\frac{p(x_i|y_k)p(y_k|x_j)}{p(x_i)}} \right] \end{aligned} $$

Thus,

$$ \begin{aligned} \frac{\partial I(X;Y)}{\partial p(x_j)} &= I(X=x_j;Y) -\log e \sum_{i} p(x_i)\color{red}{\left[\sum_k \color{purple}{\frac{p(x_i|y_k)p(y_k|x_j)}{p(x_i)}} \right]}\\ &= I(X=x_j;Y) -\log e \sum_k p(y_k|x_j) \sum_i p(x_i|y_k) \\ &\stackrel{(a)}{=} I(X=x_j;Y) -\log e \end{aligned} $$ (a) The inner sum equals one because $p(x_i|y_k)$ is a pmf in $i$, and the remaining sum $\sum_k p(y_k|x_j)$ equals one because $p(y_k|x_j)$ is a pmf in $k$.
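As a numerical sanity check (not part of the original derivation), one can compare the formula against a finite-difference derivative on a small random channel, treating the entries of $p(x)$ as free variables (no normalisation constraint, matching the unconstrained partial derivative above). The helper names `mutual_information` and `I_x` below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny = 4, 5

# Random channel p(y|x): row x holds the pmf over y, so each row sums to 1.
W = rng.random((nx, ny))
W /= W.sum(axis=1, keepdims=True)

# Random input pmf p(x).
p = rng.random(nx)
p /= p.sum()

def mutual_information(p):
    """I(X;Y) in bits, with p treated as a free vector of variables."""
    py = p @ W                          # p(y) = sum_i p(y|x_i) p(x_i)
    return float(np.sum(p[:, None] * W * np.log2(W / py)))

def I_x(j, p):
    """I(X=x_j;Y) = sum_y p(y|x_j) log2( p(y|x_j) / p(y) )."""
    py = p @ W
    return float(np.sum(W[j] * np.log2(W[j] / py)))

# Central finite difference in each coordinate vs. I(X=x_j;Y) - log2(e).
eps = 1e-6
for j in range(nx):
    e = np.zeros(nx); e[j] = eps
    fd = (mutual_information(p + e) - mutual_information(p - e)) / (2 * eps)
    analytic = I_x(j, p) - np.log2(np.e)
    assert abs(fd - analytic) < 1e-6
```

The key point is that the derivative is taken without enforcing $\sum_j p(x_j) = 1$; on the simplex, the constant $-\log e$ drops out of any directional derivative along the constraint surface.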

There are quite a few calculation errors in your work, e.g., $$ \partial_{p(x_j)} \log p(y) = \log e\frac{1}{p(y)} \partial_{p(x_j)} \sum_{i} p(y|x_i) p(x_i) = \log e\frac{1}{p(y)} \cdot p(y|x_j),$$ and the other $i \neq j$ terms don't enter since neither $p(y|x_i)$ nor $p(x_i)$ directly depend on $p(x_j)$. Also, when computing $\partial_{p(x_j)} I,$ the second term should also sum over $i = j$ since the $-\log p(y)$ term in $I(X = x_j;y)$ varies with $p(x_j)$.


I'll just start from scratch because I'm getting lost in the notation in the question. Clearly, $$ \partial_{p(x)} I(X;Y) = I(X = x;Y) + \sum_{x'} p(x') \partial_{p(x)} I(X = x';Y).$$

Now, $$I(X = x';Y) = \sum_y p(y|x') \log p(y|x') - \sum_y p(y|x') \log p(y)\\ = f(p(y|x')) - \sum_y p(y|x') \log p(y), $$ and so, $$ \partial_{p(x)} I(X = x';Y) = - \log e \sum_y \frac{p(y|x')}{p(y)}\partial_{p(x)} p(y) = -\log e \sum_y \frac{p(y|x')p(y|x)}{p(y)}.$$

This in turn yields $$\sum_{x'} p(x') \partial_{p(x)} I(X = x';Y) = - \log e \sum_{x',y} p(x') \frac{p(y|x')}{p(y)}p(y|x) \\ = -\log e \sum_y \frac{p(y|x)}{p(y)}\sum_{x'} p(x',y)\\ = - \log e \sum_y \frac{p(y|x)}{p(y)} \cdot p(y) = - \log e \sum_y p(y|x) = -\log e, $$ and we're done. Here we used Bayes' rule for the first equality, marginalisation for the second, and the fact that (conditional) probabilities must add up to one for the final equality.


Incidentally, it's a short hop from here to showing the concavity of $I(X;Y)$ in $p(x)$ for a fixed $p(y|x)$: taking a derivative again, you'll find that $\partial_{p(x')} \partial_{p(x)} I(X;Y) = - \log e \sum_y p(y|x')p(y|x)/p(y) = - \langle v_x, v_{x'}\rangle,$ where the $|\mathcal{Y}|$-dimensional vector $v_x$ is defined as $v_x(y) = \sqrt{\log e} p(y|x)/\sqrt{p(y)}$. But this represents the Hessian of $I(X;Y)$ as the negative of a Gram matrix, which implies that it is negative semidefinite.
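The Gram-matrix argument can also be checked numerically (again not part of the original answer; names are illustrative): build the Hessian $H_{x,x'} = -\log e \sum_y p(y|x)p(y|x')/p(y)$ as $-VV^\top$ with rows $v_x$, confirm one entry against a second-order finite difference of $I(X;Y)$, and confirm all eigenvalues are nonpositive.

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny = 4, 6

# Random channel p(y|x) and input pmf p(x).
W = rng.random((nx, ny))
W /= W.sum(axis=1, keepdims=True)
p = rng.random(nx)
p /= p.sum()
py = p @ W

# v_x(y) = sqrt(log2 e) * p(y|x) / sqrt(p(y)); Hessian H = -V V^T, a negated Gram matrix.
V = np.sqrt(np.log2(np.e)) * W / np.sqrt(py)
H = -V @ V.T

def I(pvec):
    """I(X;Y) in bits, with pvec treated as free (unconstrained) variables."""
    q = pvec @ W
    return float(np.sum(pvec[:, None] * W * np.log2(W / q)))

# Mixed second-order central difference for the (0,1) entry of the Hessian.
eps = 1e-4
e0 = np.zeros(nx); e0[0] = eps
e1 = np.zeros(nx); e1[1] = eps
fd = (I(p + e0 + e1) - I(p + e0 - e1) - I(p - e0 + e1) + I(p - e0 - e1)) / (4 * eps**2)
assert abs(fd - H[0, 1]) < 1e-4

# A negated Gram matrix is negative semidefinite: every eigenvalue is <= 0.
eigs = np.linalg.eigvalsh(H)
assert np.all(eigs <= 1e-12)
```

Since the Hessian is negative semidefinite at every interior point, $I(X;Y)$ is concave in $p(x)$ for fixed $p(y|x)$, which is what makes channel capacity a well-behaved convex optimisation problem.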