Derivative of Discrete Mutual Information


Given a fixed channel $p(y|x)$, and denoting the discrete input pmf $p(x)$, and the corresponding output pmf by $p(y)$, prove that: $$ \frac{\partial I(X;Y)}{\partial p(x)} = I(X=x;Y) - \log{e} $$ where, $$ I(X=x;Y) = \sum_{y} p(y|x) \log{\left(\frac{p(y|x)}{p(y)}\right)} $$ Note$^*$: the $\log$ is base 2 here.

Solution (thanks to @stochasticboy321):

We can write $$ I(X;Y) = \sum_{i}p(x_i)I(X=x_i;Y) \Longrightarrow \frac{\partial I(X;Y)}{\partial p(x_j)} = I(X=x_j;Y) + \sum_{i} p(x_i)\color{red}{\frac{\partial I(X=x_i;Y)}{\partial p(x_j)}}. $$ But, $$ \begin{aligned} \color{red}{\frac{\partial I(X=x_i;Y)}{\partial p(x_j)}} &= \frac{\partial }{\partial p(x_j)} \sum_{k} p(y_k|x_i) \log{\left(\frac{p(y_k|x_i)}{p(y_k)}\right)}\\ &= \frac{\partial }{\partial p(x_j)}\left(\sum_{k} p(y_k|x_i) \log{(p(y_k|x_i)) - \sum_{k} p(y_k|x_i)\log [p(y_k)]}\right) \\ &= -\frac{\partial }{\partial p(x_j)} \sum_{k} p(y_k|x_i)\log [p(y_k)] \\ &= - \sum_{k} p(y_k|x_i)\color{blue}{\frac{\partial \log [p(y_k)]}{\partial p(x_j)}} \end{aligned} $$ Where $p(y_k) =\sum_i p(y_k|x_i)p(x_i)$. Let's continue,

$$ \begin{aligned} \color{blue}{\frac{\partial \log [p(y_k)]}{\partial p(x_j)}} &= \frac{\log e}{p(y_k)} \frac{\partial }{\partial p(x_j)} \sum_i p(y_k|x_i)p(x_i)\\ &= \frac{\log e}{p(y_k)} p(y_k|x_j) \end{aligned} $$

Hence,

$$ \begin{aligned} \color{red}{\frac{\partial I(X=x_i;Y)}{\partial p(x_j)}} &= -\log e \left[\sum_k \color{purple}{\frac{p(y_k|x_i)p(y_k|x_j)}{p(y_k)}} \right]\\ &= -\log e \left[\sum_k \color{purple}{\frac{p(x_i|y_k)p(y_k|x_j)}{p(x_i)}} \right] \end{aligned} $$

Thus,

$$ \begin{aligned} \frac{\partial I(X;Y)}{\partial p(x_j)} &= I(X=x_j;Y) -\log e \sum_{i} p(x_i)\color{red}{\left[\sum_k \color{purple}{\frac{p(x_i|y_k)p(y_k|x_j)}{p(x_i)}} \right]}\\ &= I(X=x_j;Y) -\log e \sum_k p(y_k|x_j) \sum_i p(x_i|y_k) \\ &\stackrel{(a)}{=} I(X=x_j;Y) -\log e \end{aligned} $$ (a) The inner sum equals one because $p(x_i|y_k)$ is a pmf in $i$, and the remaining sum $\sum_k p(y_k|x_j)$ equals one because $p(y_k|x_j)$ is a pmf in $k$.
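As a numerical sanity check (not part of the original derivation), one can compare the formula against a finite-difference derivative on a small random channel, treating the entries of $p(x)$ as free variables (no normalisation constraint, matching the unconstrained partial derivative above). The helper names `mutual_information` and `I_x` below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny = 4, 5

# Random channel p(y|x): row x holds the pmf over y, so each row sums to 1.
W = rng.random((nx, ny))
W /= W.sum(axis=1, keepdims=True)

# Random input pmf p(x).
p = rng.random(nx)
p /= p.sum()

def mutual_information(p):
    """I(X;Y) in bits, with p treated as a free vector of variables."""
    py = p @ W                          # p(y) = sum_i p(y|x_i) p(x_i)
    return float(np.sum(p[:, None] * W * np.log2(W / py)))

def I_x(j, p):
    """I(X=x_j;Y) = sum_y p(y|x_j) log2( p(y|x_j) / p(y) )."""
    py = p @ W
    return float(np.sum(W[j] * np.log2(W[j] / py)))

# Central finite difference in each coordinate vs. I(X=x_j;Y) - log2(e).
eps = 1e-6
for j in range(nx):
    e = np.zeros(nx); e[j] = eps
    fd = (mutual_information(p + e) - mutual_information(p - e)) / (2 * eps)
    analytic = I_x(j, p) - np.log2(np.e)
    assert abs(fd - analytic) < 1e-6
```

The key point is that the derivative is taken without enforcing $\sum_j p(x_j) = 1$; on the simplex, the constant $-\log e$ drops out of any directional derivative along the constraint surface.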

There are quite a few calculation errors in your work, e.g., $$ \partial_{p(x_j)} \log p(y) = \log e\frac{1}{p(y)} \partial_{p(x_j)} \sum_{i} p(y|x_i) p(x_i) = \log e\frac{1}{p(y)} \cdot p(y|x_j),$$ and the other $i \neq j$ terms don't enter since neither $p(y|x_i)$ nor $p(x_i)$ directly depend on $p(x_j)$. Also, when computing $\partial_{p(x_j)} I,$ the second term should also sum over $i = j$ since the $-\log p(y)$ term in $I(X = x_j;y)$ varies with $p(x_j)$.


I'll just start from scratch because I'm getting lost in the notation in the question. Clearly, $$ \partial_{p(x)} I(X;Y) = I(X = x;Y) + \sum_{x'} p(x') \partial_{p(x)} I(X = x';Y).$$

Now, $$I(X = x';Y) = \sum_y p(y|x') \log p(y|x') - \sum_y p(y|x') \log p(y)\\ = f(p(y|x')) - \sum_y p(y|x') \log p(y), $$ and so, $$ \partial_{p(x)} I(X = x';Y) = - \log e \sum_y \frac{p(y|x')}{p(y)}\partial_{p(x)} p(y) = -\log e \sum_y \frac{p(y|x')p(y|x)}{p(y)}.$$

This in turn yields $$\sum_{x'} p(x') \partial_{p(x)} I(X = x';Y) = - \log e \sum_{x',y} p(x') \frac{p(y|x')}{p(y)}p(y|x) \\ = -\log e \sum_y \frac{p(y|x)}{p(y)}\sum_{x'} p(x',y)\\ = - \log e \sum_y \frac{p(y|x)}{p(y)} \cdot p(y) = - \log e \sum_y p(y|x) = -\log e, $$ and we're done. Here we used Bayes' rule for the first equality, marginalisation for the second, and the fact that (conditional) probabilities must add up to one for the final equality.


Incidentally, it's a short hop from here to showing the concavity of $I(X;Y)$ in $p(x)$ for a fixed $p(y|x)$: taking a derivative again, you'll find that $\partial_{p(x')} \partial_{p(x)} I(X;Y) = - \log e \sum_y p(y|x')p(y|x)/p(y) = - \langle v_x, v_{x'}\rangle,$ where the $|\mathcal{Y}|$-dimensional vector $v_x$ is defined as $v_x(y) = \sqrt{\log e} p(y|x)/\sqrt{p(y)}$. But this represents the Hessian of $I(X;Y)$ as the negative of a Gram matrix, which implies that it is negative semidefinite.
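The Gram-matrix argument can also be checked numerically (again not part of the original answer; names are illustrative): build the Hessian $H_{x,x'} = -\log e \sum_y p(y|x)p(y|x')/p(y)$ as $-VV^\top$ with rows $v_x$, confirm one entry against a second-order finite difference of $I(X;Y)$, and confirm all eigenvalues are nonpositive.

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny = 4, 6

# Random channel p(y|x) and input pmf p(x).
W = rng.random((nx, ny))
W /= W.sum(axis=1, keepdims=True)
p = rng.random(nx)
p /= p.sum()
py = p @ W

# v_x(y) = sqrt(log2 e) * p(y|x) / sqrt(p(y)); Hessian H = -V V^T, a negated Gram matrix.
V = np.sqrt(np.log2(np.e)) * W / np.sqrt(py)
H = -V @ V.T

def I(pvec):
    """I(X;Y) in bits, with pvec treated as free (unconstrained) variables."""
    q = pvec @ W
    return float(np.sum(pvec[:, None] * W * np.log2(W / q)))

# Mixed second-order central difference for the (0,1) entry of the Hessian.
eps = 1e-4
e0 = np.zeros(nx); e0[0] = eps
e1 = np.zeros(nx); e1[1] = eps
fd = (I(p + e0 + e1) - I(p + e0 - e1) - I(p - e0 + e1) + I(p - e0 - e1)) / (4 * eps**2)
assert abs(fd - H[0, 1]) < 1e-4

# A negated Gram matrix is negative semidefinite: every eigenvalue is <= 0.
eigs = np.linalg.eigvalsh(H)
assert np.all(eigs <= 1e-12)
```

Since the Hessian is negative semidefinite at every interior point, $I(X;Y)$ is concave in $p(x)$ for fixed $p(y|x)$, which is what makes channel capacity a well-behaved convex optimisation problem.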