How to test the convexity of mutual information using leading principal minors?

316 Views Asked by At

I read from textbooks that the mutual information function $I(X;Y)$ is a concave function of $p(x)$ for fixed $p(y|x)$ and a convex function of $p(y|x)$ for fixed $p(x)$.

I tried to test the convexity of $I(X;Y)$ by using the principal minors method. Given that $I(X;Y)=\sum_{x \in X} \sum_{y \in Y} p(x,y) ln \frac{p(x,y)}{p(x)p(y)}$.

Let $f=I(X;Y)$, I derived its Hessian matrix as

H(f) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial p(x)^2} = 0 & \dfrac{\partial^2 f}{\partial p(x)p(y)} = 0 & \dfrac{\partial^2 f}{\partial p(x,y)} = 0\\[2.2ex] \dfrac{\partial^2 f}{\partial p(y)p(x)} = 0 & \dfrac{\partial^2 f}{\partial p(y)^2} = 0 & \dfrac{\partial^2 f}{\partial p(y)p(x,y)} = 0 \\[2.2ex] \dfrac{\partial^2 f}{\partial p(x,y)p(x)} = -\frac{1}{p(x)} & \dfrac{\partial^2 f}{\partial p(x,y)p(y)} = -\frac{1}{p(y)} & \dfrac{\partial^2 f}{\partial p(x,y)^2} = \frac{1}{p(x,y)} \\[2.2ex] \end{bmatrix}.

Its principal minor is $D_1 = 0, D_2 = 0, D_3=0$, which means the Hessian is indefinite for all $p(x),p(y),p(x,y)$, which in turn means $f$ is neither convex nor concave?

Based on user1559 explanation, I tried replacing $p(x,y)=p(x|y)p(y)$ and with $p(x|y)$ as a constant, I try again finding its Hessian matrix

Since $p(y_j|x_l)$ is held constant for every $j$, I am thinking if there is an easier solution. I try as follows:

Given that $f = \sum_i \sum_j p(y_j|x_i) ln \frac{p(y_j|x_i)}{p(y_j)}$, I get its Hessian matrix as

H(f) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial p(x)^2} = 0 & \dfrac{\partial^2 f}{\partial p(x)p(y)} = \frac{-p(y_j|x_i)}{p(y_j)} \\ \dfrac{\partial^2 f}{\partial p(y)p(x)} = \frac{-p(y_j|x_i)}{p(y_j)} & \dfrac{\partial^2 f}{\partial p(y)^2} = \frac{\sum_i p(y_j|x_i)p(x_i)}{p(y_j)^2} \\ \end{bmatrix}.

I tried to find its prinicipal minors, but the Hessian matrix is indefinite. What is the problem here?

1

There are 1 best solutions below

3
On BEST ANSWER

I think what your textbook means is this. First give indices to $x$ and $y$ and decouple $p(x_i)$ from those quantities that does not depend on it: \begin{align*} I(X;Y) &= \sum_i \sum_j p(x_i,y_j) \ln \frac{p(x_i,y_j)}{p(x_i)p(y_j)}\\ &= \sum_i \sum_j p(y_j\mid x_i)\,p(x_i) \ln \frac{p(y_j\mid x_i)}{p(y_j)}\\ &= \sum_i \sum_j p(y_j\mid x_i)\,p(x_i) \ln \frac{p(y_j\mid x_i)}{\sum_k p(y_j\mid x_k)\,p(x_k)}. \end{align*} So, for fixed $\ell$, when $p(y_j\mid x_\ell)$ is held constant for every $j$, we have \begin{align*} &\frac{\partial^2 I(X;Y)}{\partial p(x_\ell)^2}\\ =& \sum_{i\not=\ell} \sum_j \frac{p(y_j\mid x_i)\,p(x_i)\,p(y_j\mid x_\ell)^2}{\left(\sum_k p(y_j\mid x_k)\,p(x_k)\right)^2}\\ &- \sum_j\frac{2p(y_j\mid x_\ell)}{\sum_k p(y_j\mid x_k)\,p(x_k)} + \sum_j\frac{p(y_j\mid x_\ell)^2p(x_\ell)}{\left(\sum_k p(y_j\mid x_k)\,p(x_k)\right)^2}\\ =& \sum_j \frac{p(y_j\mid x_\ell)\left[-\sum_{i\not=\ell} p(y_j\mid x_i)\,p(x_i)\,\left(2-p(y_j\mid x_\ell)\right) - p(y_j\mid x_\ell)p(x_\ell)\right]}{\left(\sum_k p(y_j\mid x_k)\,p(x_k)\right)^2}\\ \le&0. \end{align*} Therefore, $I$ is a concave function of $p(x)$ for fixed $p(y\mid x)$. Convexity is shown in a similar manner. If I have calculated the paritial derivative correctly, for fixed $k$ and $\ell$, we should have $$ \frac{\partial^2 I(X;Y)}{\partial p(y_\ell \mid x_k)^2}= \frac{p(x_k) \sum_{i\not=k} p(y_\ell \mid x_i) p(x_i)} {p(y_\ell \mid x_k) \sum_i p(y_\ell \mid x_i)p(x_i)} \ge0. $$