How to test the convexity of mutual information using leading principal minors?

Question

How to test the convexity of mutual information using leading principal minors?

316 Views Asked by Bumbble Comm At 08 Apr 2026 - 4:15

I read from textbooks that the mutual information function $I(X;Y)$ is a concave function of $p(x)$ for fixed $p(y|x)$ and a convex function of $p(y|x)$ for fixed $p(x)$.

I tried to test the convexity of $I(X;Y)$ by using the principal minors method. Given that $I(X;Y)=\sum_{x \in X} \sum_{y \in Y} p(x,y) ln \frac{p(x,y)}{p(x)p(y)}$.

Let $f=I(X;Y)$, I derived its Hessian matrix as

H(f) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial p(x)^2} = 0 & \dfrac{\partial^2 f}{\partial p(x)p(y)} = 0 & \dfrac{\partial^2 f}{\partial p(x,y)} = 0\\[2.2ex] \dfrac{\partial^2 f}{\partial p(y)p(x)} = 0 & \dfrac{\partial^2 f}{\partial p(y)^2} = 0 & \dfrac{\partial^2 f}{\partial p(y)p(x,y)} = 0 \\[2.2ex] \dfrac{\partial^2 f}{\partial p(x,y)p(x)} = -\frac{1}{p(x)} & \dfrac{\partial^2 f}{\partial p(x,y)p(y)} = -\frac{1}{p(y)} & \dfrac{\partial^2 f}{\partial p(x,y)^2} = \frac{1}{p(x,y)} \\[2.2ex] \end{bmatrix}.

Its principal minor is $D_1 = 0, D_2 = 0, D_3=0$, which means the Hessian is indefinite for all $p(x),p(y),p(x,y)$, which in turn means $f$ is neither convex nor concave?

Based on user1559 explanation, I tried replacing $p(x,y)=p(x|y)p(y)$ and with $p(x|y)$ as a constant, I try again finding its Hessian matrix

Since $p(y_j|x_l)$ is held constant for every $j$, I am thinking if there is an easier solution. I try as follows:

Given that $f = \sum_i \sum_j p(y_j|x_i) ln \frac{p(y_j|x_i)}{p(y_j)}$, I get its Hessian matrix as

H(f) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial p(x)^2} = 0 & \dfrac{\partial^2 f}{\partial p(x)p(y)} = \frac{-p(y_j|x_i)}{p(y_j)} \\ \dfrac{\partial^2 f}{\partial p(y)p(x)} = \frac{-p(y_j|x_i)}{p(y_j)} & \dfrac{\partial^2 f}{\partial p(y)^2} = \frac{\sum_i p(y_j|x_i)p(x_i)}{p(y_j)^2} \\ \end{bmatrix}.

I tried to find its prinicipal minors, but the Hessian matrix is indefinite. What is the problem here?

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

I think what your textbook means is this. First give indices to $x$ and $y$ and decouple $p(x_i)$ from those quantities that does not depend on it: \begin{align*} I(X;Y) &= \sum_i \sum_j p(x_i,y_j) \ln \frac{p(x_i,y_j)}{p(x_i)p(y_j)}\\ &= \sum_i \sum_j p(y_j\mid x_i)\,p(x_i) \ln \frac{p(y_j\mid x_i)}{p(y_j)}\\ &= \sum_i \sum_j p(y_j\mid x_i)\,p(x_i) \ln \frac{p(y_j\mid x_i)}{\sum_k p(y_j\mid x_k)\,p(x_k)}. \end{align*} So, for fixed $\ell$, when $p(y_j\mid x_\ell)$ is held constant for every $j$, we have \begin{align*} &\frac{\partial^2 I(X;Y)}{\partial p(x_\ell)^2}\\ =& \sum_{i\not=\ell} \sum_j \frac{p(y_j\mid x_i)\,p(x_i)\,p(y_j\mid x_\ell)^2}{\left(\sum_k p(y_j\mid x_k)\,p(x_k)\right)^2}\\ &- \sum_j\frac{2p(y_j\mid x_\ell)}{\sum_k p(y_j\mid x_k)\,p(x_k)} + \sum_j\frac{p(y_j\mid x_\ell)^2p(x_\ell)}{\left(\sum_k p(y_j\mid x_k)\,p(x_k)\right)^2}\\ =& \sum_j \frac{p(y_j\mid x_\ell)\left[-\sum_{i\not=\ell} p(y_j\mid x_i)\,p(x_i)\,\left(2-p(y_j\mid x_\ell)\right) - p(y_j\mid x_\ell)p(x_\ell)\right]}{\left(\sum_k p(y_j\mid x_k)\,p(x_k)\right)^2}\\ \le&0. \end{align*} Therefore, $I$ is a concave function of $p(x)$ for fixed $p(y\mid x)$. Convexity is shown in a similar manner. If I have calculated the paritial derivative correctly, for fixed $k$ and $\ell$, we should have $$ \frac{\partial^2 I(X;Y)}{\partial p(y_\ell \mid x_k)^2}= \frac{p(x_k) \sum_{i\not=k} p(y_\ell \mid x_i) p(x_i)} {p(y_\ell \mid x_k) \sum_i p(y_\ell \mid x_i)p(x_i)} \ge0. $$

How to test the convexity of mutual information using leading principal minors?

There are 1 best solutions below

Related Questions in MATRICES

Related Questions in CONVEX-ANALYSIS

Related Questions in INFORMATION-THEORY

Trending Questions

Popular # Hahtags

Popular Questions