Convexity of $I(X;Y)$: why $H(Y)$ convex in $p(y)$ $\Rightarrow$ $H(Y)$ convex in $p(x)$

263 Views Asked by At

I would like to understand the proof that mutual information $I(X;Y)$ is concave in $p(x)$ - as presented in Elements of Information Theory by Cover & Thomas, theorem 2.7.4.

Here's the proof from the textbook:

enter image description here

I understand that $p(y) = \sum_x p(x,y) = \sum_x \color{blue}{p(x)} \, p(y \mid x)$, i.e. $p(y)$ is a linear combination of $p(x)$. But why from the fact that $H(Y)$ concave in $p(y)$ it follows that it's concave in $p(x)$?

And also, why fixing $p(y \mid x)$ is important? It's never negative anyway - it's a probability distribution.

1

There are 1 best solutions below

2
On BEST ANSWER

In general, if $f:\mathbb{R}^n\to \mathbb{R}$ is concave, then for any matrix $A \in\mathbb{R}^{n\times n}$, $f(Ax)$ is also concave in $x$, and the proof uses this fact, i.e., the fact that the concave function of a linear combination is still a concave function.

Proof: For any $0\leq \lambda\leq 1$, and any $x_1,x_2 \in \mathbb{R}^n$, $$ f\left(A\left( \lambda x_1 + (1-\lambda)x_2\right)\right) = f\left(\lambda Ax_1 + (1-\lambda)Ax_2\right) \geq \lambda f(Ax_1) + (1-\lambda)f(Ax_2) , $$ where the inequality follows from the fact that $f$ is concave, and by the definition of concavity. $\square$

Again, in general, it is possible for a function $g: \mathbb{R}^n\to \mathbb{R}$ to be concave in a subset of its arguments when the remaining arguments are fixed, but still not be concave in the entire set of arguments, e.g., a function with two inputs, $g(x,y)$, can be concave in $x$ for every fixed $y$, but not concave in the pair $(x,y)$. In fact, this is exactly the case for mutual information: For a fixed $p(y|x)$, it is concave in $p(x)$; for a fixed $p(x)$, it is convex in $p(y|x)$; and it is neither convex nor concave in the pair $\left( p(x), p(y|x)\right)$.