Convexity of a log-determinant function via information-theoretic methods


In Cover and Thomas, "Elements of Information Theory", second edition, I encountered this question:

Let $K$ and $K_0$ be symmetric positive definite matrices of the same size (s.p.d.). Show that $$f(K)=\log\left(\frac{\det(K+K_0)}{\det(K)}\right)$$ is a convex function of $K$ (here $\det$ denotes the determinant).
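Before looking for a proof, the claimed convexity can be sanity-checked numerically on random s.p.d. matrices. A minimal sketch using NumPy (the helper names are mine, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    # A A^T + I is symmetric positive definite
    a = rng.standard_normal((n, n))
    return a @ a.T + np.eye(n)

def f(k, k0):
    # f(K) = log det(K + K0) - log det(K), via slogdet for numerical stability
    return np.linalg.slogdet(k + k0)[1] - np.linalg.slogdet(k)[1]

n = 4
k0 = random_spd(n)
for _ in range(1000):
    k1, k2 = random_spd(n), random_spd(n)
    lam = rng.uniform(0, 1)
    # convexity: f(lam*K1 + (1-lam)*K2) <= lam*f(K1) + (1-lam)*f(K2)
    lhs = f(lam * k1 + (1 - lam) * k2, k0)
    rhs = lam * f(k1, k0) + (1 - lam) * f(k2, k0)
    assert lhs <= rhs + 1e-9
```

Of course this is only evidence, not a proof, but it rules out a typo in the statement.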

As a side note, one of the best things about information theory is that it can be used to prove unusual determinant identities and inequalities such as Hadamard's inequality and so on. I am only looking for an "information theoretic" proof and not any other proofs. As an example,

Prove that if $K_1$ and $K_2$ are s.p.d.m, then $$\det (K_1 + K_2) \geq \max \{\det(K_1),\det(K_2)\}$$ Proof:

Let $X \sim N(0,K_1)$ and $Y \sim N(0,K_2)$ be independent. Then (here $n$ is the dimension of the matrices and $h(\cdot)$ is the differential entropy) $$h(X+Y) \geq h(X+Y|X) = h(Y),$$ where the inequality holds because conditioning reduces entropy and the equality uses independence. Hence $$\frac{1}{2}\log((2\pi e)^n\det(K_1+K_2)) \geq \frac{1}{2}\log((2\pi e)^n\det(K_2)),$$ so $\det(K_1+K_2) \geq \det(K_2)$; by symmetry $\det(K_1+K_2) \geq \det(K_1)$, and the result follows.
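The inequality from this example can likewise be checked numerically (again just a sanity check, with my own helper names):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(n):
    # A A^T + I is symmetric positive definite
    a = rng.standard_normal((n, n))
    return a @ a.T + np.eye(n)

for _ in range(1000):
    k1, k2 = random_spd(3), random_spd(3)
    # det(K1 + K2) >= max(det(K1), det(K2))
    assert np.linalg.det(k1 + k2) >= max(np.linalg.det(k1), np.linalg.det(k2))
```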

Here is my attempt at the original problem:

We need to show, for $\lambda \in (0,1)$, $$\log\left(\frac{\det(\lambda K_1 + (1-\lambda)K_2 +K_0 )}{\det(\lambda K_1 + (1-\lambda)K_2)}\right) \leq \lambda \log\left(\frac{\det(K_1+K_0)}{\det{(K_1)}}\right) + (1-\lambda)\log\left(\frac{\det(K_2+K_0)}{\det{(K_2)}}\right)$$ Let $X_1 \sim N(0,K_1)$, $X_2 \sim N(0,K_2)$ and $Y\sim N(0,K_0)$ be mutually independent. Let $v$ be a random variable equal to $1$ w.p. $\lambda$ and $2$ w.p. $1-\lambda$, and set $K_v =\lambda K_1 + (1-\lambda)K_2$, so that $E[X_vX_v^T]= K_v$. Let $\widetilde{X}\sim N(0,K_v)$; note that $X_v$ itself is not necessarily multivariate Gaussian. It then boils down to showing $$h(\widetilde{X}+Y) - h(\widetilde{X}) \leq h(X_v+Y|v) - h(X_v|v)$$

I am not sure whether the inequality above is correct; it would, however, imply the result. In any case, since $X_v$ and $\widetilde{X}$ have the same covariance matrix, I was able to get $$h(\widetilde{X}) \geq h(X_v) \geq h(X_v|v)$$
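Written out for Gaussians, the endpoint inequality $h(\widetilde{X}) \geq h(X_v|v)$ is just concavity of $\log\det$ over the s.p.d. cone. A quick numerical check of that chain (helper names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_spd(n):
    # A A^T + I is symmetric positive definite
    a = rng.standard_normal((n, n))
    return a @ a.T + np.eye(n)

def h_gauss(k):
    # differential entropy of N(0, K): (1/2) log((2 pi e)^n det K), in nats
    n = k.shape[0]
    return 0.5 * (n * np.log(2 * np.pi * np.e) + np.linalg.slogdet(k)[1])

for _ in range(1000):
    k1, k2 = random_spd(4), random_spd(4)
    lam = rng.uniform(0, 1)
    # h(X~) = h_gauss(K_v)  versus  h(X_v | v) = lam h_gauss(K1) + (1-lam) h_gauss(K2)
    h_mix = h_gauss(lam * k1 + (1 - lam) * k2)
    h_cond = lam * h_gauss(k1) + (1 - lam) * h_gauss(k2)
    assert h_mix >= h_cond - 1e-9
```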

But I couldn't get a similar chain for $h(\widetilde{X}+Y)$. I tried the entropy power inequality here, but it didn't seem to work. My guess is that this approach won't succeed and that I should instead manipulate mutual information terms.

I appreciate any ideas on this. As always, if something is unclear let me know.

Best answer:

I found the solution I was looking for in the 2001 paper by Suhas Diggavi and Tom Cover, "The Worst Additive Noise Under a Covariance Constraint" (IEEE Transactions on Information Theory). I didn't realize the problem was hard enough to warrant a Transactions paper; after all, Tom Cover set it as an exercise!

Update: I have found another way. Kindly check if my proof is correct.

Consider two channels $$Y_1 = X + Z_1, \qquad Y_2 = X + Z_2,$$ where $X\sim \mathcal{N}(0,K_0)$, $Z_1\sim \mathcal{N}(0,K_1)$ and $Z_2 \sim\mathcal{N}(0,K_2)$ are mutually independent random vectors. Let $U\in \{1,2\}$ be a random variable, independent of all of the above, with $\Pr(U=1)=\lambda$ and $\Pr(U=2)=1-\lambda$, and consider the mixed channel $$Y_U = X+Z_U.$$ If $W_1(y|x)$ and $W_2(y|x)$ are the transition probabilities of the first two channels, the mixed channel has transition probability $$W(y|x) = \lambda W_1(y|x) + (1-\lambda)W_2(y|x).$$

From basic information theory, for a fixed input distribution the mutual information between input and output is a convex function of the channel transition probability. Hence $$I(X;Y_U) \leq \lambda I(X;Y_1) + (1-\lambda)I(X;Y_2)$$ Evaluating the RHS gives \begin{align} RHS &= \lambda \left\{\frac{1}{2}\log\left((2\pi e)^n\det(K_1 + K_0)\right)-\frac{1}{2}\log\left((2\pi e)^n\det(K_1)\right) \right\} \\ &+ (1-\lambda)\left\{ \frac{1}{2}\log\left((2\pi e)^n\det(K_2 + K_0)\right)- \frac{1}{2}\log\left((2\pi e)^n\det(K_2)\right)\right\} \\ &=\lambda \left\{\frac{1}{2}\log\left(\frac{\det(K_1 + K_0)}{\det(K_1)}\right)\right\} +(1-\lambda) \left\{\frac{1}{2}\log\left(\frac{\det(K_2 + K_0)}{\det(K_2)}\right)\right\} \end{align} which is the required RHS up to an overall factor of $\frac{1}{2}$; the same factor will appear in the bound on the LHS and cancel.

Now consider the LHS; we need a suitable lower bound for it. We have $$I(X;Y_U) = h(X+Z_U) - h(Z_U)$$ Note that $Z_U$ is not multivariate Gaussian, but its covariance matrix is $\lambda K_1 + (1-\lambda)K_2$.
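The convexity of $I(X;Y)$ in the channel law for a fixed input, which the argument above relies on, can be sanity-checked numerically on a small discrete channel (the Gaussian case rests on the same general fact). A minimal sketch, with my own helper names:

```python
import numpy as np

rng = np.random.default_rng(3)

def mutual_info(p, w):
    # I(X;Y) in nats for input pmf p[x] and channel matrix w[x, y] = W(y|x)
    joint = p[:, None] * w
    py = joint.sum(axis=0)
    mask = joint > 0
    indep = p[:, None] * py[None, :]
    return np.sum(joint[mask] * np.log(joint[mask] / indep[mask]))

def random_channel(nx, ny):
    # random row-stochastic matrix
    w = rng.random((nx, ny))
    return w / w.sum(axis=1, keepdims=True)

p = np.array([0.3, 0.7])  # fixed input distribution
for _ in range(1000):
    w1, w2 = random_channel(2, 3), random_channel(2, 3)
    lam = rng.uniform(0, 1)
    # I(p, lam*W1 + (1-lam)*W2) <= lam*I(p, W1) + (1-lam)*I(p, W2)
    mixed = mutual_info(p, lam * w1 + (1 - lam) * w2)
    combo = lam * mutual_info(p, w1) + (1 - lam) * mutual_info(p, w2)
    assert mixed <= combo + 1e-12
```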
Now by the entropy power inequality (EPI), we have $$2^{\frac{2h(X+Z_U)}{n}}\geq 2^{\frac{2h(X)}{n}} + 2^{\frac{2h(Z_U)}{n}}.$$ This implies $$2^{\frac{2(h(X+Z_U)-h(Z_U))}{n}}\geq 1+ 2^{\frac{2(h(X)-h(Z_U))}{n}} \geq 1+ 2^{\frac{2(h(X)-h(Z))}{n}},$$ where $Z$ is multivariate normal with covariance matrix $K_v = \lambda K_1 + (1-\lambda )K_2$; the last step uses the fact that the Gaussian distribution maximizes differential entropy among distributions with a given covariance.

To finish, one would like $$1+ 2^{\frac{2(h(X)-h(Z))}{n}} = 2^{\frac{2(h(X+Z)-h(Z))}{n}},$$ but a word of caution: equality in the EPI holds only for Gaussians with *proportional* covariance matrices. Writing out the Gaussian entropies, the displayed identity is equivalent to $\det(K_0+K_v)^{1/n} = \det(K_0)^{1/n} + \det(K_v)^{1/n}$, and by Minkowski's determinant inequality the left side is in general strictly larger. So the EPI bound above is too weak to close the argument; what is really needed is the inequality $$h(X+Z_U)-h(Z_U) \geq h(X+Z)-h(Z),$$ i.e., that Gaussian noise is the worst additive noise under a covariance constraint, which is precisely the content of the Diggavi and Cover paper. Granting that result, we get \begin{align} h(X+Z_U)-h(Z_U) &\geq h(X+Z)-h(Z) \\ &=\frac{1}{2}\log\left((2\pi e)^n \det(\lambda K_1 + (1-\lambda )K_2+K_0)\right)\\ &\quad-\frac{1}{2}\log\left((2\pi e)^n \det(\lambda K_1 + (1-\lambda )K_2)\right)\\ &=\frac{1}{2}\log\left(\frac{\det(\lambda K_1 + (1-\lambda )K_2+K_0)}{\det(\lambda K_1 + (1-\lambda )K_2)}\right) \end{align} which, combined with the bound on the RHS, gives the desired convexity. So the EPI shortcut alone does not quite suffice here, which perhaps explains why the result warranted a Transactions paper.
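The Minkowski determinant inequality underpinning the last step, $\det(A+B)^{1/n} \geq \det(A)^{1/n} + \det(B)^{1/n}$ for s.p.d. $A,B$, with equality when $B$ is a scalar multiple of $A$, can be checked numerically as well (a sketch with my own helper names; it is generically strict, which is exactly why the EPI equality fails in general):

```python
import numpy as np

rng = np.random.default_rng(4)

def random_spd(n):
    # A A^T + I is symmetric positive definite
    a = rng.standard_normal((n, n))
    return a @ a.T + np.eye(n)

n = 3
for _ in range(1000):
    a, b = random_spd(n), random_spd(n)
    # Minkowski: det(A+B)^(1/n) >= det(A)^(1/n) + det(B)^(1/n)
    assert np.linalg.det(a + b) ** (1 / n) >= (
        np.linalg.det(a) ** (1 / n) + np.linalg.det(b) ** (1 / n) - 1e-9
    )

# equality holds when B is a scalar multiple of A
a = random_spd(n)
b = 2.5 * a
lhs = np.linalg.det(a + b) ** (1 / n)
rhs = np.linalg.det(a) ** (1 / n) + np.linalg.det(b) ** (1 / n)
assert abs(lhs - rhs) < 1e-8
```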