Csiszar-sum-identity Proof

365 Views Asked by At

I don't know how to prove Csiszár-sum-identity (aka Korner-Marton identity)

$$\sum_i^n I(Y^{i−1};X_i|X^n_{i+1},U)=\sum_i^n I(X^n_{i+1};Y_i|Y^{i−1},U)$$

I know that by applying the chain rule of mutual information we can see that both sides of the equation can be written as:

$$\sum_{1\le j \lt k \le n } I(Y_j;X_k | Y^{j-1} ; X^n_{k+1},U)$$

But, I only get there for the left side of the equality.

It would mean a lot.

Thanks.

1

There are 1 best solutions below

0
On BEST ANSWER

I think the trouble with showing this identity only arises due to indexing issues. Notice that in the left hand side of the equation, $i = 1$ gives $Y_1^0$, and $i = n$ in the right hand side gives $X_{n+1}^n$, both of which are usually treated as trivial random variables. This means that there is an indexing offset going on, and we should make sure to exploit this properly in the proof. In the following, terms like $Y_1^{-1}, Y_1^0, X_{n+1}^n, X_{n+2}^n, Y_0$ will show up, and are all treated as trivial random variables (this triviality can be formulated however you like, but the important thing is that for any trivial random variable $T$, and two other $U,V$, it holds that $I(T;U|V) = 0,$ and $I(U;V|T) = I(U;V).$)


First, by using the chain rule twice, we can observe that for any $i \in [1:n]$,

\begin{align}I(Y_1^{i-1};X_i|X_{i+1}^n, U) &= I(Y_1^{i-1}; X_{i}^n|U) - I(Y_1^{i-1};X_{i+1}^n|U) \\ &= I(Y_{i-1};X_i^n|U, Y_1^{i-2}) + I(Y_1^{i-2}; X_{i}^n|U) - I(Y_1^{i-1};X_{i+1}^n|U).\end{align}

Now observe that the following sum telescopes and so can be easily computed $$ \sum_{i = 1}^n I(Y_1^{i-2};X_i^n|U) - I(Y_1^{i-1}; X_{i+1}^n|U) = I(Y_1^{-1};X_i^n|U) - I(Y_1^{n-1};X_{n+1}^n|U) = 0, $$ where the final equality is because $Y_1^{-1}$ and $X_{n+1}^n$ are both trivial.

But then we conclude that \begin{align} \sum_{i= 1}^n I(Y_1^{i-1} ; X_i|X_{i+1}^n, U) &= \sum_{i = 1}^n I(Y_{i-1}; X_i^n|U, Y_1^{i-2})\\ &\overset{a}= \sum_{i = 0}^{n-1} I(Y_i;X_{i+1}^n|U, Y_1^{i-1}) \\ &\overset{b}= \mathfrak{J} +\sum_{i = 1}^n I(Y_i;X_{i+1}^n|U, Y_1^{i-1}), \end{align}

where $\mathfrak{J} = I(Y_0;X_1^n|U, Y_1^{-1}) - I(Y_n;X_{n+1}^n|U, Y_1^{n-1})$. Above equality $a$ comes from reindexing $i-1 \mapsto i$, and the equality $b$ comes from adding the appropriate $n$th term and removing the appropriate $0$th term from the sum, and $\mathfrak{J}$ is just there to account for this. But, of course, $\mathfrak{J} = 0$ since $Y_0, X_{n+1}^n$ are trivial, and we're done.


Actually the way you were indicating in the question is much simpler, although this also needs you to exploit the triviality of some indices. Since you got the left hand side, I'll work with the right hand side. \begin{align} \sum_{i = 1}^n I(X_{i+1}^n; Y_i|U, Y_1^{i-1}) &\overset{a}{=} \sum_{i = 1}^{n-1} I(X_{i+1}^n; Y_i|U, Y_1^{i-1}) \\ &\overset{b}= \sum_{i = 1}^{n-1} \left\{ \sum_{j = i+1}^n I(X_j;Y_i|U, Y_1^{i-1}, X_{j+1}^n)\right\},\end{align}

which is the expression in the question (after relabeling). Here $a$ is using the fact that $I(X_{n+1}^n; Y_n|U, Y_1^{n-1}) = 0$, and $b$ is the chain rule.