Conditional Kullback–Leibler Divergence


Let $X$ be a discrete random variable drawn according to probability mass function $p(x)$ over alphabet $\mathcal{X}$, and let random variables $Y_1$ and $Y_2$ take values in alphabet $\mathcal{Y}$ with probability mass functions $p_1(y)$ and $p_2(y)$, respectively. The divergence and conditional divergence in this notation are
$$D(p_1(y)\|p_2(y)) = \sum_{y\in\mathcal{Y}} p_1(y)\log\frac{p_1(y)}{p_2(y)},$$
$$D(p_1(y|x)\|p_2(y|x)) = \sum_{x\in\mathcal{X}} p(x)\sum_{y\in\mathcal{Y}} p_1(y|x)\log\frac{p_1(y|x)}{p_2(y|x)}.$$
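As a quick numerical sketch of the two definitions, here they are in Python. The distributions below are made-up illustrative examples (a binary $\mathcal{X}$ and a ternary $\mathcal{Y}$), not anything from the problem statement:

```python
import numpy as np

# Hypothetical example distributions (not from the question).
p_x = np.array([0.4, 0.6])                  # p(x) over a binary alphabet X
p1_y_given_x = np.array([[0.2, 0.5, 0.3],   # p_1(y|x), rows indexed by x
                         [0.6, 0.1, 0.3]])
p2_y_given_x = np.array([[0.3, 0.3, 0.4],   # p_2(y|x)
                         [0.5, 0.2, 0.3]])

def kl(p, q):
    """D(p || q) = sum_y p(y) log(p(y)/q(y)), in nats."""
    return float(np.sum(p * np.log(p / q)))

def conditional_kl(p_yx, q_yx, p_x):
    """Conditional divergence: sum_x p(x) * D(p_1(.|x) || p_2(.|x))."""
    return float(sum(p_x[x] * kl(p_yx[x], q_yx[x]) for x in range(len(p_x))))
```

This assumes full support (no zero entries in $q$); handling zeros needs the usual $0\log 0 = 0$ conventions.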

Can conditioning never reduce the divergence, never increase it, or neither?

I have read that conditioning cannot reduce the KL divergence, but I have no clue how to show it. Also, I am not sure whether I can write $p(x)p_1(y|x)= p_1(x,y)$.

Any hints, or tips?

1 Answer

I prefer the notation $D(P_{Y|X}\|Q_{Y|X}|P_X),$ since this makes the law over $X$ explicit.

For a pair of laws $P_{XY},Q_{XY},$ the chain rule for KL divergence is $$ D(P_{XY}\|Q_{XY}) = D(P_X\|Q_X) + D(P_{Y|X}\|Q_{Y|X}|P_X).$$
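The chain rule is easy to sanity-check numerically. The joint laws below are arbitrary made-up examples on a $2\times 3$ alphabet:

```python
import numpy as np

# Hypothetical joint laws P_XY, Q_XY on {0,1} x {0,1,2} (rows index x).
P = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.25, 0.20]])
Q = np.array([[0.05, 0.15, 0.20],
              [0.25, 0.20, 0.15]])

def kl(p, q):
    """D(p || q) in nats; p, q arrays of matching shape with full support."""
    return float(np.sum(p * np.log(p / q)))

# Marginals over X, and conditionals of Y given X.
P_X, Q_X = P.sum(axis=1), Q.sum(axis=1)
P_YgX, Q_YgX = P / P_X[:, None], Q / Q_X[:, None]

# Conditional divergence D(P_{Y|X} || Q_{Y|X} | P_X): average under P_X.
cond = sum(P_X[x] * kl(P_YgX[x], Q_YgX[x]) for x in range(2))

# Chain rule: D(P_XY || Q_XY) = D(P_X || Q_X) + D(P_{Y|X} || Q_{Y|X} | P_X).
lhs = kl(P.ravel(), Q.ravel())
rhs = kl(P_X, Q_X) + cond
```

Here `lhs` and `rhs` agree to machine precision, which is exactly the identity above.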

Now, if $P_X = Q_X$ as in the question, then the first term is $0$. But exchanging the role of $X$ and $Y$, we can also write $$ D(P_{XY}\|Q_{XY}) = D(P_Y\|Q_Y) + D(P_{X|Y}\|Q_{X|Y}|P_Y), $$ and the final term here must be nonnegative (why?). We can thus infer that $$ D(P_Y\|Q_Y) \le D(P_{XY}\|Q_{XY}) = D(P_{Y|X}\|Q_{Y|X}|P_X).$$
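To see the conclusion concretely, here is a numerical check of $D(P_Y\|Q_Y) \le D(P_{Y|X}\|Q_{Y|X}|P_X)$ in the question's setting, where both joint laws share the same marginal $p(x)$. The distributions are made-up examples:

```python
import numpy as np

# Shared marginal p(x), so D(P_X || Q_X) = 0 as in the question.
p_x = np.array([0.4, 0.6])
p1_y_given_x = np.array([[0.2, 0.5, 0.3],   # p_1(y|x)
                         [0.6, 0.1, 0.3]])
p2_y_given_x = np.array([[0.3, 0.3, 0.4],   # p_2(y|x)
                         [0.5, 0.2, 0.3]])

def kl(p, q):
    """D(p || q) in nats, assuming full support."""
    return float(np.sum(p * np.log(p / q)))

# Marginals of Y under each law: p_i(y) = sum_x p(x) p_i(y|x).
p1_y = p_x @ p1_y_given_x
p2_y = p_x @ p2_y_given_x

# Conditional divergence D(p_1(y|x) || p_2(y|x) | p(x)).
cond = sum(p_x[x] * kl(p1_y_given_x[x], p2_y_given_x[x]) for x in range(2))

# Conditioning does not reduce the divergence here:
assert kl(p1_y, p2_y) <= cond + 1e-12
```

The gap `cond - kl(p1_y, p2_y)` is exactly the nonnegative term $D(P_{X|Y}\|Q_{X|Y}|P_Y)$ from the second application of the chain rule.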