Best possible approximation of P(X,Y)


I want to show that $\min_Q D(P_{Y|X} \| Q \mid P_X) = I(X;Y)$, and I've run into a question. Here $D$ is the KL divergence (relative entropy) and $I$ is, of course, the mutual information.

After wrestling with the algebra a bit, I found that I want a $Q$ minimizing the following:

$\sum_{x,y}P(x,y)\log \frac{P(x,y)}{P(x)Q(y)}$

Now, this quantity is always nonnegative, so I'd prefer to set $Q(y) = P(y|x)$ to make the whole thing zero. But $Q$ cannot depend on $x$, so instead I'd like to approximate $P(x,y)$ as closely as possible. Is this always done with $P(x)P(y)$, given no information about the dependence between $X$ and $Y$? It makes intuitive sense, but I'd prefer a slightly more convincing argument.
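As a numerical sanity check (a quick sketch; the joint distribution below is made up for illustration), one can compare the objective at the marginal $Q(y) = P(y)$ against another valid choice of $Q$:

```python
import numpy as np

# A small made-up joint distribution P(x, y) on a 2x3 alphabet.
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.05, 0.30]])
Px = P.sum(axis=1)  # marginal P(x)
Py = P.sum(axis=0)  # marginal P(y)

def objective(Q):
    """Compute sum_{x,y} P(x,y) log[ P(x,y) / (P(x) Q(y)) ]."""
    return float(np.sum(P * np.log(P / (Px[:, None] * Q[None, :]))))

# Mutual information I(X;Y) = sum P(x,y) log[ P(x,y) / (P(x)P(y)) ].
mutual_info = float(np.sum(P * np.log(P / (Px[:, None] * Py[None, :]))))

# Plugging in Q = P(y) recovers exactly I(X;Y) ...
assert np.isclose(objective(Py), mutual_info)

# ... while any other distribution Q gives a strictly larger value.
Q_uniform = np.array([1/3, 1/3, 1/3])
assert objective(Q_uniform) > mutual_info
```

This is consistent with the guess that, absent any way to condition on $x$, the marginal $P(y)$ is the right choice.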

BEST ANSWER

If you want to minimize over $Q$, why not simply expand the term in the log and minimize the part that depends on $Q(y)$, subject to the constraint $\sum_y Q(y) = 1$?

The result should be more convincing than playing with intuitive arguments (which may be true, but may also be misleading).
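Carrying out that suggestion: the objective splits into a $Q$-independent part and a divergence term (a standard decomposition),

$$\sum_{x,y} P(x,y)\log \frac{P(x,y)}{P(x)Q(y)} = \sum_{x,y} P(x,y)\log \frac{P(x,y)}{P(x)P(y)} + \sum_y P(y)\log \frac{P(y)}{Q(y)} = I(X;Y) + D(P_Y \| Q).$$

Since $D(P_Y \| Q) \ge 0$, with equality if and only if $Q = P_Y$, the minimum over $Q$ is $I(X;Y)$, attained at $Q(y) = P(y)$.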