How are P(X₁=X₂) and H(X) related?


Let $X_1$ and $X_2$ be two independent and identically distributed random variables, with finite value space.

Is there a relation between the probability that they are equal and the entropy of their distribution? More precisely, how is $$ P(X_1 = X_2) = \sum_{x} p(x)^2 $$ related to $$ H(X_1) = -\sum_{x} p(x)\log p(x), $$ where $p(x) = P(X_1=x) = P(X_2 = x)$?

If there is no direct relationship, are there useful bounds? In particular, if we know the entropy, does that bound $P(X_1=X_2)$?

Interesting corner cases:

  • Constant variable: $P(X_1 = X_2) = 1$, $H(X_1) = 0$.
  • Uniform distribution on $n$ outcomes: $P(X_1 = X_2) = \frac{1}{n}$, $H(X_1) = \log n$.
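These two corner cases are easy to verify numerically. The sketch below (function names are my own; entropy is computed in nats) checks the constant and uniform distributions:

```python
import math

def collision_prob(p):
    """P(X1 = X2) for i.i.d. X1, X2 with distribution p (a list of probabilities)."""
    return sum(q * q for q in p)

def entropy(p):
    """Shannon entropy H(X) in nats, with the convention 0·log 0 = 0."""
    return -sum(q * math.log(q) for q in p if q > 0)

# Constant variable: a single outcome with probability 1.
const = [1.0]
assert collision_prob(const) == 1.0
assert entropy(const) == 0.0

# Uniform distribution on n outcomes.
n = 4
uniform = [1.0 / n] * n
assert math.isclose(collision_prob(uniform), 1.0 / n)
assert math.isclose(entropy(uniform), math.log(n))
```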


BEST ANSWER

The more possible values $X$ has, the less $P(X_1=X_2)$ tells you about the entropy. For example, with $2$ possible values, $P(X_1=X_2)$ determines $H(X)$ exactly:

Let $P(X=a) = p$. Then $$P\equiv P(X_1=X_2)= p^2+(1-p)^2,$$ so $$p=\frac{1+\sqrt{2P-1}}{2},\qquad 1-p = \frac{1-\sqrt{2P-1}}{2},$$ and $$H(X) = -\frac{1+\sqrt{2P-1}}{2}\log\left(\frac{1+\sqrt{2P-1}}{2} \right)-\frac{1-\sqrt{2P-1}}{2}\log\left(\frac{1-\sqrt{2P-1}}{2} \right),$$ which, although a tad messy, is a unique closed-form expression for the entropy.
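A quick numerical check of this closed form (a sketch, with variable names of my choosing and entropy in nats): pick a two-point distribution, compute $P$, recover $p$ from the formula, and confirm both routes give the same entropy.

```python
import math

def entropy2(p):
    """Shannon entropy (nats) of a two-point distribution (p, 1-p)."""
    return -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)

p = 0.8
P = p**2 + (1 - p)**2                     # collision probability P(X1 = X2)
p_rec = (1 + math.sqrt(2 * P - 1)) / 2    # recover the larger mass from P

H_direct = entropy2(p)       # entropy from the original distribution
H_from_P = entropy2(p_rec)   # entropy reconstructed from P alone

assert math.isclose(p_rec, p)
assert math.isclose(H_direct, H_from_P)
```

Note that the formula recovers the larger of the two masses, which is enough: entropy is symmetric in $p$ and $1-p$.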

For three possible values, given $P\equiv P(X_1=X_2)$, you can actually find a useful interval (which depends on $P$) that the entropy must lie within.

For more than three possible values, the relationship is too weak to be useful.

ANOTHER ANSWER

The Rényi entropy of order 2 is $H_2(X) = -\log \sum_x p(x)^2 = -\log P(X_1 = X_2)$, and it is known to be at most the Shannon entropy: $H_2(X) \leq H(X)$. This yields the bound $P(X_1=X_2) \geq \exp(-H(X))$ (with entropies in nats; with base-2 logarithms the bound reads $P(X_1=X_2) \geq 2^{-H(X)}$).
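This bound can be checked empirically on an arbitrary distribution. A minimal sketch (function names are my own; entropies in nats), using a randomly generated distribution:

```python
import math
import random

def collision_prob(p):
    """P(X1 = X2) = sum of squared probabilities."""
    return sum(q * q for q in p)

def shannon_entropy(p):
    """Shannon entropy H(X) in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)

def renyi2(p):
    """Rényi entropy of order 2: H_2(X) = -log P(X1 = X2)."""
    return -math.log(collision_prob(p))

random.seed(0)
weights = [random.random() for _ in range(6)]
dist = [w / sum(weights) for w in weights]

# H_2(X) <= H(X), hence P(X1 = X2) >= exp(-H(X)).
assert renyi2(dist) <= shannon_entropy(dist) + 1e-12
assert collision_prob(dist) >= math.exp(-shannon_entropy(dist)) - 1e-12
```

Equality holds exactly for the uniform distribution, where $H_2(X) = H(X) = \log n$.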