Let $X_1$ and $X_2$ be two independent and identically distributed random variables, with finite value space.
Is there a relation between the probability of them being equal and the entropy of their distribution? More precisely, how is $$ P(X_1 = X_2) = \sum_{x} p(x)\cdot p(x) $$ related to $$ H(X_1) = -\sum_{x} p(x)\cdot \log{p(x)} $$ where $p(x) = P(X_1=x) = P(X_2 = x)$?
If there is no direct relationship, are there useful bounds? In particular, if we know the entropy, does that bound $P(X_1=X_2)$?
Interesting corner cases:
- Constant variable: $P(X_1 = X_2) = 1$, $H(X_1) = 0$.
- Uniform distribution on $n$ outcomes: $P(X_1 = X_2) = \frac{1}{n}$, $H(X_1) = \log n$.
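As a quick sanity check of the two corner cases, here is a small Python sketch (using the natural logarithm for $\log$):

```python
import math

def collision_prob(p):
    # P(X1 = X2) = sum_x p(x)^2 for i.i.d. X1, X2
    return sum(q * q for q in p)

def entropy(p):
    # H(X) = -sum_x p(x) log p(x), with the convention 0 log 0 = 0
    return -sum(q * math.log(q) for q in p if q > 0)

const = [1.0]           # constant variable
unif4 = [0.25] * 4      # uniform on n = 4 outcomes
print(collision_prob(const), entropy(const))  # 1.0, 0.0
print(collision_prob(unif4), entropy(unif4))  # 0.25, log 4
```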
The more choices there are for the values of $X$, the less $P(X_1=X_2)$ tells you about the entropy. For example, with $2$ values, $P(X_1=X_2)$ determines $H(X)$ completely:
Let $P(X=a) = p$, and take $p\ge\frac12$ without loss of generality (the two roots below give the same entropy by symmetry). Then $$P\equiv P(X_1=X_2)= p^2+(1-p)^2\\ p=\frac{1+\sqrt{2P-1}}{2}\\ (1-p) = \frac{1-\sqrt{2P-1}}{2}\\ H(X) = -\frac{1+\sqrt{2P-1}}{2}\log\left(\frac{1+\sqrt{2P-1}}{2} \right)-\frac{1-\sqrt{2P-1}}{2}\log\left(\frac{1-\sqrt{2P-1}}{2} \right) $$ which, though a tad messy, is a unique closed-form expression for the entropy.
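The closed form above can be cross-checked numerically; a Python sketch (natural logarithm, function name my own):

```python
import math

def entropy_from_collision(P):
    """Entropy of a two-outcome distribution with collision probability P.

    Inverts P = p^2 + (1-p)^2, taking the root p >= 1/2 (both roots give
    the same entropy by symmetry), then evaluates
    H = -p log p - (1-p) log(1-p).
    """
    p = (1 + math.sqrt(2 * P - 1)) / 2
    q = 1 - p
    h = -p * math.log(p)
    if q > 0:
        h -= q * math.log(q)
    return h

# Cross-check against a direct computation, starting from p = 0.3
p = 0.3
P = p**2 + (1 - p)**2
direct = -(p * math.log(p) + (1 - p) * math.log(1 - p))
print(abs(entropy_from_collision(P) - direct))  # ~0
```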
For three possible values, given $P\equiv P(X_1=X_2)$, you can actually find a useful interval (which depends on $P$) that the entropy must lie within.
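One way to see such an interval concretely: for three outcomes with $p_1$ fixed, the remaining masses are pinned down (up to swapping) by $p_2+p_3=1-p_1$ and $p_2^2+p_3^2=P-p_1^2$, so the distributions with a given $P$ form a one-parameter family. A Python sketch that sweeps this family and records the entropy range (the specific target $P=0.4$ and function names are my own choices for illustration):

```python
import math

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def three_outcome_family(P, p1):
    # Given p1, solve p2 + p3 = 1 - p1 and p2^2 + p3^2 = P - p1^2:
    #   p2, p3 = ((1-p1) +/- sqrt(2(P - p1^2) - (1-p1)^2)) / 2
    s = 1 - p1
    disc = 2 * (P - p1 * p1) - s * s
    if disc < 0:
        return None  # no distribution with this p1 and collision prob P
    r = math.sqrt(disc)
    p2, p3 = (s + r) / 2, (s - r) / 2
    if p3 < 0:
        return None
    return [p1, p2, p3]

def entropy_interval(P, steps=2000):
    # Sweep p1 over [0, 1] and record the min/max entropy attained.
    lo, hi = float("inf"), float("-inf")
    for i in range(steps + 1):
        p = three_outcome_family(P, i / steps)
        if p is not None:
            h = entropy(p)
            lo, hi = min(lo, h), max(hi, h)
    return lo, hi

print(entropy_interval(0.4))  # a nondegenerate interval inside (0, log 3)
```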
For more than three possible values, the relationship is too weak to be useful.
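To illustrate the weakness with a concrete pair (my own example): on a $4$-symbol alphabet, the two distributions below have exactly the same collision probability but clearly different entropies, so $P(X_1=X_2)$ alone cannot pin down $H(X)$.

```python
import math

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def collision_prob(p):
    return sum(q * q for q in p)

# Two distributions on the same 4-symbol alphabet with equal
# collision probability 0.52 (note q puts zero mass on two symbols):
p = [0.7, 0.1, 0.1, 0.1]
q = [0.6, 0.4, 0.0, 0.0]
print(collision_prob(p), collision_prob(q))  # 0.52, 0.52
print(entropy(p), entropy(q))                # approx 0.940 vs 0.673
```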