As I understand the KL Divergence, it measures how different two probability distributions $P$ and $Q$ are.
However, say the two distributions are:
P = [0.2 0.2 0.2 0.4];
Q = [0 0.5 0.1 0.4];
Then the KL divergence $D_{KL}(P \Vert Q)$ is infinite. What is the justification for calling these distributions infinitely different?
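A quick way to verify the claim numerically, as a minimal sketch (the function name `kl_divergence` is mine, and it follows the convention that terms with $p_i = 0$ contribute zero):

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) in bits. Terms with p_i = 0 are dropped by convention."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    # Where q_i = 0 but p_i > 0, p_i/q_i is +inf, so the sum is +inf.
    with np.errstate(divide="ignore"):
        return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

P = [0.2, 0.2, 0.2, 0.4]
Q = [0.0, 0.5, 0.1, 0.4]
print(kl_divergence(P, Q))  # inf
```

The first component is the culprit: $p_1 = 0.2 > 0$ while $q_1 = 0$, so the term $p_1 \log_2(p_1/q_1)$ is infinite.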
One interpretation of the Kullback-Leibler divergence
$$D_{KL}(P \Vert Q) = \sum_i p_i \log_2 \frac{p_i}{q_i}$$
is that it measures the extra message length per character needed to send a message drawn from distribution $P$ using an encoding scheme optimized for distribution $Q$.

If a character is assigned a very small probability by $Q$, then the encoding scheme will use a very large number of bits for that character, since it is expected to occur so rarely. (By the Kraft inequality, an optimal code for $Q$ assigns about $-\log_2(q_i)$ bits to the $i$-th character.) If that character in fact occurs frequently under $P$, then a message drawn from $P$ will take a large number of bits to encode: the character contributes $-p_i \log_2(q_i)$ bits per transmitted character on average.

Therefore, if the $i$-th component of $Q$ goes to zero while the corresponding component of $P$ remains fixed and positive, the KL divergence $D_{KL}(P \Vert Q)$ should (and does) diverge logarithmically, like $-p_i \log_2(q_i) \to \infty$ as $q_i \to 0$. In the limiting case $q_i = 0$, no finite-length code for the $i$-th character exists at all, so the divergence is infinite.
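The coding-length reading above can be checked numerically. The sketch below (the helper name `extra_bits` is mine) computes $D_{KL}$ as cross-entropy minus entropy, i.e. the average extra bits paid for using $Q$'s code on $P$'s data, and shrinks $q_1$ toward zero (rescaling the rest of $Q$ so it still sums to 1) to show the divergence tracking the dominant term $-p_1 \log_2(q_1)$:

```python
import numpy as np

def extra_bits(p, q):
    """D_KL(P || Q) = H(P, Q) - H(P): extra bits per character
    paid for encoding P-distributed symbols with Q's optimal code."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    cross_entropy = -np.sum(p * np.log2(q))  # average code length under Q's code
    entropy = -np.sum(p * np.log2(p))        # optimal average code length for P
    return cross_entropy - entropy

P = np.array([0.2, 0.2, 0.2, 0.4])
for q1 in [0.1, 0.01, 0.001, 1e-6]:
    # Shrink Q's first component toward zero; rescale the rest to keep sum(Q) = 1.
    Q = np.array([q1, 0.5 * (1 - q1), 0.1 * (1 - q1), 0.4 * (1 - q1)])
    print(f"q_1 = {q1:>7g}: D_KL = {extra_bits(P, Q):6.3f} bits, "
          f"-p_1*log2(q_1) = {-P[0] * np.log2(q1):6.3f} bits")
```

As $q_1$ shrinks, $D_{KL}(P \Vert Q)$ grows without bound, staying within a constant of the dominant term $-p_1 \log_2(q_1)$.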