The unusual notation $D(P||Q)$ seems to be universally used for statistical divergences (e.g. KL divergence). What is the origin of this notation, and do the double bars (pipe symbols) have any significance in statistics/probability or information theory?
2026-04-05 17:13:22
Origin of the notation for statistical divergence
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
Double bars are not a universally accepted notation for statistical distances. A comma is sometimes preferred instead, as in $D(P,Q)$. Even within the information theory community, double bars are not required to denote the distance between probability measures. It is quite likely, though, that this creative notation was introduced by information theorists.
Kullback and Leibler did not originate the $D(P||Q)$ notation. In their paper "On Information and Sufficiency", Ann. Math. Statist. 22(1):79-86, 1951, they use $$I_{1:2}(E)=\frac{1}{\mu_1(E)}\int_{E} \,d\mu_1(x) \log \frac{f_1(x)}{f_2(x)},$$ defined for a subset $E\subseteq S$ of the sample space $S$. They attribute this notation to Halmos and Savage.
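To connect this with the modern notation, here is a short sketch under some assumptions that are not stated above: $\mu_1$ and $\mu_2$ are probability measures on $S$, $f_1$ and $f_2$ are their densities with respect to a common dominating measure $\lambda$, and $D(\cdot\|\cdot)$ denotes the usual KL divergence. Taking $E=S$, so that $\mu_1(S)=1$, the expression reduces to $$I_{1:2}(S)=\int_{S}\log\frac{f_1(x)}{f_2(x)}\,d\mu_1(x)=\int_{S}f_1(x)\log\frac{f_1(x)}{f_2(x)}\,d\lambda(x)=D(\mu_1\|\mu_2).$$ So the quantity is the same; only the symbol differs, with the subscript $1:2$ playing the role that the double bar plays in $D(P||Q)$.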
Shannon doesn't seem to use it either, as far as I can tell from a cursory look. Perhaps an information theorist adopted the notation later on; Cover, Wolfowitz, Gallager, Wyner, or Csiszár are possible candidates, though I am not sure about any of them. In Gallager's classic book the quantity appears only in a problem, for the discrete case, and without a symbol, just as a sum.
The two vertical bars may be there to stop people from thinking it is a conditional distribution, as a single bar would suggest.