Distance between two p.m.f.s


I am stuck on the following problem from my research. Is there an existing distance measure that can compare two probability mass functions with different supports? For example, for pmfs $p_1$ and $p_2$ such that $p_1 = [a_1, a_2, a_3, a_4]$ and $p_2=[b_1, b_2]$, is there a distance metric $D(p_1||p_2)$?


There is 1 answer below


Here's a reasonable notion. Let $(X,p_X)$ and $(Y,p_Y)$ be finite probability spaces; we can assume $|X|=|Y|$ by adding elements of probability zero to either space. Then set $$D(p_X,p_Y)=\min_{\pi:X\to Y} \sum_{x\in X} |p_X(x)-p_Y(\pi(x))|$$ where the minimum is taken over all bijections $\pi$ from $X$ to $Y$. You need to check that this is well-defined: if we add an arbitrary number of null elements to both $X$ and $Y$ (keeping $|X|=|Y|$), we still get the same value of $D$, since an optimal $\pi$ can be chosen to pair the extra nulls with extra nulls, so the extras don't contribute to the sum. You could define $D$ by adding nulls only to the smaller set, but this formulation is more useful for verifying that the triangle inequality holds.

It's easy to see that $D$ is symmetric and satisfies the triangle inequality. Furthermore, $D(p_X,p_Y)=0$ iff there is a probability-preserving bijection between the non-null elements of $X$ and $Y$; thus $D$ "doesn't see" null elements.
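These properties can be spot-checked numerically. The sketch below is not part of the argument above: it assumes the standard fact that, for real numbers and absolute-difference cost, pairing the two lists in sorted order attains the minimum over bijections, so $D$ can be computed by padding with zeros and sorting. The function name `pmf_distance_sorted` and the test distributions are made up for illustration.

```python
from fractions import Fraction as F

def pmf_distance_sorted(p, q):
    """Compute D(p, q) assuming sorted pairing is an optimal bijection."""
    n = max(len(p), len(q))
    # Pad the shorter list with null elements (probability zero).
    a = sorted(list(p) + [0] * (n - len(p)), reverse=True)
    b = sorted(list(q) + [0] * (n - len(q)), reverse=True)
    return sum(abs(x - y) for x, y in zip(a, b))

p = [F(1, 2), F(1, 3), F(1, 6)]
q = [F(1, 4), F(3, 4)]
r = [F(1, 3), F(1, 3), F(1, 3)]

# Symmetry.
assert pmf_distance_sorted(p, q) == pmf_distance_sorted(q, p)
# Null elements are invisible: padding either side changes nothing.
assert pmf_distance_sorted(p + [0, 0], q + [0]) == pmf_distance_sorted(p, q)
# D = 0 for a relabelling of the same probabilities (plus nulls).
assert pmf_distance_sorted(p, [F(1, 6), 0, F(1, 2), F(1, 3)]) == 0
# Triangle inequality on this sample.
assert pmf_distance_sorted(p, q) <= pmf_distance_sorted(p, r) + pmf_distance_sorted(r, q)
```

Passing checks on a few samples are of course no substitute for the proof; they only guard against a misreading of the definition.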

As an example, if $p_X$ is a distribution with probabilities $(1/2,1/3,1/6)$ and $p_Y$ has probabilities $(1/4,3/4)$, then $$D(p_X,p_Y)=|3/4-1/2| + |1/4-1/3| + |1/6-0| = 1/4 + 1/12 + 1/6 = 1/2.$$
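The example can be reproduced with a direct, brute-force reading of the definition: pad the shorter list with zeros and minimise over all permutations. This is only a sketch for small supports (the search is factorial in the support size), and the name `pmf_distance` is hypothetical. Exact rational arithmetic via `fractions.Fraction` avoids floating-point noise.

```python
from fractions import Fraction as F
from itertools import permutations

def pmf_distance(p, q):
    """Brute-force D(p, q): minimum over all bijections of the L1 mismatch."""
    n = max(len(p), len(q))
    # Add null elements so both supports have the same size.
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    # Each permutation of q corresponds to one bijection pi: X -> Y.
    return min(
        sum(abs(a - b) for a, b in zip(p, perm))
        for perm in permutations(q)
    )

p_X = [F(1, 2), F(1, 3), F(1, 6)]
p_Y = [F(1, 4), F(3, 4)]
print(pmf_distance(p_X, p_Y))  # 1/2
```

The minimum is attained by the pairing used in the example: $1/2 \leftrightarrow 3/4$, $1/3 \leftrightarrow 1/4$, $1/6 \leftrightarrow 0$.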