Bhattacharyya coefficient vs. the overlap of two discrete probability distributions?


I am interested in measuring the overlap of two discrete probability distributions on a finite set $\Omega$, with given probability mass functions $f$ and $g$, respectively.

It seems to me that this is equal to \begin{equation} \sum_{x\in \Omega} \min(f(x),g(x)). \end{equation}

However, when googling this problem, it seems that the standard measure of overlap is the Bhattacharyya coefficient (BC), which is given by \begin{equation} \sum_{x\in \Omega} \sqrt{f(x)\,g(x)}. \end{equation}

I understand that BC is an approximation (?), but why approximate something that is readily computable?

In particular, for Monte Carlo estimation based on a set of $n$ samples $X_i$ drawn uniformly over $\Omega$, is there any reason to prefer \begin{equation} \frac{|\Omega|}{n} \sum_{i = 1}^n \sqrt{f(X_i)\,g(X_i)} \end{equation} over \begin{equation} \frac{|\Omega|}{n} \sum_{i = 1}^n \min(f(X_i),g(X_i)) \end{equation}?
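A minimal sketch of both quantities and both Monte Carlo estimators, assuming $\Omega = \{0, \dots, k-1\}$ and $f$, $g$ stored as Python lists of probabilities (the function names and the example pmfs are hypothetical, chosen only for illustration). It uses the standard definitions $\sum_x \min(f(x),g(x))$ and $\sum_x \sqrt{f(x)\,g(x)}$. Since a uniform sample satisfies $E[h(X)] = \frac{1}{|\Omega|}\sum_x h(x)$, each sample mean is multiplied by $|\Omega|$ to estimate the sum; this common factor only rescales and does not favour either estimator.

```python
import math
import random


def overlap(f, g):
    """Exact overlap: sum over x of min(f(x), g(x))."""
    return sum(min(p, q) for p, q in zip(f, g))


def bhattacharyya_coefficient(f, g):
    """Exact Bhattacharyya coefficient: sum over x of sqrt(f(x) * g(x))."""
    return sum(math.sqrt(p * q) for p, q in zip(f, g))


def monte_carlo_estimates(f, g, n, seed=0):
    """Estimate both sums from n points drawn uniformly from {0, ..., len(f) - 1}.

    Under uniform sampling E[h(X)] = sum_x h(x) / |Omega|, so each sample
    mean is multiplied by |Omega| to give an unbiased estimate of the sum.
    """
    rng = random.Random(seed)
    size = len(f)  # |Omega|
    xs = [rng.randrange(size) for _ in range(n)]
    overlap_est = size * sum(min(f[x], g[x]) for x in xs) / n
    bc_est = size * sum(math.sqrt(f[x] * g[x]) for x in xs) / n
    return overlap_est, bc_est


if __name__ == "__main__":
    # Hypothetical pmfs on a 4-point Omega, for illustration only.
    f = [0.1, 0.2, 0.3, 0.4]
    g = [0.4, 0.3, 0.2, 0.1]
    print("exact overlap:", overlap(f, g))
    print("exact BC:", bhattacharyya_coefficient(f, g))
    print("MC (overlap, BC):", monte_carlo_estimates(f, g, n=100_000))
```

For these pmfs the exact overlap is $0.6$ and the exact BC is $0.4 + 2\sqrt{0.06} \approx 0.890$. Both estimators are ordinary sample means of bounded functions of the $X_i$, so the same Monte Carlo error analysis applies to either, which suggests the choice between them is about which quantity you want rather than which one is easier to estimate.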


I think there may be a slight misunderstanding. The Bhattacharyya coefficient (and the Bhattacharyya distance $-\ln \mathrm{BC}$ built on it) is a 'measure' of the overlap, not a 'measurement' of it, in the plain English sense of those words. It gives you a way of quantifying overlap, but it does not tell you the exact amount of probability mass that the two distributions share. At least, that is my reading of the various references.
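A minimal numerical check of that point, assuming $f$ and $g$ are pmfs (entries summing to 1) on a finite set and taking the overlap to be $\sum_x \min(f(x),g(x))$. Pointwise $\sqrt{ab} \ge \min(a,b)$, so BC is never smaller than the overlap. In the other direction, the overlap equals $1 - \mathrm{TV}$ with $\mathrm{TV} = \frac12\sum_x |f(x)-g(x)|$, and the standard inequality between Hellinger and total-variation distance, $\mathrm{TV} \le \sqrt{1-\mathrm{BC}^2}$, gives $1 - \sqrt{1-\mathrm{BC}^2} \le \text{overlap} \le \mathrm{BC}$. The sketch below (helper names are hypothetical) tests both bounds on randomly generated pmf pairs; it illustrates the bounds rather than proving them.

```python
import math
import random


def overlap(f, g):
    """Shared probability mass: sum over x of min(f(x), g(x)), i.e. 1 - TV(f, g)."""
    return sum(min(p, q) for p, q in zip(f, g))


def bhattacharyya(f, g):
    """Bhattacharyya coefficient: sum over x of sqrt(f(x) * g(x))."""
    return sum(math.sqrt(p * q) for p, q in zip(f, g))


def random_pmf(rng, k):
    """Hypothetical helper: a random pmf on k points (normalized uniform weights)."""
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


def check_bounds(trials=10_000, k=5, seed=1, eps=1e-12):
    """Check 1 - sqrt(1 - BC^2) <= overlap <= BC on random pmf pairs.

    eps absorbs floating-point rounding; max(0.0, ...) guards the sqrt when
    BC rounds to slightly above 1 for nearly identical pmfs.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        f, g = random_pmf(rng, k), random_pmf(rng, k)
        ovl, bc = overlap(f, g), bhattacharyya(f, g)
        lower = 1.0 - math.sqrt(max(0.0, 1.0 - bc * bc))
        if not (lower - eps <= ovl <= bc + eps):
            return False
    return True


if __name__ == "__main__":
    print("bounds hold on all trials:", check_bounds())
```

So BC tracks the overlap only in the sense of these two-sided bounds. The two agree at the extremes (1 for identical pmfs, 0 for disjoint supports), but in between BC is generally larger, which is one concrete sense in which BC 'approximates' the overlap without measuring it.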

I believe John L is correct in his interpretation of what 'approximate' means in this context.