Notion of distance algorithm behavior

43 Views Asked by At

I have currently implemented the solution of Hellinger Distance in Node.js and Python.

The Hellinger Distance is being calculated over two histograms p and q. They are normalized discrete probability distributions.

However, consider this:

p = [0, 0, 0, i, 0, 0, 0, 0, 0, 0]
q0 = [0, 0, 0, 0, j, 0, 0, 0, 0, 0]

h(p, q0) = x0

Furthermore:

p = [0, 0, 0, i, 0, 0, 0, 0, 0, 0]
q1 = [0, 0, 0, 0, 0, 0, 0, 0, 0, j]

h(p, q1) = x1

Here I would prefer that the distance value of x1 should be greater than x0. But when I test this I get the exact same distance.

Is there any other algorithm I could implement that could solve this issue together with the Hellinger Distance?

Real example with HD:

p = [50.00, 50.00, 94.00, 50.00, 50.00, 50.00, 50.00, 50.00, 50.00, 50.00]
q0 = [50.00, 50.00, 50.00, 94.00, 50.00, 50.00, 50.00, 50.00, 50.00, 50.00]

h(p, q0) = 0.19724284169529102


p = [50.00, 50.00, 94.00, 50.00, 50.00, 50.00, 50.00, 50.00, 50.00, 50.00]
q1 = [50.00, 50.00, 50.00, 50.00, 50.00, 50.00, 50.00, 50.00, 50.00, 94.00]

h(p, q1) = 0.19724284169529102

The implementation is done in a straight forward way following the function for hellinger distance:

$$h(p, q) = \frac{1}{\sqrt2}\sqrt{\sum_{i=1}^k(\sqrt{p_i} - \sqrt{q_i})²}$$

Note that both p and q are normalized before being sent as argument to the hellinger distance function.

As far as I understand, this is a correct output from the algorithm.

Right now I am investigating the earth mover algorithm.

But I am happy to take any answer that could lead me to a suitable algorithm.

Thanks