Relative Entropy given two non-equivalent sets

423 Views Asked by At

I am trying to calculate the relative entropy given two collections and have a question regarding some issues.

Supposed we have two sets, $Real$ and $Calculated$, and their respective probability mass functions, $P$, and $Q$.

Relative Entropy, or Kullback Leibler Divergence is defined as the following:

$$\sum_{i=0}^{n} P(i)\log \frac{P(i)}{Q(i)}$$

How do we properly handle situations where $|Q| \neq |P|$?

Should we take the intersection of the sets, $Real$ and $Calculated$, and scale their respective probability mass functions to correct the calculation of the relative entropy? Otherwise only calculating over the intersection without scaling the probabilities can lead to negative results, which is not correct.

I am using the following code to calculate R.E.

def kullback_leibler_divergence(real, predicted):

    sum = 0.0
    for qs, freq in predicted.items():
        freq_r = real.get(qs, 0.0)
        operand = m.log(freq_r / freq) if freq_r != 0 and freq != 0 else 0
        sum += freq_r * operand

    return sum

...but I get negative results occasionally, which made me question how I am handling my input parameters.

1

There are 1 best solutions below

0
On BEST ANSWER

The question is addressed here (see also @cardinal's answer on the linked page).