The well-known formula for KL divergence for discrete probability distributions is:
$$D_{KL}(P \parallel Q)=\sum\limits_i \ln \left(\frac{P(i)}{Q(i)}\right) P(i)$$
Can someone explain why the natural logarithm is used here? It presumably does not yield the information in bits.
Do I need to change the base of the logarithm to 2 in order to get the relative entropy in bits, or is there another way?
Thank you.
M.
$$ \log_2 x = \frac{\log_e x}{\log_e 2} = \frac{\ln x}{\ln 2} $$
Expressing entropy in bits means using base-$2$ logarithms. With the natural logarithm, the formula gives the divergence in nats; dividing the result by $\ln 2$ converts nats to bits, so you can keep the formula as written and rescale at the end.
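As a quick sanity check, here is a minimal Python sketch (with two small made-up distributions `p` and `q`, chosen only for illustration) showing that computing the sum with natural logs and dividing by $\ln 2$ matches computing it directly with base-2 logs:

```python
import math

# Two small example distributions (hypothetical values, just for illustration)
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

# KL divergence in nats: sum_i P(i) * ln(P(i)/Q(i))
kl_nats = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Convert nats to bits by dividing by ln 2
kl_bits = kl_nats / math.log(2)

# Direct computation in bits using base-2 logarithms gives the same value
kl_bits_direct = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

print(f"KL in nats: {kl_nats:.6f}")
print(f"KL in bits: {kl_bits:.6f}  (direct log2: {kl_bits_direct:.6f})")
```

For these distributions the divergence works out to $0.25 \ln 2$ nats, i.e. exactly $0.25$ bits, and both ways of computing it agree.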