What is the correct way to compute a Kullback–Leibler metric?


If you go to the Wikipedia page for Kullback–Leibler divergence and scroll down to the definition section, you will see:

$D_{KL}(p\,\|\,q)=\int_{-\infty}^{\infty}p(x)\log\left(\frac{p(x)}{q(x)}\right)\,dx$

for two arbitrary probability density functions $p(x)$ and $q(x)$.
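As a cross-check of this definition outside MATLAB, the integral can be evaluated numerically. Here is a sketch in Python with SciPy (assuming `scipy` is available; the function name `kl_numeric` is mine). Using `logpdf` for the log-ratio avoids overflow in the tails:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_numeric(mu0, sigma0, mu1, sigma1):
    """Numerically integrate p(x) * log(p(x)/q(x)) over the real line."""
    def integrand(x):
        p = norm.pdf(x, mu0, sigma0)
        # log(p/q) computed in log-space for numerical stability
        log_ratio = norm.logpdf(x, mu0, sigma0) - norm.logpdf(x, mu1, sigma1)
        return p * log_ratio
    val, _ = quad(integrand, -np.inf, np.inf)
    return val

print(kl_numeric(1, 1, 3, 1))  # approximately 2.0 for N(1,1) vs N(3,1)
```

For the example below ($N(1,1)$ versus $N(3,1)$) this one-directional integral gives $2$, matching the closed form from the Normal-distribution page.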

If you go to the Wikipedia page for the Normal distribution and look at the bottom of the infobox on the right, you will see that somebody has done the hard work of plugging the univariate Gaussian PDF into this definition for the specific case where $p(x)$ is $N(\mu_0,\sigma_0)$ and $q(x)$ is $N(\mu_1,\sigma_1)$. The Kullback–Leibler divergence is then given as:

$D_{KL}(N_0\,\|\,N_1)=\frac{1}{2}\left(\left(\frac{\sigma_0}{\sigma_1}\right)^2+\frac{(\mu_1-\mu_0)^2}{\sigma_1^2}-1+2\ln\left(\frac{\sigma_1}{\sigma_0}\right)\right)$
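This closed form is easy to sanity-check in code. A minimal sketch in Python (the name `kl_gauss` is my own), which also shows that the divergence is not symmetric in its arguments, so it is not a metric in the usual sense:

```python
import math

def kl_gauss(mu0, sigma0, mu1, sigma1):
    """Closed-form KL(N(mu0, sigma0^2) || N(mu1, sigma1^2))."""
    return 0.5 * ((sigma0 / sigma1) ** 2
                  + (mu1 - mu0) ** 2 / sigma1 ** 2
                  - 1
                  + 2 * math.log(sigma1 / sigma0))

print(kl_gauss(1, 1, 3, 1))   # 2.0 for the example in this question
print(kl_gauss(0, 1, 0, 2))   # approx. 0.318 -- KL(p||q) ...
print(kl_gauss(0, 2, 0, 1))   # approx. 0.807 -- differs from KL(q||p)
```

For the example in this question ($\mu_0=1,\ \sigma_0=1,\ \mu_1=3,\ \sigma_1=1$) the closed form evaluates to exactly $2$.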

I coded both of these in MATLAB and ran a simple test with $\mu_0=1,\ \sigma_0=1$ and $\mu_1=3,\ \sigma_1=1$.

However, for some reason the two Kullback–Leibler functions give me different values. So I am wondering: is there something wrong in my code, or are these two KL distances (one from the Normal distribution wiki page and the other from the Kullback–Leibler wiki page) actually supposed to be different?

Here is my MATLAB code:

function [ distance ] = KL(mu1,sigma1,mu2, sigma2 )
% define the two Gaussian PDFs with their corresponding mu and sigma values
fun1 = @(x) (1/( sigma1*sqrt(2*pi)  )  )*exp(1/2* (-(x-mu1).^2)/(sigma1^2));
fun2 = @(x) (1/( sigma2*sqrt(2*pi)  )  )*exp(1/2* (-(x-mu2).^2)/(sigma2^2));

% implement the integrand of the KL integral from the "Kullback-Leibler"
% wiki page, in both directions
fun3 = @(x) fun1(x)*log(fun1(x)/fun2(x));
fun4 = @(x) fun2(x)*log(fun2(x)/fun1(x));

y=inf; % integrate from - infinity to infinity
distance = integral(fun3, -y,y) + integral(fun4, -y,y);

% implement the KL defined in the "Normal Distribution" wiki page
dist2 = (sigma1/sigma2)^2 + (mu1-mu2)^2/(sigma2^2)-1+2*log(sigma2/sigma1);

dist4 = 1/2 * (dist2)

% It should be that dist4 equals distance, but it is not. Why?

Here is the output for a simple example with $\mu_0=1, \sigma_0=1$ and $\mu_1=3, \sigma_1=1$.

>> KL(1,1,3,1)

dist4 =

     2


distance =

    0.1080

Shouldn't they both be the same? If not, which one should I use?