Most resources on the internet say that KL divergence measures the difference between two distributions, and that it should be zero when the two distributions are identical. But when I use the TensorFlow built-in loss to compute the KL divergence between samples drawn from the same Normal distribution, it gives me a huge number. Why is that?
Here is the code I used:
import tensorflow as tf
import tensorflow_probability as tfp

kl = tf.keras.losses.KLDivergence()  # Keras KL divergence loss
tfd = tfp.distributions
dist = tfd.Normal(loc=0., scale=1.)  # Build a standard Normal distribution
kl(dist.sample([300000]), dist.sample([300000]))  # KL between two sample sets from the same distribution
output:
<tf.Tensor: shape=(), dtype=float32, numpy=772134.5>
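I expected something close to zero, since the closed-form KL divergence between two univariate Normals, KL(N(μ₁, σ₁²) ‖ N(μ₂, σ₂²)) = log(σ₂/σ₁) + (σ₁² + (μ₁ − μ₂)²)/(2σ₂²) − 1/2, is exactly 0 when the parameters match. As a sanity check (a plain-Python sketch of that formula, no TensorFlow involved):

```python
import math

def kl_normal(mu1, sigma1, mu2, sigma2):
    """Closed-form KL(N(mu1, sigma1^2) || N(mu2, sigma2^2))."""
    return (math.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

print(kl_normal(0.0, 1.0, 0.0, 1.0))  # identical distributions -> 0.0
```

So the analytic answer for two identical standard Normals is 0, yet the Keras loss applied to raw samples returns 772134.5. What is the Keras loss actually computing here?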