Why use KL-divergence in any practical setting?


The KL-divergence (see https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Definition) is defined as the expectation, under the first distribution, of the logarithmic difference between the probabilities that the two distributions assign.
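For concreteness, in the discrete case the definition from the linked page can be written as

$$D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \;=\; \mathbb{E}_{x \sim P}\!\left[\log P(x) - \log Q(x)\right].$$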

I understand that this construct is theoretically interesting, but for the practical purpose of comparing one distribution to another, is it really a useful metric?

First of all, it fails some very obvious criteria: it is asymmetric (the KL-divergence of $P$ from $Q$ is not necessarily the same as the KL-divergence of $Q$ from $P$), and it does not satisfy the triangle inequality. But worst of all, in my view, is the fact that it involves log-differences, which are unable to capture large deviations.
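The asymmetry is easy to see numerically. Here is a minimal sketch with a hand-rolled discrete KL divergence over two illustrative three-outcome distributions (the specific probabilities are made up for the example):

```python
import numpy as np

def kl(a, b):
    """Discrete KL divergence D_KL(a || b), in nats.

    Assumes both arrays are strictly positive and sum to 1.
    """
    return float(np.sum(a * np.log(a / b)))

# Two arbitrary distributions over the same three outcomes.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])

print(kl(p, q))  # D_KL(P || Q)
print(kl(q, p))  # D_KL(Q || P) -- generally a different number
```

The two printed values differ, so KL-divergence is not a metric in the mathematical sense.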

Basically, my question is: if I have two distributions and I want to "compare" them, why should I ever use this construct? I would prefer something like the MSE.