Is it safe to say 'two distributions are 70% similar' if their total variation distance is 0.3?

Question

Is it safe to say 'two distributions are 70% similar' if their total variation distance is 0.3?

210 Views Asked by Bumbble Comm At 02 Apr 2026 - 5:36

I'm writing an engineering paper and looking for a metric to represent the similarity of two discrete probability distributions. As far as I know, there are lots of such metrics or distances, but I found out that the total variation distance is the most simple.

$Dist = \Sigma_{x} (\frac{1}{2} |f(x) - g(x)|)$

As the target venue has nothing to do with mathematics, I want to keep it as straightforward as possible. For example, saying that the distance is 0.3 is maybe neither unfamiliar nor straightforward to understand for the readers.

But I noticed that the total variation distance is always between 0 and 1. So I was tempted to say that the similarity is 70% (= 1 - 0.3), rather than the distance is 0.3. Basically, it is none other than defining the concept of similarity as 1 - distance, but I wonder whether or not this approach is safe. I hope this is common but I expect it isn't.

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

Since the total variation distance is a statistical distance it's probably more important to explain to the layperson if it's a difference between two random variables, two probability distributions or samples, or the distance between an individual sample point and a population or a wider sample of points rather than to simplify it as "the similarity is 70%". The mean and symmetry are also important, but that's a different point (hidden by using tvd).

$D_{TV} = \Sigma_{x} (\frac{1}{2} |f(x) - g(x)|)$

The total variation distance is the largest possible difference between the probabilities that the two probability distributions can assign to the same event, so "the similarity can be greater than 70%, while the difference can be as great as 30%".

When trying to express a distance correlation to a layperson simply saying "it's 70% similar" is 'selling' the similarity when there may not be as much similarity as was hoped for. See this example of the correlation of x and y for various distributions from Wikipedia's distance correlation webpage:

Statistics is a complex subject, the presentation can skew ones beliefs; as you can see the "similarity" needs to be fairly great for one to be satisfied of a similarity, when the number is that low I would prefer distance over similarity.

Is it safe to say 'two distributions are 70% similar' if their total variation distance is 0.3?

There are 1 best solutions below

Related Questions in PROBABILITY

Related Questions in STATISTICS

Trending Questions

Popular # Hahtags

Popular Questions