How would you quantify the closeness between sets?

How would you represent the closeness (distance?) between sets?
For example, how close are the sets {8,4,5} and {9,8,2}? Could it be a percentage?

If there is no way to do this, would you need two sets?
How close is {8,4,5} to {9,8,2} compared to the closeness of {2,8,5} to {9,8,2}?


P.S. This is for a computer science project, but it is a mathematical question.

Edit: I plan to have something like this: {a:1,b:5,c:6} and compare it to {a:2,b:18,c:24}. In this example, one might say they are fairly close because the 'a' values are similar; however, {a:5, b:16, c:20} could be considered closer overall.
This is being used in a genre recognition program. We want to compare the features of songs (represented as values) by figuring out the values for an "average" song of a certain genre, and later compare other songs to the averages and see which one is closest (and hopefully be able to give a percentage of similarity).

There is 1 best solution below

You're actually asking a deep question because there are many ways to do this.

What you described in your edit is actually comparing two tuples, which is different from comparing two sets. (Remember, sets are unordered.)
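To make the set/tuple distinction concrete, here is a minimal Python sketch using the numbers from the question. The names `KEYS`, `as_tuple`, `song_1` and `song_2` are made up for illustration, and reading the records in the fixed key order a, b, c is an assumption about how the features would be lined up.

```python
# Sets ignore order, so these two are the same set.
same_as_sets = {8, 4, 5} == {5, 8, 4}

# Tuples keep order, so the same numbers in a different order differ.
same_as_tuples = (8, 4, 5) == (5, 8, 4)

# Assumption: every record has the same keys, read in this fixed order.
# KEYS and as_tuple are hypothetical names, introduced only for this sketch.
KEYS = ("a", "b", "c")


def as_tuple(record):
    """Return the record's values in the fixed order given by KEYS."""
    return tuple(record[k] for k in KEYS)


song_1 = {"a": 1, "b": 5, "c": 6}
song_2 = {"c": 24, "a": 2, "b": 18}  # the order the keys are written in is irrelevant

print(same_as_sets, same_as_tuples)
print(as_tuple(song_1), as_tuple(song_2))
```

Once each record is a tuple with its coordinates in a known order, "distance between tuples" is a meaningful question, which is not the case for bare sets.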

The Euclidean (straight-line) distance between $(2,8,5)$ and $(9,8,2)$, viewed as points in three-dimensional space, is $$\sqrt{ (2-9)^2 \ + \ (8-8)^2 \ + \ (5-2)^2 } = \sqrt{58} \approx 7.62.$$ But you could, for example, change all the $^2$'s to $^9$'s and the $\sqrt{}$ to a $\sqrt[9]{}$ (taking absolute values of the differences first, since $9$ is odd). What would that do?
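One way to explore that question is to compute both versions. The sketch below is only an illustration; the function name `minkowski_distance` is chosen here (this family of distances is usually called the Minkowski or $p$-norm distance), and the absolute value is included so that odd powers such as 9 behave sensibly.

```python
def minkowski_distance(u, v, p):
    """Return (sum_i |u_i - v_i|**p) ** (1/p) for two equal-length tuples."""
    if len(u) != len(v):
        raise ValueError("tuples must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)


x = (2, 8, 5)
y = (9, 8, 2)

d2 = minkowski_distance(x, y, 2)  # the ordinary Euclidean distance
d9 = minkowski_distance(x, y, 9)  # the "power 9, ninth root" variant

# Largest difference in any single coordinate, for comparison.
largest_gap = max(abs(a - b) for a, b in zip(x, y))

print(d2, d9, largest_gap)
```

With the higher power, the result sits only slightly above the largest single coordinate difference (here 7), so the biggest mismatch dominates and the smaller ones barely count. Which behaviour is "right" depends on what closeness should mean for your songs.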

The actual distance between music genres is an unsolved problem, and the answer will depend on information from real life (i.e., outside of pure mathematics) as well as on asking mathematical questions (i.e., thinking more precisely than you currently are; that is not an insult, as I would start out asking the question the same way).

For example, what does "average" mean? Here are some pictures to get you thinking about why it might not be as simple as $\mathtt{average}(3,5,7)=5$.

[Image: radar chart 1 (source: telco2.net)]

[Image: radar chart 2]

[Image: radar chart 3 (source: plosone.org)]
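As a small numerical illustration of the same point, here is a sketch that builds an "average song" in two different ways. The feature values are hypothetical, made up purely for this example, and `componentwise` is a helper name introduced here.

```python
from statistics import mean, median

# Hypothetical feature tuples for three songs of one genre.
# These values are illustrative only and come from no real data set.
songs = [(3, 10, 1), (5, 12, 1), (7, 80, 1)]


def componentwise(stat, vectors):
    """Apply stat (e.g. mean or median) to each coordinate separately."""
    return tuple(stat(column) for column in zip(*vectors))


mean_song = componentwise(mean, songs)
median_song = componentwise(median, songs)

print(mean_song)
print(median_song)
```

The first coordinate averages to 5 either way, just like $\mathtt{average}(3,5,7)=5$. In the second coordinate, though, the mean is 34 while the median is 12, because one unusual song pulls the mean away from the other two. Which of these better represents the genre is a modelling decision, not something the arithmetic settles.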

Have fun!