Calculating similarity of sets of variable length


If I have two sets, I can calculate their similarity coefficient using the Jaccard index. Is there an algorithm that can calculate similarity when the sets have a variable number of elements? For example, consider these two pairs of sets:

{A1,B1,C1} and {A1,B2,C1,D1}

{A1,B1,C1} and {A1,B3,C4,D5}

I can say that the first pair is more similar, but how do I calculate that mathematically?

Best answer

The Jaccard index $\displaystyle J(A,B) = {{|A \cap B|}\over{|A \cup B|}} = {{|A \cap B|}\over{|A| + |B| - |A \cap B|}}$ handles differently sized sets as part of its definition: nothing in it requires $|A| = |B|$.

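In your examples it gives $\dfrac25$ (two shared elements, $A1$ and $C1$, out of five distinct elements) and $\dfrac16$ (one shared element, $A1$, out of six distinct elements), so the first value is indeed higher than the second.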
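As a quick check, here is a minimal Python sketch of the same calculation. The function name `jaccard` and the choice to return 1.0 when both sets are empty are my own assumptions; the formula itself is undefined in that case, so pick whatever convention suits your use.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| for two sets of any size."""
    union = a | b
    if not union:
        # Both sets are empty: 0/0 is undefined, so return 1.0 by convention (assumption).
        return 1.0
    return len(a & b) / len(union)


# The two pairs from the question.
first = jaccard({"A1", "B1", "C1"}, {"A1", "B2", "C1", "D1"})
second = jaccard({"A1", "B1", "C1"}, {"A1", "B3", "C4", "D5"})

print(first, second)   # 2/5 and 1/6 as floats
print(first > second)  # the first pair is the more similar one
```

The sets do not need to have the same length; only the sizes of the intersection and the union matter.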