Prove that the Jaccard Similarity of bags is always less than or equal to 1/2

1.9k Views Asked by At

Prove that the Jaccard Similarity of bags is always less than or equal to 1/2

I know that the Jaccard Similarity of bags is $= \dfrac{|B\cap C|}{|B\cup C|}$ and if the bags are the same then the similarity $=1$

A bag is a set of elements with order unimportant and repetition allowed.

For example if we have bags $B=\{A,B,B,C,C,C,C,D,D\}$ and $C=\{A,A,B,C,C,D,D,D\}$ then the Jaccard Similarity is $=\dfrac{1+1+2+2}{9+8}=\dfrac{6}{17}$

So we count an element n times in the intersection if n is the minimum number of times the element appears in B and C and the size of the union is the sum of the size of the bags.

But how would I prove it is always $\leq \dfrac{1}{2}$

2

There are 2 best solutions below

2
On BEST ANSWER

Without loss of generality we can consider bags with only one kind of element. Let $b=|B|, c=|C|$. Then the formula gives $\min(b,c)/(b+c)$ but then $2\min(b,c)\leq(b+c)$. Note that the formula gives $1/2$ if the bags are equal.

0
On

Every Jaccard Similarity is going to look like

$$\dfrac{n_1 + n_2 + n_3 + \cdots + n_N}{d_1 + d_2 + d_3 + \cdots + d_N}$$ where $2n_i \le d_i$ for each $i$.

So $$d_1 + d_2 + d_3 + \cdots + d_N \ge 2(n_1 + n_2 + n_3 + \cdots + n_N)$$.

Hence $$\dfrac{n_1 + n_2 + n_3 + \cdots + n_N}{d_1 + d_2 + d_3 + \cdots + d_N} \le \dfrac 12$$