Similarity of a Binary Vector


I have a binary vector [1,0,1,0,0,0] and I wish to obtain a value between zero and one that indicates its similarity to the vector [1,0,0,0,0,1], as opposed to its similarity to [0,0,0,0,0,0]. At first, I used a percentage similarity: I counted how many elements were identical in the same positions of the two vectors and expressed that count as a fraction of the length. If the two vectors are exactly the same, this value is 1.0. However, this doesn't work for vectors with over 300 elements. Due to the nature of my data, many of the vector values are 0, so I obtain high values such as 0.95 by chance when the vectors should not score as similar. Does anyone have a more sensitive method of determining the similarity of two vectors?
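A minimal Python sketch of the percentage similarity described above, i.e. the fraction of positions where the two vectors agree. The function name `percent_similarity` and the two 300-element test vectors are hypothetical, chosen only to illustrate the problem: shared 0s dominate the count, so even vectors whose 1s never overlap score about 0.95.

```python
def percent_similarity(a, b):
    """Fraction of positions where two equal-length binary vectors agree (1.0 = identical)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


a = [1, 0, 1, 0, 0, 0]
print(percent_similarity(a, [1, 0, 0, 0, 0, 1]))  # 0.666... (4 of 6 positions agree)
print(percent_similarity(a, [0, 0, 0, 0, 0, 0]))  # 0.666... as well, so no distinction

# Hypothetical sparse 300-element vectors whose 1s never overlap
n = 300
u = [1 if i < 7 else 0 for i in range(n)]        # 1s at positions 0-6
v = [1 if 7 <= i < 14 else 0 for i in range(n)]  # 1s at positions 7-13
print(percent_similarity(u, v))  # about 0.953, despite sharing no 1s
```

Note that even the small six-element example cannot tell the two comparisons apart: both give 4/6.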

Thanks

A simple-minded way is to scale up the difference from $1$ (that is, one minus the similarity). If your vectors are mostly $0$s with only a few $1$s, counting the differences amounts to counting the total number of $1$s in the two vectors, since there will be few positions where both vectors are $1$. So count the differences, multiply by $10$ (or some other convenient factor), and divide by the length. If the result is greater than $1$, cap it at $1$. You lose the ability to distinguish vectors that are very different, but you don't have any of those anyway.
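A minimal Python sketch of the capped, scaled difference, assuming the similarity is reported as one minus the capped value: with $d$ differing positions, length $n$ and scale factor $k$, the score is $s = 1 - \min(1, kd/n)$. The name `scaled_similarity`, the default `factor=10`, and the test vectors are illustrative assumptions, not a fixed recipe.

```python
def scaled_similarity(a, b, factor=10):
    """Return 1 - min(1, factor * differences / length) for two equal-length binary vectors.

    `factor` plays the role of the "convenient factor" (assumed default: 10).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    # Scale up the fraction of differing positions, then cap it at 1.
    scaled_difference = min(1.0, factor * differences / len(a))
    return 1.0 - scaled_difference


n = 300
u = [1 if i < 7 else 0 for i in range(n)]        # 1s at positions 0-6
v = [1 if 7 <= i < 14 else 0 for i in range(n)]  # 1s at positions 7-13, no overlap with u
print(scaled_similarity(u, u))  # 1.0 (identical vectors)
print(scaled_similarity(u, v))  # about 0.533, i.e. 1 - 10*14/300
```

With `factor=10` and 300 elements, any pair differing in 30 or more positions scores exactly 0; that is the information loss for very different vectors mentioned above. A larger factor makes the score more sensitive but saturates sooner.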