Is it wrong to use Binary Vector data in Cosine Similarity?

7k Views Asked by Bumbble Comm At 07 Apr 2026 - 5:45

I am doing Information Retrieval using Cosine Similarity.

My data is a binary vector.

Most of the references I have read use non-binary vectors (non-binary matrices), so I am wondering whether it is wrong to use binary vector data in the cosine similarity function.


There are 2 best solutions below

user12998 On 27 Aug 2011 - 2:39

Binary vector data works perfectly well with cosine similarity. In fact, it makes the arithmetic much simpler, because the magnitude of each vector is simply the square root of the sum of its entries (each entry is 0 or 1, so squaring it changes nothing).
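To make that concrete, here is a minimal sketch in Python. The function name `cosine_binary` and the sample vectors are made up for illustration, and returning 0.0 for an all-zero vector is an assumption on my part (cosine similarity is undefined there).

```python
import math

def cosine_binary(a, b):
    """Cosine similarity of two equal-length 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: counts the positions where both vectors have a 1.
    dot = sum(x * y for x, y in zip(a, b))
    # For 0/1 entries x * x == x, so the squared norm is just the sum.
    norm_a = math.sqrt(sum(a))
    norm_b = math.sqrt(sum(b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an all-zero vector as similarity 0
    return dot / (norm_a * norm_b)

# Hypothetical term-presence vectors for a document and a query.
doc = [1, 1, 0, 1, 0]
query = [1, 0, 0, 1, 1]
print(cosine_binary(doc, query))  # roughly 0.667: 2 shared terms, 3 terms each
```

Nothing here differs from ordinary cosine similarity; the only shortcut is skipping the squaring step when computing the norms.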

Bumbble Comm On 27 Jan 2012 - 9:32

Consider looking at the Jaccard coefficient and Tanimoto coefficient. These two are probably a bit more sensible for binary data.

You can certainly use cosine distance, but computing it the general way overcomplicates things when your data is binary. The dot product boils down to the size of the intersection of the two sets, and each vector's length is the square root of the number of bits set. Once you realize you are just looking at set sizes, more straightforward and faster ways of computing these quantities follow.
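As a rough sketch of the set-based view, assuming each binary vector is stored as a Python set of the indices (or terms) that are set to 1. The function names and sample sets are invented for illustration, and returning 0.0 for empty sets is an assumption.

```python
import math

def cosine_sets(a, b):
    """Cosine similarity of two binary vectors given as sets of 'on' items."""
    if not a or not b:
        return 0.0  # assumption: an empty set has similarity 0 to anything
    # |A & B| is the dot product; sqrt(|A|) and sqrt(|B|) are the vector lengths.
    return len(a & b) / math.sqrt(len(a) * len(b))

def jaccard(a, b):
    """Jaccard coefficient: intersection size over union size."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty sets get similarity 0
    return len(a & b) / union

# Hypothetical documents represented by the terms they contain.
doc = {"binary", "cosine", "vector"}
query = {"cosine", "vector", "jaccard"}
print(cosine_sets(doc, query))  # 2 / sqrt(3 * 3)
print(jaccard(doc, query))      # 2 / 4
```

Both measures need only the intersection size and the two set sizes, which is why a sparse set or bitset representation tends to be faster than multiplying full vectors.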
