I was studying Cosine Similarity and I have just seen this article. https://medium.com/@rahulkuntala9/cosine-similarity-and-handling-categorical-variables-29f907951b5
The author uses cosine similarity to measure how similar p1 is to each of the other vectors.
p1 = (1,0,0,150), newp1 = (1,0,0,100), newp2 = (1,0,0,200), newp3 = (0,0,1,135) and newp4 = (0,1,0,250)
Similarity(p1,newp1) = 0.999994
Similarity(p1,newp2) = 0.999998
Similarity(p1,newp3) = 0.99995
Similarity(p1,newp4) = 0.99994
My question is: since I want to use the cosine similarity as a weight for some values, how can I use these results to do that? All the similarities are almost 1, with no meaningful differences between them, so I see no point in using these results. I have considered using Euclidean distance to measure similarity instead, but I know it is not always the best choice for that.
What do you propose? Thank you!
Cosine similarity essentially measures the angle between two vectors.
If you think geometrically you can see why all your values are close to $1$. Consider the two vectors $(1,0,100)$ and $(0,1,150)$ in three dimensions. Each points nearly straight up from the $x$-$y$ plane, so the angle between them is very small.
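You can verify this numerically. The sketch below (using NumPy, with the vectors from the question) computes the cosine similarities directly; the large fourth coordinate dominates the dot product and both norms, so every result comes out close to $1$:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

p1 = (1, 0, 0, 150)
others = {
    "newp1": (1, 0, 0, 100),
    "newp2": (1, 0, 0, 200),
    "newp3": (0, 0, 1, 135),
    "newp4": (0, 1, 0, 250),
}

for name, v in others.items():
    # Every similarity is > 0.999: the numeric fourth coordinate
    # swamps the 0/1 categorical coordinates.
    print(name, round(cosine_similarity(p1, v), 6))
```

Note that changing the categorical coordinates (newp3, newp4) barely moves the result, which is exactly the problem you observed.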
To separate the vectors in your application you have to find another way to take into account the differences in the first three categorical variables. There is no off-the-shelf formula for that.
If you handle the categorical variables separately, then Euclidean distance may well be reasonable for the remaining coordinates. It will capture the large differences in the fourth coordinate.
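One common way to combine the two parts is a Gower-style mixed distance: a mismatch term for the categorical (one-hot) coordinates plus a range-scaled absolute difference for the numeric one. The sketch below is only an illustration of that idea, not a prescription; the weight `w_cat` and the numeric range are assumptions you would have to choose from your data:

```python
import numpy as np

def mixed_distance(a, b, num_range, w_cat=0.5):
    """Hypothetical mixed distance for vectors whose first three
    coordinates are a one-hot category and whose fourth is numeric.

    Combines a 0/1 categorical mismatch term with a numeric
    difference scaled by the observed range (Gower-style).
    w_cat is an assumed weight balancing the two parts.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cat_diff = 0.0 if np.array_equal(a[:3], b[:3]) else 1.0
    num_diff = abs(a[3] - b[3]) / num_range  # scaled into [0, 1]
    return w_cat * cat_diff + (1.0 - w_cat) * num_diff

# Using the vectors from the question; 150 is the assumed observed
# range of the numeric coordinate (250 - 100).
d_same_cat = mixed_distance((1, 0, 0, 150), (1, 0, 0, 100), num_range=150)
d_diff_cat = mixed_distance((1, 0, 0, 150), (0, 0, 1, 135), num_range=150)
```

Unlike cosine similarity, this distance now separates newp3 from newp1: a category mismatch contributes a fixed penalty that the large numeric coordinate cannot wash out.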
Your answer will have to depend on what the variables actually mean in your context.