A measure of similarity of real vectors independent of their dimension

I am trying to find a measure of similarity between two vectors that works for any pair of vectors v, w $\in R^n$ (for any n).

For example:

v1=(1,2,4) v2=(-2,4,4) -> $sim(v1,v2) \in R$

v1'=(0,0,2,0,3) v2'=(2,4,6,1,2) -> $sim(v1',v2') \in R$

I want to be able to compare the results sim(v1,v2) and sim(v1',v2'), so that for any two pairs (v1,v2) and (v1',v2') I can tell which pair is more "similar".

Obviously I tried the standard norm of the Euclidean distance first, but I found that the result does not really work when you compare a distance in $R^2$ with a distance in $R^5$. It penalizes the component-wise distances less as the dimension grows (see the example below).

I am wondering if there is any alternative.

** clarification on why I don't like the standard norm of the euc distance **

PAIR 1) v1 = (0) , v2=(1) ---> |v1-v2| = 1

PAIR 2) v1' = (0,0) , v2'=(1,1) ---> |v1'-v2'| = $\sqrt{2} \approx 1.41$

PAIR 3) v1''= (0,0,0), v2''=(1,1,1) ---> |v1''-v2''| = $\sqrt{3} \approx 1.73$

Which pair is more "alike"? I am not sure the norm of the Euclidean distance is an appropriate metric here. I think the vectors in each pair are as different as two vectors in their respective spaces can be, so the norm of the Euclidean distance does not seem to be "scaled" properly.
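To make the pattern in the three pairs concrete, here is a minimal Python sketch that computes the Euclidean distance between the all-zeros and all-ones vectors in $R^n$. The helper names `euclidean_distance` and `zeros_to_ones_distance` are made up for this illustration; the only point is that the distance equals $\sqrt{n}$, so it changes with the dimension even though every component differs by exactly 1.

```python
import math

def euclidean_distance(v, w):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

def zeros_to_ones_distance(n):
    """Distance between (0, ..., 0) and (1, ..., 1) in R^n (illustrative helper)."""
    return euclidean_distance([0.0] * n, [1.0] * n)

# Distances for the three pairs in the question (n = 1, 2, 3).
distances = [zeros_to_ones_distance(n) for n in (1, 2, 3)]
print(distances)
```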

Any ideas on how to compare?

There are 1 best solutions below

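A standard sort-of solution is based on the "cosine similarity", i.e. the cosine of the angle between the two vectors (it is often stated for unit vectors, where it is just the dot product). Taking the inverse cosine gives the angle itself, which you can use as a distance for nonzero vectors: $$ d(v_1, v_2) = \cos^{-1} \frac{v_1 \cdot v_2}{\|v_1\|\|v_2\|} $$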

If $v_1, v_2$ are unit vectors, then you can skip dividing by the lengths, of course. The downside? If $v_1, v_2$ point in the same direction, but have different lengths, this "distance" still returns the value $0$.

The upside? If $v_1, v_2 \in \Bbb R^2 \subset \Bbb R^5$, and you compute the distance, you get the same answer whether you think of them as being in $\Bbb R^2$ or in $\Bbb R^5$.
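Here is a minimal Python sketch of the angle formula, under a few assumptions that are not in the answer itself: the function names `angle_between` and `pad_with_zeros` are hypothetical, the cosine is clamped to $[-1, 1]$ to guard against floating-point round-off, and a zero vector raises an error because it has no direction.

```python
import math

def angle_between(v, w):
    """Angle in radians between two nonzero vectors of equal dimension."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    if norm_v == 0.0 or norm_w == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp to [-1, 1]: round-off can push the ratio slightly outside.
    cos_theta = max(-1.0, min(1.0, dot / (norm_v * norm_w)))
    return math.acos(cos_theta)

def pad_with_zeros(v, n):
    """Embed v in R^n by appending zeros (illustrative helper)."""
    return list(v) + [0.0] * (n - len(v))

# The two pairs from the question, in R^3 and R^5.
angle_3d = angle_between([1, 2, 4], [-2, 4, 4])
angle_5d = angle_between([0, 0, 2, 0, 3], [2, 4, 6, 1, 2])
print(angle_3d, angle_5d)

# Downside: same direction, different lengths gives angle 0.
same_direction = angle_between([1, 0], [3, 0])

# Upside: embedding R^2 in R^5 by zero-padding leaves the angle unchanged.
in_r2 = angle_between([1, 2], [3, -1])
in_r5 = angle_between(pad_with_zeros([1, 2], 5), pad_with_zeros([3, -1], 5))
```

With this measure, the first pair from the question has the smaller angle, so it counts as the more similar one. Note that the zero vectors in the question's PAIR 1 to PAIR 3 are exactly the case where the angle is undefined.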