Similarity between two lists

I am working in psychology, more precisely with the Big Five personality traits. I have a test which measures 5 variables that are believed to describe the personality traits of human beings. The variables are: openness, conscientiousness, extraversion, agreeableness and neuroticism. I will refer to them as O C E A N.

A test is carried out to measure the variables. Each variable can score from 1.0 to 7.0, where E = 7.0 would mean that the person has the maximum value of extraversion, suggesting a very social person.

Let's say both Eve and Adam take the test, and their final scores are:

Eve = [4.0, 2.4, 5.2, 5.1, 6.9]

Adam = [1.1, 2.2, 4.2, 5.6, 6.1]

What would be a good way to measure their similarity? The ordering of the letters does not matter, as long as both lists use the same order, e.g. instead of O C E A N it could be N E O A C. There might be correlation between the variables, e.g. openness O might have a positive correlation with extraversion E.

There are 2 best solutions below

2
On

There are several approaches you can take.

Cosine similarity

Define $|{\rm Eve}| = \sqrt{O_{\rm eve}^2 + C_{\rm eve}^2 + E_{\rm eve}^2 + A_{\rm eve}^2 + N_{\rm eve}^2}$, with a similar expression for Adam and

$$ {\rm Eve}\cdot {\rm Adam} = O_{\rm eve}O_{\rm adam} + C_{\rm eve}C_{\rm adam} + \cdots + N_{\rm eve}N_{\rm adam} $$

The cosine similarity between these two observations is

$$ \text{cosine similarity} = \frac{{\rm Eve}\cdot {\rm Adam}}{|{\rm Eve}|\,|{\rm Adam}|} $$

You can then calculate the angular distance

$$ \text{angular distance} = \frac{1}{\pi}\arccos(\text{cosine similarity}) $$

and the angular similarity

$$ \text{angular similarity} = 1 - \text{angular distance} $$

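In this scenario, a similarity of $1$ means the two score vectors point in the same direction, and $0$ means they point in opposite directions. Since all scores here are positive, the angle between two people is always less than $90^\circ$, so the angular similarity stays between $0.5$ and $1$.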
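As a rough illustration, here is a minimal Python sketch of the formulas above applied to the scores from the question. The function names `cosine_similarity` and `angular_similarity` are my own choices, not from any library, and the clamping step is just a guard against floating-point rounding.

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def angular_similarity(x, y):
    """1 minus the angular distance arccos(cosine similarity) / pi."""
    cos = cosine_similarity(x, y)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, cos))
    angular_distance = math.acos(cos) / math.pi
    return 1.0 - angular_distance


# Scores in the order O, C, E, A, N, taken from the question.
eve = [4.0, 2.4, 5.2, 5.1, 6.9]
adam = [1.1, 2.2, 4.2, 5.6, 6.1]

print("cosine similarity: ", cosine_similarity(eve, adam))
print("angular similarity:", angular_similarity(eve, adam))
```

Note that both measures depend only on the direction of the score vectors, so two people whose scores are proportional to each other would come out as identical.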

5
On

You are asking about distance in a vector space. There are many possible distance functions, but the most common are the following (here $x_1,\dots,x_5$ and $y_1,\dots,y_5$ are your five numbers for each person, in the same order):

  1. Euclidean distance (from the $L_2$ norm) $d_E = \sqrt{(x_1-y_1)^2 +\dots + (x_5-y_5)^2}$
  2. Manhattan distance (from the $L_1$ norm) $d_M = |x_1-y_1|+\dots+|x_5-y_5|$
  3. Chebyshev distance (from the $L_\infty$ norm) $d_C=\max(|x_1-y_1|,\dots,|x_5-y_5|)$

You are probably good with the first one.
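For concreteness, the three distances can be sketched in plain Python as below, using the scores from the question. The function names are illustrative only. Unlike a similarity, a smaller value here means the two people are more alike.

```python
import math


def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences (L2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of absolute coordinate differences (L1)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev_distance(x, y):
    """Largest absolute coordinate difference (L-infinity)."""
    return max(abs(a - b) for a, b in zip(x, y))


# Scores in the order O, C, E, A, N, taken from the question.
eve = [4.0, 2.4, 5.2, 5.1, 6.9]
adam = [1.1, 2.2, 4.2, 5.6, 6.1]

print("Euclidean:", euclidean_distance(eve, adam))
print("Manhattan:", manhattan_distance(eve, adam))
print("Chebyshev:", chebyshev_distance(eve, adam))
```

For Eve and Adam the Chebyshev distance is driven entirely by openness, the trait on which they differ most.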