Intuition behind dot product

How exactly does a dot product give the similarity between two vectors? I have just been doing the calculation without ever actually giving it any thought. For example, if vector A is <0.5,2,1>, vector B is <0.5,0.3,0.5> and vector C is <3,4,5>, then when the dot product is taken of A and C and of A and B, won't it indicate that A and C are more similar, even though A and B have more similar numbers?

The dot product $\vec{a} \cdot \vec{b}$ gives the quantity $|\vec{a}||\vec{b}|\cos \theta$, where $|\vec{x}|$ is the length of vector $\vec{x}$, and $\theta$ is the angle between the two vectors. This is reasonably easy to show in $\mathbb{R}^2$, a bit trickier in $\mathbb{R}^n$ for higher values of $n$, and for other vector spaces we actually define the angle between vectors as the value of $\theta$ that makes this true for whichever inner product we have chosen to play the role of the dot product.
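For the $\mathbb{R}^2$ case, one way to see this is the following sketch. It assumes we write each vector in polar form, with $\alpha$ and $\beta$ (names introduced here for illustration) being the angles the vectors make with the $x$-axis:

$$\vec{a} = |\vec{a}|(\cos\alpha, \sin\alpha), \quad \vec{b} = |\vec{b}|(\cos\beta, \sin\beta)$$

$$\vec{a} \cdot \vec{b} = |\vec{a}||\vec{b}|(\cos\alpha\cos\beta + \sin\alpha\sin\beta) = |\vec{a}||\vec{b}|\cos(\alpha - \beta) = |\vec{a}||\vec{b}|\cos\theta$$

The last step holds because the angle $\theta$ between the vectors differs from $\alpha - \beta$ only by a sign or a full turn, and neither of those changes the cosine.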

When we talk about using the dot product to measure "similarity" between vectors, we only really want the $\cos \theta$ component. In other words, we're saying that two vectors are similar if they are pointing in similar directions, regardless of how long they are. To get that value, we scale the dot product down by the lengths of the vectors, i.e. we define

$$S_C(\vec{a}, \vec{b}) = \cos \theta = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}||\vec{b}|}$$

So if we take your examples $a = (0.5, 2, 1), b = (0.5, 0.3, 0.5), c = (3, 4, 5)$, then taking the standard vector length and dot product formulas we get:

$$\begin{eqnarray}S_C(a, b) & = & \frac{0.5 \times 0.5 + 2 \times 0.3 + 1 \times 0.5}{\sqrt{(0.5^2 + 2^2 + 1^2)(0.5^2 + 0.3^2 + 0.5^2)}} \\ & = & \frac{1.35}{\sqrt{5.25 \times 0.59}} \approx 0.767 \\ S_C(a, c) & = & \frac{14.5}{\sqrt{5.25 \times 50}} \approx 0.895 \\ S_C(b, c) & = & \frac{5.2}{\sqrt{29.5}} \approx 0.957 \end{eqnarray}$$

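So what this says is that, of the three pairs, $b$ and $c$ are pointing in the most similar direction, as shown by their cosine similarity being closest to 1. The raw dot product $a \cdot c = 14.5$ is much larger than $a \cdot b = 1.35$ mainly because $c$ is a much longer vector than $b$; once the lengths are divided out, the gap shrinks to $0.895$ versus $0.767$.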
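If you want to check these numbers yourself, here is a minimal Python sketch of the calculation. The function names `dot` and `cosine_similarity` are just ones chosen for this example, not from any library. It also rescales $c$ by a factor of 10 to show that the raw dot product changes with length while the cosine similarity does not.

```python
import math

def dot(x, y):
    """Standard dot product of two equal-length sequences."""
    return sum(xi * yi for xi, yi in zip(x, y))

def cosine_similarity(x, y):
    """Dot product divided by both vector lengths, i.e. cos(theta)."""
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

a = (0.5, 2, 1)
b = (0.5, 0.3, 0.5)
c = (3, 4, 5)

pairs = {"a,b": (a, b), "a,c": (a, c), "b,c": (b, c)}
for name, (x, y) in pairs.items():
    # Print the raw dot product next to the length-normalised similarity.
    print(name, round(dot(x, y), 3), round(cosine_similarity(x, y), 3))

# Stretching c makes the raw dot product 10 times larger,
# but leaves the cosine similarity unchanged.
c_long = tuple(10 * ci for ci in c)
print(round(dot(a, c_long), 3), round(cosine_similarity(a, c_long), 3))
```

The cosine similarities printed here should agree with the values worked out by hand above, to three decimal places.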

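If we swap the dot product for a different way of pairing vectors, with a corresponding concept of vector length, we will get a different kind of similarity measure. For example, the function (loosely called an inner product here, although strictly it is not linear in its arguments)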

$$\langle x, y \rangle = \sum_i I(x_i = y_i)$$

just counts the number of positions at which the two vectors have exactly equal coefficients. Every vector in $\mathbb{R}^3$ then has $\langle x, x \rangle = 3$, so the corresponding length is $\sqrt{3}$, and $S_C(x, y) = \frac{\langle x, y\rangle}{3}$ is the fraction of equal coefficients. In this case, we get $S_C(a, b) = \frac{1}{3}$ (only the first coefficients, both $0.5$, agree), while $S_C(a, c) = S_C(b, c) = 0$.
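The same recipe can be sketched in Python for this counting measure. Again, `match_count` and `match_similarity` are names made up for this example. The normalisation is written in the same "divide by both lengths" form as the cosine formula, which for three-component vectors amounts to dividing by 3.

```python
import math

def match_count(x, y):
    """Number of positions where the two vectors have exactly equal entries."""
    return sum(1 for xi, yi in zip(x, y) if xi == yi)

def match_similarity(x, y):
    """match_count scaled by the 'lengths' sqrt(match_count(x, x)) and sqrt(match_count(y, y))."""
    return match_count(x, y) / math.sqrt(match_count(x, x) * match_count(y, y))

a = (0.5, 2, 1)
b = (0.5, 0.3, 0.5)
c = (3, 4, 5)

print(match_similarity(a, b))  # one of three positions agrees
print(match_similarity(a, c))  # no positions agree
print(match_similarity(b, c))  # no positions agree
```

Under this measure $a$ and $b$ come out as the most similar pair, which is closer to the "similar numbers" idea in the question, although it only rewards exact matches.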