Given two vectors, v1 and v2, how does mean-centering affect their relative orientation (angle)?
Below I use Python to define two vectors, mean-center them, and then compare their normalized dot products (the cosine of the angle between them) before and after centering:
import numpy as np

a, b, c, d = np.random.randn(4)
v1 = np.array([a, b])
v2 = np.array([c, d])
v1_mean_centered = v1 - np.mean(v1)
v2_mean_centered = v2 - np.mean(v2)
# normalized dot product (cosine of the angle) before mean-centering
v_before = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
# normalized dot product (cosine of the angle) after mean-centering
v_after = v1_mean_centered @ v2_mean_centered / (np.linalg.norm(v1_mean_centered) * np.linalg.norm(v2_mean_centered))
print(v_before, v_after)
v_before takes on random values, as expected, but v_after is always either 1 or -1, and I do not understand why. Moreover, v_after takes those values only for 2D vectors.
So my questions are:
- Why is v_after either 1 or -1, and why does this happen only for 2D vectors?
- I thought that mean-centering does not affect the angle, in which case either my code or my reasoning is faulty.
Here's a worked-out example showing how I'm doing the calculations:
v1 = [1 3]
v2 = [4 1]
v1_mean = 2.0
v2_mean = 2.5
v_dot_product = v1^T * v2 / (|v1||v2|) = 7 / 13.038 = 0.5369
v1_mean_centered = v1 - 2.0 = [-1 1]
v2_mean_centered = v2 - 2.5 = [1.5 -1.5]
v_mean_centered_dot_product = -3.0 / 3.0 = -1.0
v_dot_product != v_mean_centered_dot_product
Does this mean that mean-centering alters the (relative) vector direction?
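If $v \in \mathbb R^2$ is any vector, and $v'$ is the result of mean-centering $v$, then we always have $v' = [-a \quad a]$ for some $a$. The reason is that after mean-centering, the mean of any set of data is always $0$ (that's the whole point of mean-centering a set of data!), so the two components of $v'$ sum to $0$ and each is the negative of the other. So you are always finding the angle between two vectors of the form $[-a \quad a]$ and $[-b \quad b]$, both of which lie on the same line through the origin. If $a$ and $b$ have the same sign, these vectors point in the same direction and the cosine is $1$; if they have opposite signs, the vectors point in opposite directions and the cosine is $-1$. (If $a = 0$ or $b = 0$, that is, if the two components of the original vector were equal, the centered vector is zero and the angle is undefined.)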
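A quick numerical check of this argument, as a minimal sketch: the helper names `center`, `cosine`, and `centered_cosines`, the seed, and the number of trials are my own choices for illustration and are not from the question. It mean-centers random pairs in $\mathbb R^2$ and in $\mathbb R^3$ and checks whether the cosine is always $\pm 1$.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility


def center(v):
    """Subtract the mean of the components from every component."""
    return v - v.mean()


def cosine(u, w):
    """Cosine of the angle between u and w (normalized dot product)."""
    return (u @ w) / (np.linalg.norm(u) * np.linalg.norm(w))


def centered_cosines(dim, trials=1000):
    """Cosines between pairs of mean-centered random vectors in R^dim."""
    out = []
    for _ in range(trials):
        u = center(rng.standard_normal(dim))
        w = center(rng.standard_normal(dim))
        out.append(cosine(u, w))
    return np.array(out)


# A centered 2D vector has components that are negatives of each other.
v = center(np.array([1.0, 3.0]))
print(v, v.sum())

cosines_2d = centered_cosines(2)
cosines_3d = centered_cosines(3)

# In 2D every cosine has absolute value 1 (up to rounding); in 3D it does not.
print(np.allclose(np.abs(cosines_2d), 1.0))
print(np.allclose(np.abs(cosines_3d), 1.0))
```

In three or more dimensions the centered vectors are only constrained to a plane (or hyperplane) of vectors whose components sum to zero, so they are no longer forced onto a single line.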
I'll add a quick interpretation of this fact: Suppose $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$ is a set of $N$ pairs of numerical data. Form the two vectors $v = [x_1, x_2, \dots, x_N]$ and $w = [y_1, y_2, \dots, y_N]$, both in $\mathbb R^N$, and then form the mean-centered vectors $v'$ and $w'$. If $\theta$ is the angle between $v'$ and $w'$, then $\cos \theta$ is precisely the correlation coefficient for the set of data. The fact that in the case $N = 2$ we always have $\cos \theta = \pm 1$ corresponds to the fact that given any two data points (with $x_1 \ne x_2$ and $y_1 \ne y_2$) there is always a line that passes exactly through both of them, so the correlation coefficient is always either exactly $1$ or exactly $-1$. This is guaranteed only when $N = 2$; for $N > 2$ the correlation coefficient can take any value in $[-1, 1]$.
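The link to the correlation coefficient can be checked numerically. The following is a sketch under assumed data: the sample size `N = 50`, the seed, and the linear-plus-noise relation between `x` and `y` are made up for illustration. It compares $\cos \theta$ for the mean-centered vectors with the Pearson coefficient returned by `np.corrcoef`, and then repeats the comparison for the two data points from the question's worked example.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility

# Hypothetical data set with N > 2 pairs (x_i, y_i).
N = 50
x = rng.standard_normal(N)
y = 0.5 * x + rng.standard_normal(N)

# Cosine of the angle between the mean-centered vectors.
xc = x - x.mean()
yc = y - y.mean()
cos_theta = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Pearson correlation coefficient computed by NumPy.
pearson_r = np.corrcoef(x, y)[0, 1]
print(cos_theta, pearson_r)

# With only two data points, (1, 4) and (3, 1), the fit is perfect.
x2 = np.array([1.0, 3.0])
y2 = np.array([4.0, 1.0])
r2 = np.corrcoef(x2, y2)[0, 1]
print(r2)
```

The two printed values for the 50-point data set agree up to floating-point rounding, while the two-point case gives a correlation of magnitude 1, matching the $-1.0$ found in the question.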