Why can both corr(x, y) and corr(a, b) be larger than corr([x, a], [y, b])?

Here, corr(x, y) means the cosine similarity, and [x, a] means the concatenation of two vectors.
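
For reference, writing the definition out (assuming corr is the standard cosine similarity $u \cdot v / (\|u\|\,\|v\|)$, which is not spelled out above), the concatenated value is

$$\operatorname{corr}([x, a], [y, b]) = \frac{x \cdot y + a \cdot b}{\sqrt{\|x\|^2 + \|a\|^2}\,\sqrt{\|y\|^2 + \|b\|^2}},$$

so the numerator simply adds the two inner products, while the denominator is built from the norms of the concatenated vectors rather than from $\|x\|\,\|y\|$ and $\|a\|\,\|b\|$ separately.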

I thought it should always hold that corr(x, y) $\le$ corr([x, a], [y, b]) $\le$ corr(a, b), but one day I found this was wrong when computing on real datasets: the relation in the question title can hold in some situations.
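
A minimal sketch of such a situation, using hypothetical toy vectors (not taken from the real datasets) and assuming corr is the plain cosine similarity. The helper name `cosine` and the vectors are illustrative choices: the two pairs are each perfectly aligned, but their norms are mismatched (x is small where y is large, and the reverse for a and b).

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity of two 1-D sequences."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy vectors with mismatched norms across the two pairs.
x, y = [1.0], [10.0]
a, b = [10.0], [1.0]

sim_xy = cosine(x, y)           # x and y point the same way
sim_ab = cosine(a, b)           # so do a and b
sim_cat = cosine(x + a, y + b)  # list + list concatenates: [1, 10] vs [10, 1]

print(sim_xy, sim_ab, sim_cat)
```

Here both individual similarities equal 1, while the concatenated one is 20/101, roughly 0.198, so the concatenated value falls below both.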

My question: can we prove in which situations the relation in the title holds? Clear numerical simulations are also welcome.

In Python 3.7, I ran the following simulation, but it does not give me much intuition:

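import numpy as np

def corr(u, v):
    # Not defined in the original post. Assumed here: cosine similarity
    # expressed in percent, since the reported values below exceed 1.
    u, v = np.asarray(u), np.asarray(v)
    return 100 * (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

np.random.seed(0)
yn = np.random.normal(loc=0.00028, scale=0.00098, size=1000).tolist()
yn_hat = np.random.normal(loc=0.00008, scale=0.000248, size=1000).tolist()
print(corr(yn, yn_hat))  # reported: 4.872580685041682

ye = np.random.normal(loc=0.00424, scale=0.031, size=1000).tolist()
ye_hat = np.random.normal(loc=0.0097, scale=0.0274, size=1000).tolist()
print(corr(ye, ye_hat))  # reported: 3.8225452426943805

# In Python, + on two lists concatenates them; it is not element-wise addition.
print(corr(yn + ye, yn_hat + ye_hat))  # reported: 3.8216710106475364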
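
One way to look for intuition in a simulation like this is to split the concatenated cosine into contributions from the two pairs. The sketch below assumes plain cosine similarity and uses the identity corr([x, a], [y, b]) = w_xy * corr(x, y) + w_ab * corr(a, b), where w_xy = |x||y| / (|[x, a]| |[y, b]|) and w_ab = |a||b| / (|[x, a]| |[y, b]|). By the Cauchy-Schwarz inequality the two weights sum to at most 1, so the result is generally not a true weighted average. The names `cosine`, `w_xy`, `w_ab` are illustrative, and the draws use `np.random.default_rng`, so they will not match the numbers reported above.

```python
import numpy as np

def cosine(u, v):
    """Plain cosine similarity (not in percent)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
# Same distributions as in the simulation above, but different random draws.
x = rng.normal(0.00028, 0.00098, 1000)
y = rng.normal(0.00008, 0.000248, 1000)
a = rng.normal(0.00424, 0.031, 1000)
b = rng.normal(0.0097, 0.0274, 1000)

xa = np.concatenate([x, a])
yb = np.concatenate([y, b])
denom = np.linalg.norm(xa) * np.linalg.norm(yb)

# Weight of each pair in the concatenated cosine.
w_xy = np.linalg.norm(x) * np.linalg.norm(y) / denom
w_ab = np.linalg.norm(a) * np.linalg.norm(b) / denom

cat_direct = cosine(xa, yb)
cat_weighted = w_xy * cosine(x, y) + w_ab * cosine(a, b)

print("weights:", w_xy, w_ab, "sum:", w_xy + w_ab)
print("concatenated:", cat_direct, "from weights:", cat_weighted)
```

With these scales the first pair has a far smaller norm than the second, so w_xy is tiny and the concatenated value is essentially w_ab * corr(a, b). When w_ab is slightly below 1 and corr(a, b) is the smaller of the two positive similarities, the concatenated value can end up just under both.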