Interview question: can the correlation within each segment of a series be negative while the correlation of the whole series is positive?


Suppose we have 100 observation points in a time series:

$$[(x_1,y_1),\cdots,(x_{100},y_{100})].$$

Now we divide the time series into ten segments:

$$[(x_1,y_1),\cdots,(x_{10},y_{10})],\cdots,[(x_{91},y_{91}),\cdots,(x_{100},y_{100})]$$

Is it possible that the correlation between $x$ and $y$ within every segment is negative, yet the correlation over the whole time series is positive?

I tried the two-segment case, using the fact that the correlation has the same sign as the covariance:

$$\dfrac{(x_1y_1 + \cdots + x_{10}y_{10})}{10} - \dfrac{(x_1 + \cdots + x_{10})}{10}\dfrac{(y_1 + \cdots + y_{10})}{10}<0$$ $$\dfrac{(x_{11}y_{11} + \cdots + x_{20}y_{20})}{10} - \dfrac{(x_{11} + \cdots + x_{20})}{10}\dfrac{(y_{11} + \cdots + y_{20})}{10}<0.$$

We want to obtain the result: $$\dfrac{(x_1y_1 + \cdots + x_{20}y_{20})}{20} - \dfrac{(x_1 + \cdots + x_{20})}{20}\dfrac{(y_1 + \cdots + y_{20})}{20}>0.$$ Adding the two conditions, we have: $$(x_1y_1 + \cdots + x_{20}y_{20}) < \dfrac{1}{10}\Big((x_1 + \cdots + x_{10})(y_1 + \cdots + y_{10}) +(x_{11} + \cdots + x_{20})(y_{11} + \cdots + y_{20})\Big).$$

I got stuck here, since the inequalities in the conditions point in the opposite direction to the inequality in the desired result, and I don't see how to proceed.
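One possible way to continue, as a sketch only: the shorthand $A_1, A_2, B_1, B_2$ below is notation introduced here for the segment sums and is not part of the original attempt. Write $A_1 = x_1 + \cdots + x_{10}$, $A_2 = x_{11} + \cdots + x_{20}$, $B_1 = y_1 + \cdots + y_{10}$, $B_2 = y_{11} + \cdots + y_{20}$. Adding the two segment conditions gives an upper bound on the sum of products, while the desired result is a lower bound on the same sum:

$$\dfrac{1}{20}(A_1 + A_2)(B_1 + B_2) < x_1y_1 + \cdots + x_{20}y_{20} < \dfrac{1}{10}\big(A_1B_1 + A_2B_2\big).$$

The two bounds leave room for the sum of products only if the left-hand side is smaller than the right-hand side, and expanding shows that this is equivalent to

$$(A_1 - A_2)(B_1 - B_2) > 0.$$

So the opposite directions are not a contradiction. A necessary condition is that the segment means of $x$ and $y$ move in the same direction from one segment to the next, even though $x$ and $y$ move in opposite directions inside each segment.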

Edit:

I added a graph illustrating Simpson's paradox, which resolves the problem intuitively:

[Figure: illustration of Simpson's paradox]
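The same idea can be checked numerically. Below is a minimal sketch with made-up data (the construction $x = 10k + j$, $y = 10k - j$ and the helper `pearson` are assumptions chosen for illustration, not taken from the interview question): inside segment $k$ the values of $y$ fall as $x$ rises, while the segment means of $x$ and $y$ rise together.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical data: segment k (0..9) holds the points j = 0..9 with
# x = 10k + j and y = 10k - j, so y decreases within a segment while
# the segment means of x and y increase together.
xs = [10 * k + j for k in range(10) for j in range(10)]
ys = [10 * k - j for k in range(10) for j in range(10)]

# Correlation inside each block of ten consecutive observations.
segment_corrs = [pearson(xs[i:i + 10], ys[i:i + 10]) for i in range(0, 100, 10)]

# Correlation over all 100 observations.
overall_corr = pearson(xs, ys)

print(segment_corrs)
print(overall_corr)
```

For this particular construction every segment correlation is negative, while the overall correlation is positive (roughly 0.98), so the answer to the question is yes.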