Let $x$ and $y$ be two arrays of independent standard normal random numbers, and let $X$ and $Y$ be their accumulated values at each step,
$X_i = \sum_{k=0}^i x_k$
$Y_i = \sum_{k=0}^i y_k$
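A short calculation worth keeping in mind (assuming the increments $x_k$ are i.i.d. $N(0,1)$): although $X$ and $Y$ are independent of each other, each walk is strongly correlated with its own past, since $X_i$ and $X_j$ share all increments up to index $\min(i,j)$:

$$\operatorname{Cov}(X_i, X_j) = \sum_{k=0}^{\min(i,j)} \operatorname{Var}(x_k) = \min(i,j) + 1, \qquad \operatorname{Corr}(X_i, X_j) = \sqrt{\frac{\min(i,j)+1}{\max(i,j)+1}}.$$

So neighboring values of the same walk are nearly perfectly correlated, which is why the paths drift in long swings rather than hovering around zero.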
Even though there is no correlation between $x$ and $y$ (they are i.i.d. and independent of each other), there appears to be a strong correlation between $X$ and $Y$.
I was writing some R code and found this perplexing; perhaps someone with better math knowledge has a simple explanation of this phenomenon?
R code here:
N = 1000
x = rnorm(N)   # i.i.d. standard normal increments
y = rnorm(N)
X = cumsum(x)  # random walk: X[i] = x[1] + ... + x[i]
Y = cumsum(y)
summary(lm(y ~ x))
summary(lm(Y ~ X))
The first regression shows an $R^2$ of almost zero, while the second one has a rather large nonzero $R^2$. Try this multiple times.
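To see that this is typical and not a one-off fluke, here is a small Monte Carlo sketch (my own illustration, written in Python/NumPy rather than R; `np.corrcoef(a, b)[0, 1] ** 2` plays the role of the regression $R^2$ from `summary(lm(...))`):

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 1000, 300

def r2(a, b):
    # Squared sample correlation = R^2 of the simple regression b ~ a.
    return np.corrcoef(a, b)[0, 1] ** 2

iid_r2, walk_r2 = [], []
for _ in range(trials):
    x = rng.standard_normal(N)
    y = rng.standard_normal(N)
    iid_r2.append(r2(x, y))                        # white noise vs white noise
    walk_r2.append(r2(np.cumsum(x), np.cumsum(y))) # random walk vs random walk

print("median R^2, iid pairs :", np.median(iid_r2))
print("median R^2, walk pairs:", np.median(walk_r2))
```

Across repeated experiments, the $R^2$ between the i.i.d. series stays near zero, while the $R^2$ between the accumulated series is routinely large.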
Try plotting them and you might see why:
Scatter plot of x against y (correlation -0.000755):

Scatter plot of X against Y (correlation 0.3245):
Essentially, x and y are genuinely random relative to each other, whereas X and Y are random walks: consecutive values of each walk share almost all of their summands, so whole stretches of points cluster together on the (X, Y) scatter plot, and this clustering alone produces a large sample correlation even though the two walks are independent.
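One more point worth checking (again a Python/NumPy sketch of my own, not from the original post): this is not a small-sample artifact. For i.i.d. pairs the sample correlation shrinks roughly like $1/\sqrt{N}$, but for two independent random walks the typical size of the sample correlation stays roughly constant as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 200

# Median |sample correlation| between two independent random walks,
# for several series lengths N.
for N in (100, 1000, 10000):
    cors = []
    for _ in range(trials):
        X = np.cumsum(rng.standard_normal(N))
        Y = np.cumsum(rng.standard_normal(N))
        cors.append(abs(np.corrcoef(X, Y)[0, 1]))
    print(N, np.median(cors))
```

The median |correlation| does not go to zero as N increases, which is why "try this multiple times" keeps producing large $R^2$ values no matter how long the series is.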