Spooky correlation between two independent random walks


Let $x$ and $y$ be two arrays of independent random numbers drawn from a standard normal (Gaussian) distribution, and let $X$ and $Y$ be their accumulated values at each step,

$X_i = \sum_{k=0}^i x_k$
$Y_i = \sum_{k=0}^i y_k$

Even though there is no correlation between $x$ and $y$ (they are i.i.d.), there appears to be a strong correlation between $X$ and $Y$.

I was writing some R code and found this perplexing. Perhaps someone with better math knowledge has a simple explanation of this phenomenon?

R code here:

N = 1000
x = rnorm(N)
y = rnorm(N)

X = x
for(i in 2:N){ X[i] = X[i-1] + x[i]}
Y = y
for(i in 2:N){ Y[i] = Y[i-1] + y[i]}

summary(lm(y ~ x))  # R^2 near zero
summary(lm(Y ~ X))  # R^2 often large

The first regression shows an $R^2$ of almost zero, while the second one has a rather large nonzero $R^2$. Try this multiple times.
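Repeating the experiment many times makes the effect easy to quantify. The following Python sketch (variable names are mine, not from the post) counts how often the absolute sample correlation exceeds 0.5 for the raw i.i.d. samples versus their cumulative sums:

```python
import numpy as np

rng = np.random.default_rng(0)
trials, n = 200, 1000
big_iid = 0   # trials where |corr(x, y)| > 0.5
big_walk = 0  # trials where |corr(cumsum(x), cumsum(y))| > 0.5

for _ in range(trials):
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    if abs(np.corrcoef(x, y)[0, 1]) > 0.5:
        big_iid += 1
    if abs(np.corrcoef(np.cumsum(x), np.cumsum(y))[0, 1]) > 0.5:
        big_walk += 1

print(big_iid, big_walk)
```

For i.i.d. samples the sample correlation has standard deviation roughly $1/\sqrt{n}$, so `big_iid` is essentially always 0, while `big_walk` comes out as a sizeable fraction of the trials: the sample correlation of two independent random walks does not concentrate near zero.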

There is 1 answer below.


Try plotting them and you might see why:

import numpy as np
import matplotlib.pyplot as plt

n = 100000
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)

X = np.cumsum(x)  # accumulate the steps into random walks
Y = np.cumsum(y)

plt.subplot(1, 2, 1); plt.scatter(x, y, s=1)  # raw i.i.d. samples
plt.subplot(1, 2, 2); plt.scatter(X, Y, s=1)  # the two walks
plt.show()

x and y, correlation -0.000755: [scatter plot of x against y]

X and Y, correlation 0.3245: [scatter plot of X against Y]

Essentially, the values of x and y are truly random from one step to the next, whereas X and Y go on a "walk": each drifts to one side for long stretches, so the points on the X, Y plot cluster into a few drifting clouds, and that clustering produces a large sample correlation even though the two walks are independent.
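A complementary check (my own sketch, not part of the original answer): differencing the walks recovers the i.i.d. increments, and with them the near-zero correlation, which shows the apparent correlation lives entirely in the accumulation, not in the data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100000
X = np.cumsum(rng.standard_normal(n))  # random walk 1
Y = np.cumsum(rng.standard_normal(n))  # random walk 2

corr_walks = np.corrcoef(X, Y)[0, 1]                    # the walks themselves
corr_steps = np.corrcoef(np.diff(X), np.diff(Y))[0, 1]  # their increments: near 0

print(corr_walks, corr_steps)
```

This is the classic spurious-regression trap with nonstationary series, and it is why time-series regressions are usually run on differences (e.g. returns) rather than on levels.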