Correlation of linearly related variables

I was reading an article in which they show a correlation table that looks very suspicious to me. I am not a mathematician, so I would like extra advice on this issue. I can summarize the problem as follows.

Consider these 2 sets of random variables: $\{X_i, Y_i, Z_i\}$ for $i \in \{1,2\}$.

We know that for all $i$, $Z_i=1-X_i-Y_i$.

The table presented in the paper shows 3 values of $\rho_K$ for $K \in \{X,Y,Z\}$, where $\rho_K$ is the Pearson correlation coefficient between $K_1$ and $K_2$.

In their paper, both $\rho_X$ and $\rho_Y$ are high (above 0.7) and statistically significant, but $\rho_Z$ is low and not significant.

Is this even possible? I ran a few simulations in R. In the simplest scenario, with independent $X$s and $Y$s, I definitely end up with a high $\rho_Z$. I also tried linearly relating $X_i$ to $Y_i$ in my simulations, and I still end up with a high and significant $\rho_Z$. What am I missing?

Thanks!

There is 1 best solution below

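Yes, it's possible. Even more extreme scenarios are possible. Start with $Z_1$ and $Z_2$ that are uncorrelated, so $\rho_Z = 0$. Let $W$ be a random variable, independent of $Z_1$ and $Z_2$, whose variance is large compared with the variances of the $Z_i$. Let $X_i = 1-W$ and $Y_i = 1 - X_i - Z_i = W - Z_i$. Then $\rho_X = 1$, because $X_1 = X_2$, and $\rho_Y = \operatorname{Var}(W)/\sqrt{(\operatorname{Var}(W)+\operatorname{Var}(Z_1))(\operatorname{Var}(W)+\operatorname{Var}(Z_2))}$, which is close to $1$.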
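
If you want to check this numerically, here is a minimal sketch of the construction in Python (standard library only). The specifics are my own assumptions for illustration, not anything from the paper: $W$, $Z_1$ and $Z_2$ are drawn as independent normal variables, $W$ with standard deviation 10 and each $Z_i$ with standard deviation 1. The names `pearson` and `simulate`, the sample size and the seed are likewise arbitrary choices.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(n=20000, sd_w=10.0, sd_z=1.0, seed=0):
    """Build X_i = 1 - W and Y_i = W - Z_i, then return the three correlations.

    Assumption (illustrative only): W, Z_1, Z_2 are independent normals.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, sd_w) for _ in range(n)]
    z1 = [rng.gauss(0.0, sd_z) for _ in range(n)]  # Z_1 and Z_2 are drawn
    z2 = [rng.gauss(0.0, sd_z) for _ in range(n)]  # independently of each other

    x1 = [1.0 - wi for wi in w]  # X_1 = 1 - W
    x2 = list(x1)                # X_2 = 1 - W, identical to X_1

    # Y_i = 1 - X_i - Z_i, so that Z_i = 1 - X_i - Y_i holds by construction.
    y1 = [1.0 - x - z for x, z in zip(x1, z1)]
    y2 = [1.0 - x - z for x, z in zip(x2, z2)]

    return {
        "rho_X": pearson(x1, x2),
        "rho_Y": pearson(y1, y2),
        "rho_Z": pearson(z1, z2),
    }


result = simulate()
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

With these variances the population value of $\rho_Y$ is $100/101 \approx 0.99$, so the sample $\rho_Y$ should land near that, while the sample $\rho_Z$ should only fluctuate around $0$. Shrinking `sd_w` towards `sd_z` lowers $\rho_Y$, which shows that the effect depends on $W$ dominating the variance of the $Y_i$.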