Describing correlation between $M$ random variables

I'm new to statistics and I have been trying to find a "standard" way to describe the correlation between $M$ random variables. For simplicity, let's take $M$ to be 3 and define the three RVs $X,Y,$ and $Z$ with respect to three independent RVs $A,B,$ and $C$ such that:

$X=A$

$Y=A+B$

$Z=A+B+C$

How can we describe the pair-wise and triplet-wise correlation between $X,Y,$ and $Z$? Do three pair-wise correlations suffice?

Best answer

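No, they are not sufficient on their own. Each pairwise correlation describes two variables in isolation, so it cannot tell you how much of that relationship is explained by a third variable.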

In these situations it is worth looking at two other statistics: partial correlations and the coefficient of determination. Both are based on linear regression.

In the first case, you model both $X$ and $Y$ as functions of $Z$ using simple linear regression. The partial correlation $\rho_{XY.Z}$ (read: the correlation between $X$ and $Y$ controlling for $Z$) is defined as the correlation between the residuals $e_x$ and $e_y$ of the regressions $X\sim Z$ and $Y\sim Z$. It represents the correlation between $X$ and $Y$ after any linear effect of $Z$ has been removed. Swap the roles of the variables to compute $\rho_{XZ.Y}$ and $\rho_{YZ.X}$. Extending to $M$ variables $V_1,\dots,V_M$, you have to consider $\rho_{V_iV_j.\mathbf{V}}$, where $\mathbf{V}$ is an arbitrary subset of the remaining variables (those other than $V_i$ and $V_j$). Inspecting the pairwise correlations one at a time is not enough, but the full covariance matrix, or its inverse, the precision matrix, is: every partial correlation can be computed from it.
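
To make the residual-based definition concrete, here is a minimal sketch applied to the $X, Y, Z$ from the question. It assumes $A$, $B$, $C$ are independent standard normals (the question does not fix their distributions), and the helper names `pearson`, `residuals`, and `partial_corr` are made up for this illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def residuals(target, predictor):
    """Residuals of the simple linear regression target ~ predictor."""
    mt, mp = mean(target), mean(predictor)
    slope = (sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
             / sum((p - mp) ** 2 for p in predictor))
    intercept = mt - slope * mp
    return [t - (intercept + slope * p) for t, p in zip(target, predictor)]

def partial_corr(u, v, control):
    """Correlation of u and v after regressing each on the control variable."""
    return pearson(residuals(u, control), residuals(v, control))

# Assumption: A, B, C are independent standard normals.
rng = random.Random(0)
n = 20000
A = [rng.gauss(0, 1) for _ in range(n)]
B = [rng.gauss(0, 1) for _ in range(n)]
C = [rng.gauss(0, 1) for _ in range(n)]

X = A
Y = [a + b for a, b in zip(A, B)]
Z = [a + b + c for a, b, c in zip(A, B, C)]

r_xz = pearson(X, Z)                  # plain pairwise correlation
r_xz_given_y = partial_corr(X, Z, Y)  # partial correlation, controlling for Y
r_xy_given_z = partial_corr(X, Y, Z)  # partial correlation, controlling for Z

print(f"corr(X, Z)     = {r_xz:.3f}")
print(f"corr(X, Z | Y) = {r_xz_given_y:.3f}")
print(f"corr(X, Y | Z) = {r_xy_given_z:.3f}")
```

Under these assumptions the plain correlation between $X$ and $Z$ is $1/\sqrt{3}\approx 0.58$, while $\rho_{XZ.Y}$ is zero in theory: once $Y=A+B$ is accounted for, all that remains of $Z$ is $C$, which is independent of $X$. The simulated values should land close to those numbers.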

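In the second case, you want the $R^2$ statistic. Build a multiple regression with one variable as the response and the others as predictors, for example $Z=b_0+b_1X+b_2Y$. $R^2$ is the simplest indicator of model quality (but if the regression has any meaning for you, don't make it your only quality indicator!) and is defined through a ratio of sums of squares, $R^2 = 1 - SS_{\text{res}}/SS_{\text{tot}}$. It can be seen as a composite correlation indicator: for a least-squares fit with an intercept, it is the squared correlation between the response and its fitted values.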
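
As a rough illustration of the $R^2$ idea, the sketch below takes $Z$ as the response and $X$ and $Y$ as predictors (one possible choice of roles) and fits the regression by solving the two-predictor normal equations directly. It again assumes $A$, $B$, $C$ are independent standard normals, which the question does not specify, and all variable names are illustrative.

```python
import random

# Assumption: A, B, C are independent standard normals.
rng = random.Random(1)
n = 20000
A = [rng.gauss(0, 1) for _ in range(n)]
B = [rng.gauss(0, 1) for _ in range(n)]
C = [rng.gauss(0, 1) for _ in range(n)]

X = A
Y = [a + b for a, b in zip(A, B)]
Z = [a + b + c for a, b, c in zip(A, B, C)]

def mean(v):
    return sum(v) / len(v)

def center(v):
    m = mean(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y, z = center(X), center(Y), center(Z)

# Normal equations for the centered data form a 2x2 linear system:
#   [sxx sxy] [b1]   [sxz]
#   [sxy syy] [b2] = [syz]
# solved here with Cramer's rule.
sxx, sxy, syy = dot(x, x), dot(x, y), dot(y, y)
sxz, syz = dot(x, z), dot(y, z)
det = sxx * syy - sxy ** 2
b1 = (sxz * syy - sxy * syz) / det
b2 = (sxx * syz - sxy * sxz) / det
b0 = mean(Z) - b1 * mean(X) - b2 * mean(Y)

# R^2 = 1 - SS_res / SS_tot
fitted = [b0 + b1 * xi + b2 * yi for xi, yi in zip(X, Y)]
ss_res = sum((zi - fi) ** 2 for zi, fi in zip(Z, fitted))
ss_tot = dot(z, z)
r_squared = 1 - ss_res / ss_tot

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"R^2 = {r_squared:.3f}")
```

Since $Z = Y + C$ in this setup, the fit should give $b_1$ near 0, $b_2$ near 1, and $R^2$ near $\operatorname{Var}(Y)/\operatorname{Var}(Z) = 2/3$. In practice you would use a regression routine from a statistics library rather than hand-solving the normal equations.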

Also, look at other statistics and plot your data. Check Anscombe's Quartet to see why: it is a set of four data sets with completely different behavior but nearly identical means, variances, and correlations.
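
If you want to check this numerically, the sketch below computes the summary statistics for the four data sets. The values are the commonly reproduced ones from Anscombe's 1973 paper, typed in by hand here, so verify them against a trusted copy before relying on them; the helper names are just for illustration.

```python
def mean(v):
    return sum(v) / len(v)

def variance(v):
    """Sample variance (n - 1 in the denominator)."""
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

# Data sets I-III share the same x values; data set IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {"I": (x123, y1), "II": (x123, y2), "III": (x123, y3), "IV": (x4, y4)}

correlations = {}
for name, (xs, ys) in quartet.items():
    correlations[name] = pearson(xs, ys)
    print(f"{name:>3}: mean(y) = {mean(ys):.2f}, var(y) = {variance(ys):.2f}, "
          f"corr(x, y) = {correlations[name]:.3f}")
```

All four rows should print essentially the same mean, variance, and correlation (the correlation is roughly 0.82 in each case), even though a scatter plot of each data set looks completely different.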