How do I calculate the correlation matrix from this data?


Let $A=[X_1, ..., X_n]$ be an $n \times n$ matrix where each column represents a random variable with $n$ data points. What is the correlation matrix of $A$?

I understand that a correlation matrix $R_X$ of a random vector $X = [X_1, ..., X_n]^T$ is $E[XX^T]$, but how do I calculate the correlation matrix of $A$ when each $X_i$ has $n$ data points?

How would you compute something like $E[X_1X_2]$ or $E[X_1^2]$? I'm assuming it's some form of estimator but I've only ever seen an estimator for a sample $\bar Y = \frac{1}{n}\sum_i Y_i$ where each $Y_i = y_i$ has a specific value that was sampled.


There are 2 best solutions below


I believe you are confusing the correlation matrix of a random vector, which is indeed $E(XX^T)$ if $E(X)=0_n$, with the sample correlation matrix. Given a sample of $N$ points of the vector $X$, the latter is
$$\frac1N \sum_{i=1}^N X_iX_i^T$$
(assuming $E(X)=0_n$) or, in general,
$$\frac1N \sum_{i=1}^N (X_i-\bar X)(X_i-\bar X)^T,$$
where $\bar X=\frac 1N\sum_{i=1}^N X_i$ and each $X_i$ is one observation of the random vector $X$, written as a column vector. (Here $X_i$ denotes the $i$-th observation, not the $i$-th column of $A$ from the question.)


NOTE: The dimension $n$ of the random vector $X$ and the sample size $N$ need not be equal.
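To make the two sample formulas concrete, here is a minimal sketch in plain Python. The function name `sample_moment_matrix` and the numbers in `obs` are made up for illustration; the sample has $N = 4$ observations of an $n = 2$ dimensional vector, so $N \neq n$.

```python
def sample_moment_matrix(observations, center=False):
    """Average of the outer products x x^T over N observations of an n-vector.

    With center=True, the sample mean is subtracted from every observation
    first, which gives the centered (general) formula.
    """
    N = len(observations)
    n = len(observations[0])
    if center:
        mean = [sum(x[j] for x in observations) / N for j in range(n)]
        observations = [[x[j] - mean[j] for j in range(n)] for x in observations]
    total = [[0.0] * n for _ in range(n)]
    for x in observations:
        for j in range(n):
            for k in range(n):
                total[j][k] += x[j] * x[k]  # entry (j, k) of the outer product
    return [[total[j][k] / N for k in range(n)] for j in range(n)]


# Hypothetical data: N = 4 observations (rows) of an n = 2 dimensional vector.
obs = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]

R = sample_moment_matrix(obs)               # (1/N) * sum of X_i X_i^T
C = sample_moment_matrix(obs, center=True)  # same, after subtracting the sample mean
print(R)
print(C)
```

Both results are $n \times n$ matrices regardless of $N$; only the averaging runs over the $N$ observations.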


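If each column has zero mean, you can estimate the correlation matrix as
$$\frac{1}{N} \sum_{i=1}^N a_i a_i^T = \frac{1}{N} A^TA,$$
where $a_i^T$ is the $i$-th row of $A$ (one observation of all the variables) and $N$ is the number of rows (here $N = n$). Entry $(j,k)$ of this matrix is $\frac1N X_j^TX_k$, which is the estimate of $E[X_jX_k]$. Note that $\frac{1}{N}AA^T$ sums over the columns instead, so it is the right form only when the columns of $A$ are the observations rather than the variables.

If column $X_j$ comes from a distribution with mean $\mu_j$, all you have to do is center each column first: replace $X_j$ by $X_j - \mu_j\mathbf{1}$ (using the sample mean of the column if $\mu_j$ is unknown) and apply the same formula.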
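As a rough illustration of how a quantity like $E[X_1X_2]$ or $E[X_1^2]$ can be estimated from the data, the sketch below treats each column of $A$ as a variable and each row as one data point, as in the question, and averages the products down two columns. The helper names (`column`, `second_moment`, `center_columns`) and the $3 \times 3$ numbers are assumptions made for the example.

```python
def column(A, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in A]


def second_moment(A, j, k):
    """Estimate E[X_j X_k] as the average of products down columns j and k."""
    xj, xk = column(A, j), column(A, k)
    return sum(a * b for a, b in zip(xj, xk)) / len(A)


def center_columns(A):
    """Subtract each column's sample mean from that column."""
    N = len(A)
    means = [sum(column(A, j)) / N for j in range(len(A[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in A]


# Hypothetical 3 x 3 data: columns are variables, rows are data points.
A = [[1.0, 0.0, 2.0],
     [2.0, 1.0, 0.0],
     [3.0, 2.0, 1.0]]

e_x1x2 = second_moment(A, 0, 1)   # estimate of E[X_1 X_2]
e_x1sq = second_moment(A, 0, 0)   # estimate of E[X_1^2]

Ac = center_columns(A)
cov_12 = second_moment(Ac, 0, 1)  # same estimate after centering the columns
print(e_x1x2, e_x1sq, cov_12)
```

Looping `second_moment` over all pairs $(j, k)$ fills in the full $n \times n$ matrix.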