A notion of "number of distinct variables"


Suppose $X = (X_1,X_2,X_3,X_4)$ is a multivariate Gaussian vector. I am interested in defining a quantity $N$ that measures how many "distinct" Gaussian random variables it contains.

Here are some examples of what I am looking for.

Example 1: If all are i.i.d. standard normal, then $N=4.$

Example 2: If $X_1, X_2, X_3$ are i.i.d. standard normal and $X_4=X_3,$ then $N=3.$ However, if $X_4 = X_3 + \epsilon Y$, where $Y$ is an independent standard normal and $\epsilon \ll 1$, then $N = 3 + f(\epsilon)$, where $f(\epsilon)\to 0$ as $\epsilon\to 0$.

Desired properties:

a) For $n$ variables, we have $N\leq n.$

b) If a new set of jointly Gaussian variables is appended to a given vector, then $N$ cannot decrease.

It seems that $N$ should be some simple property of the correlation matrix of the variables. In particular, it seems like it should be some property of the eigenvalues of the correlation matrix.

But I am not able to pin down exactly what it is. Any help will be appreciated.

Thanks.

EDIT: It looks like this might work. First, normalize every variable to have variance 1. Then set $N = \mathrm{Var}(X_1) + \mathrm{Var}(X_2 \mid X_1) + \mathrm{Var}(X_3 \mid X_1,X_2) + \ldots$, where each term is the conditional variance of one variable given all the preceding ones. This is a valid definition only if it can be shown that $N$ is invariant under permutations of the elements of the Gaussian vector.

Further EDIT: Some numerical calculations seem to show that the proposed formula is invariant under permutation. I am now working on a proof.
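For anyone who wants to experiment with the formula from the first EDIT, here is a minimal sketch in Python with NumPy. It is not the code behind the numerical calculations mentioned above. The function name `sequential_conditional_variance_sum` and its `order` argument are my own illustrative choices. It relies on the standard fact that, for jointly Gaussian variables, the conditional variance of $X_k$ given the preceding ones is the Schur complement $R_{kk} - R_{k,<k} R_{<k,<k}^{-1} R_{<k,k}$ of the correlation matrix $R$. A pseudo-inverse replaces the inverse so that degenerate cases such as $X_4 = X_3$ do not fail.

```python
import numpy as np

def sequential_conditional_variance_sum(cov, order=None):
    """Sum of Var(X_1) + Var(X_2 | X_1) + ... after normalizing to unit variance.

    cov   : covariance matrix of the Gaussian vector (may be singular).
    order : optional permutation of the indices, to try other orderings.
    """
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    if order is None:
        order = list(range(n))

    # Normalize to a correlation matrix (every variable gets variance 1).
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)

    # Reorder the variables according to the requested permutation.
    corr = corr[np.ix_(order, order)]

    total = corr[0, 0]  # Var(X_1), which is 1 after normalization
    for k in range(1, n):
        prev = corr[:k, :k]   # correlation matrix of the preceding variables
        cross = corr[:k, k]   # correlations between X_k and the preceding ones
        # Schur complement; pinv handles singular blocks (e.g. X_4 = X_3).
        total += corr[k, k] - cross @ np.linalg.pinv(prev) @ cross
    return total

# Example 1: four i.i.d. standard normals.
example_1 = np.eye(4)

# Example 2: X_4 = X_3 + eps * Y, with eps = 0 giving X_4 = X_3 exactly.
def example_2(eps):
    cov = np.eye(4)
    cov[2, 3] = cov[3, 2] = 1.0
    cov[3, 3] = 1.0 + eps**2
    return cov

print(sequential_conditional_variance_sum(example_1))
print(sequential_conditional_variance_sum(example_2(0.0)))
print(sequential_conditional_variance_sum(example_2(0.1)))
```

Passing different permutations through `order` (for instance from `itertools.permutations(range(4))`) on a covariance matrix with several nonzero correlations is a quick way to probe the invariance question before attempting a proof.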