Why am I experimentally getting more small eigenvalues of sample covariance matrices when data come from lower dimensional spaces, and vice versa?


I'm new to Random Matrix Theory (RMT) and I'm playing around (= coding) with generating data in low dimensions and nonlinearly embedding them into a high dimension, and I'm noticing a phenomenon that seems intuitively correct. I'd really appreciate some rigorous insight into it, but first let me state mathematically what I'm doing, below.

Say I generate a random sample $X:=\{x_1, \dots, x_n \} \subset \mathbb{R}^m$, and, using a nonlinear map $\phi : \mathbb{R}^m \to \mathbb{R}^p$, $m < p$, I embed the dataset $X$ into $\mathbb{R}^p$. Next, I consider the sample covariance matrix of the embedded data, $C:=\frac{1}{p}\sum_{i=1}^{n}\phi(x_i)\phi(x_i)^{T}$ (where $T$ denotes the transpose of a matrix or vector, as required). As in RMT, I take both $n$ and $p$ large, and gradually increase $m$ from small to large, with of course $1 \le m \le p$. I experimented with many different large values of $n$ and $p$, on the order of thousands.
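For concreteness, here is a minimal sketch of the kind of experiment I mean. The specific choices of Gaussian data and of $\phi(x) = \tanh(Wx)$ for a random matrix $W$ are just illustrative assumptions on my part, not essential to the question:

```python
import numpy as np

rng = np.random.default_rng(0)

def small_eig_count(n, p, m, thresh=0.01):
    """Count eigenvalues of C below `thresh` for data embedded from R^m into R^p."""
    # n sample points in R^m (Gaussian data is an illustrative assumption)
    X = rng.standard_normal((m, n))
    # Nonlinear embedding phi: R^m -> R^p, here entrywise tanh of a
    # random linear map (one simple choice of phi among many)
    W = rng.standard_normal((p, m)) / np.sqrt(m)
    Phi = np.tanh(W @ X)          # p x n matrix of embedded points phi(x_i)
    C = (Phi @ Phi.T) / p         # C = (1/p) * sum_i phi(x_i) phi(x_i)^T
    eig = np.linalg.eigvalsh(C)
    return int(np.sum(eig < thresh))

# The count of small eigenvalues drops as m grows toward p:
for m in (2, 50, 500, 1000):
    print(m, small_eig_count(n=1000, p=1000, m=m))
```

With this setup the count of eigenvalues below $0.01$ is largest for $m=2$ and shrinks as $m$ approaches $p$, which is exactly the phenomenon described above.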

I notice the following:

When the data come from a low dimension, i.e. $m$ is small, the covariance matrix $C$ of the embedded data in $\mathbb{R}^p$ has a lot of small eigenvalues, and indeed all the eigenvalues tend to be very small ("small" meaning $<0.01$ in my code); the lower $m$ is, the more small eigenvalues there are. However, as $m$ increases and approaches $p$, the number of small eigenvalues decreases, and eventually when $m=p$ and $\phi= \mathrm{Id}$, there's no more embedding and the eigenvalue distribution looks the way I'd expect: the Marchenko–Pastur distribution. But as $m \to 1$, the distribution starts to "approach the Dirac mass at $0$," in the sense that more and more eigenvalues of $C$ become small.

Is there any rigorous explanation for this, at least for some simple/nice embedding $\phi$? Note that it seems intuitively correct: in the Marchenko–Pastur setting, if you let $\frac{p}{n} \to 0$, the distribution approaches $\delta_{\sigma^2}$, where the rectangular data matrix $X$ has iid entries with mean $0$ and variance $\sigma^2$. But in my case the difference is that we nonlinearly embed the data from dimension $m$, and then look at the eigenvalues of the covariance $C:=\frac{1}{p}\sum_{i=1}^{n}\phi(x_i)\phi(x_i)^{T}$ of the embedded data.
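The classical $\frac{p}{n} \to 0$ concentration I'm alluding to is easy to check numerically. A quick sketch (with standard Gaussian entries, so $\sigma^2 = 1$, and $p$ held fixed while $n$ grows, both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def mp_spread(n, p, sigma=1.0):
    """Return (min, max) eigenvalue of the p x p sample covariance (1/n) Y Y^T."""
    Y = sigma * rng.standard_normal((p, n))   # p x n data matrix, iid entries
    S = Y @ Y.T / n                           # sample covariance
    eig = np.linalg.eigvalsh(S)
    return eig.min(), eig.max()

# As p/n -> 0, the whole spectrum squeezes toward sigma^2 = 1,
# consistent with the Marchenko-Pastur edges sigma^2 (1 +/- sqrt(p/n))^2:
for n in (200, 2000, 20000):
    print(n, mp_spread(n=n, p=100))
```

As $n$ grows with $p$ fixed, both edges close in on $\sigma^2$, which is the $\delta_{\sigma^2}$ limit mentioned above.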

So in essence, I'd like to understand why the covariance of data coming from a lower dimension $m$ has more small eigenvalues, and why the number of small eigenvalues goes up as $m$ goes down. Thank you so much!