I'm new to Random Matrix Theory (RMT) and I'm experimenting in code with generating data in a low dimension, nonlinearly embedding it into a high dimension, and I'm noticing a phenomenon that seems intuitively correct. I'd really appreciate some rigorous insight into it, but first let me state mathematically what I'm doing.
Say I generate a random sample $X := \{x_1, \dots, x_n\} \subset \mathbb{R}^m$ and, using a nonlinear map $\phi : \mathbb{R}^m \to \mathbb{R}^p$ with $m < p$, embed the dataset $X$ into $\mathbb{R}^p$. Next, I consider the sample covariance matrix of the embedded data, $C := \frac{1}{p}\sum_{i=1}^{n}\phi(x_i)\phi(x_i)^{T}$ (where $T$ denotes the transpose). As in RMT, I take both $n$ and $p$ large, and gradually increase $m$ from small to large, with of course $1 \le m \le p$. I experimented with many different large values of $n$ and $p$, on the order of thousands.
I notice the following:
When the data comes from a low dimension, i.e. when $m$ is small, the covariance matrix $C$ of the embedded data in $\mathbb{R}^p$ has many small eigenvalues ("small" meaning $<0.01$ in my code), and the lower $m$ is, the more eigenvalues are small. As $m$ increases toward $p$, the number of small eigenvalues decreases, and eventually when $m = p$ and $\phi = \mathrm{Id}$ (no embedding at all), the eigenvalue distribution looks exactly as I'd expect: the Marchenko–Pastur distribution. But as $m \to 1$, the distribution "approaches the Dirac mass at $0$" in the sense that more and more eigenvalues of $C$ become small.
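For concreteness, here is a minimal sketch of the kind of experiment described above. The particular choices here are my assumptions, not necessarily those of the original code: Gaussian data, random tanh features $\phi(x) = \tanh(Wx)$ as one "nice" nonlinear embedding (note this $\phi$ is not the identity even at $m = p$), and the $0.01$ threshold from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

def small_eig_fraction(n, p, m, thresh=0.01):
    """Fraction of eigenvalues of C below `thresh`, for data in R^m embedded in R^p."""
    # n sample points in R^m with iid standard normal entries (an assumption).
    X = rng.standard_normal((n, m))
    # Random-feature embedding phi(x) = tanh(W x), with W a fixed p x m Gaussian
    # matrix scaled so that each row has norm ~ 1 (one choice of "nice" phi).
    W = rng.standard_normal((p, m)) / np.sqrt(m)
    Phi = np.tanh(X @ W.T)              # shape (n, p): rows are phi(x_i)
    # Sample covariance with the 1/p normalization used in the question.
    C = (Phi.T @ Phi) / p               # shape (p, p)
    eigs = np.linalg.eigvalsh(C)
    return np.mean(eigs < thresh)

# Fraction of small eigenvalues shrinks as the intrinsic dimension m grows.
for m in (1, 10, 100, 1000):
    print(m, small_eig_fraction(n=1000, p=1000, m=m))
```

With this setup, the fraction of eigenvalues below the threshold is largest at $m = 1$ (the embedded data lies on a one-dimensional curve in $\mathbb{R}^p$, so the associated kernel spectrum decays quickly) and shrinks as $m$ approaches $p$.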
Is there a rigorous explanation for this, at least for some simple/nice embedding $\phi$? It does seem intuitively correct: in the Marchenko–Pastur setting, if you let $\frac{p}{n} \to 0$, the distribution approaches $\delta_{\sigma^2}$, where the rectangular data matrix $X$ has iid entries with mean $0$ and variance $\sigma^2$. But in my case, the difference is that we nonlinearly embed the data from dimension $m$, and then look at the eigenvalues of the covariance $C := \frac{1}{p}\sum_{i=1}^{n}\phi(x_i)\phi(x_i)^{T}$ of the embedded data.
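(As a sanity check on the $\frac{p}{n} \to 0$ limit just mentioned: with the usual $\frac{1}{n}$ normalization of the sample covariance, the Marchenko–Pastur edges are $(1 \pm \sqrt{p/n})^{2}\sigma^{2}$, so for $p \ll n$ all eigenvalues should concentrate near $\sigma^{2}$. A quick numerical check, with parameters chosen for illustration:)

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, sigma = 20, 200_000, 1.0      # p/n = 1e-4, so both MP edges are ~ sigma^2

# Data matrix with iid mean-0, variance-sigma^2 entries.
X = sigma * rng.standard_normal((p, n))
C = X @ X.T / n                     # p x p sample covariance, 1/n normalization
eigs = np.linalg.eigvalsh(C)

# Every eigenvalue should be close to sigma^2 = 1 when p/n is tiny.
print(eigs.min(), eigs.max())
```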
So in essence, I'd like to understand why the covariance of data coming from a lower dimension $m$ has more small eigenvalues, and why the number of small eigenvalues goes up as $m$ goes down. Thank you so much!