This question is motivated by a self-supervised learning problem in machine learning, but I'll try to strip out as many unnecessary details as possible. In this setting, we have large datasets and we constrain our deep neural network's outputs to lie on the hypersphere. I'm curious about the spectral behavior of the network's outputs.
Suppose I have $N$ points drawn uniformly at random from the hypersphere in $D$ dimensions. If I form an $N \times D$ matrix whose rows are the points, what can I say about its singular values? In the limit as $N\rightarrow \infty$, do the singular values have a limiting distribution?
Define the $N\times D$ matrix as $M$. Then for each $k$, $$\sigma_k(M)/\sqrt{N} \overset{\Bbb P}{\to} \sqrt{\Bbb E((x^Te_1)^2)},$$ where $x$ is uniform on $\Bbb S^{D-1}$ and $e_1$ is the first standard basis vector. The constant $\Bbb E((x^Te_1)^2)$ can be estimated by Monte Carlo, but in fact it equals $1/D$ exactly: by symmetry $\Bbb E(x_1^2)=\cdots=\Bbb E(x_D^2)$, and these sum to $\Bbb E(\|x\|^2)=1$. So the limit above is $1/\sqrt D$.
Proof:
Suppose $M$'s row vectors are $x_i\in\Bbb S^{D-1}$, fix an arbitrary $e\in\Bbb S^{D-1}$, and let $$\mu:=\Bbb E((x^Te)^2),\quad v:=\Bbb V((x^Te)^2)$$ where $x$ follows the uniform distribution on $\Bbb S^{D-1}$. By rotational invariance, $\mu(e)$ and $v(e)$ do not actually depend on $e$, so we may write them simply as $\mu, v$. By Chebyshev's inequality, $$\Bbb P\left(\left|\frac1N\sum_{i=1}^N(x_i^Te)^2-\mu\right| > \epsilon\right) \le \frac{v}{N\epsilon^2},$$ so $\frac1N\|Me\|^2=\frac1N\sum_{i=1}^N(x_i^Te)^2\overset{\Bbb P}{\to}\mu$ for each fixed $e$. Since each summand $(x_i^Te)^2$ is bounded by $1$ and $2$-Lipschitz in $e$, a union bound over a finite $\epsilon$-net of $\Bbb S^{D-1}$ upgrades this to convergence uniform over all $e\in\Bbb S^{D-1}$. The claim for every $\sigma_k$ then follows from the Courant–Fischer (min–max) characterization of singular values, $\sigma_k(M)=\max_{\dim V=k}\,\min_{e\in V,\ \|e\|=1}\|Me\|$.
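As a quick sanity check, the constant $\mu = \Bbb E((x^Te)^2)$ is easy to estimate by Monte Carlo. A minimal NumPy sketch (names and sample sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
D, n_samples = 3, 1_000_000

# Uniform points on S^{D-1}: normalize standard Gaussian vectors.
x = rng.standard_normal((n_samples, D))
x /= np.linalg.norm(x, axis=1, keepdims=True)

# mu = E[(x^T e)^2] with e = e_1; by rotational symmetry it equals 1/D.
mu_hat = np.mean(x[:, 0] ** 2)
print(mu_hat, 1 / D)
```

The estimate should land close to $1/D$, matching the symmetry argument.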
My Monte Carlo results for $D=3$:
Code:
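Since the code itself isn't reproduced above, here is a minimal NumPy sketch of such an experiment (function and variable names are mine, not from the original):

```python
import numpy as np

def sample_sphere(N, D, rng):
    """Draw N points uniformly from S^{D-1} by normalizing Gaussian vectors."""
    X = rng.standard_normal((N, D))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, D = 100_000, 3
M = sample_sphere(N, D, rng)

# Singular values of the N x D matrix, scaled by sqrt(N).
scaled = np.linalg.svd(M, compute_uv=False) / np.sqrt(N)
print("scaled singular values:", scaled)
print("predicted limit 1/sqrt(D):", 1 / np.sqrt(D))
```

All $D$ scaled singular values should cluster tightly around $1/\sqrt D \approx 0.577$ for $D=3$, consistent with the claimed limit.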