Finding "the" Marchenko-Pastur distribution in the original 1967 article


I am looking at distribution properties of eigenvalues of sample covariance matrices.

Following the Wikipedia article on the Marchenko-Pastur distribution:

Let $X$ denote an $M \times N$ random matrix whose entries are i.i.d. random variables with mean $0$ and variance $\sigma^2 < \infty$. Let $$ Y_N = N^{-1}XX^T, $$ (which in a statistical context I can view as a sample covariance matrix) and let $\lambda_1, \lambda_2, \ldots, \lambda_M$ be the eigenvalues of $Y_N$ (viewed as random variables). Finally consider the random measure $$ \mu_M(A) = \frac{1}{M}\#\{\lambda_j \in A\},\, \, A \subset \mathbb{R}. $$ Theorem. Assume that $M, N \rightarrow \infty$ so that the ratio $M/N \rightarrow \lambda \in (0,+\infty)$. Then $\mu_M \rightarrow \mu$ (in distribution), where $$ \mu(A) =\begin{cases} (1-\frac{1}{\lambda}) \mathbf{1}_{0\in A} + \nu(A),& \text{if } \lambda >1\\ \nu(A),& \text{if } 0\leq \lambda \leq 1, \end{cases} $$ and $$d\nu(x) = \frac{1}{2\pi \sigma^2 } \frac{\sqrt{(\lambda_{+} - x)(x - \lambda_{-})}}{\lambda x} \,\mathbf{1}_{[\lambda_{-}, \lambda_{+}]}\, dx$$ with $$ \lambda_{\pm} = \sigma^2(1 \pm \sqrt{\lambda})^2. \, $$
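Just as a sanity check, this statement is easy to verify numerically. Here is a minimal simulation sketch with Gaussian entries (the particular dimensions are my own choice):

```python
import numpy as np

# Sanity check of the theorem: for large M, N the eigenvalues of
# Y_N = N^{-1} X X^T should concentrate on [lambda_-, lambda_+]
# with lambda_{+-} = sigma^2 (1 +- sqrt(lambda))^2 and lambda = M/N.
rng = np.random.default_rng(0)
M, N, sigma = 500, 1000, 1.0
lam = M / N

X = rng.normal(0.0, sigma, size=(M, N))
eigs = np.linalg.eigvalsh(X @ X.T / N)

lam_minus = sigma**2 * (1 - np.sqrt(lam))**2
lam_plus = sigma**2 * (1 + np.sqrt(lam))**2

# The mean eigenvalue is trace(Y_N)/M, which concentrates at sigma^2.
# Edge fluctuations shrink as M, N grow, so (almost) all eigenvalues
# land in a slightly enlarged bulk interval.
inside = np.mean((eigs > lam_minus - 0.1) & (eigs < lam_plus + 0.1))
print(eigs.mean(), inside)
```

With these sizes the empirical spectral measure already hugs $[\lambda_-, \lambda_+]$ quite closely.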

I then tried to align this statement with the original paper of Marchenko and Pastur (1967). However, the setting there is much less transparent to me; in particular, I do not immediately see the connection to sample covariance matrices:

We shall consider as acting in $N$-dimensional unitary space $H_N$ a self-adjoint operator $B_N(n)$ of the form $$ B_N(n) = A_N + \sum_{i=1}^n \tau_i q^{(i)}(\cdot, q^{(i)}). $$ Here, $A_N$ is a nonrandom self-adjoint operator; $n$ is a nonrandom number; the $\tau_i$ are i.i.d. real random variables and the $q^{(i)}$ are mutually independent random vectors in $H_N$, independent also of the $\tau_i$; $(x, q^{(i)})$ denotes the inner product in $H_N$.

($\ldots$)

We shall be interested in the function $\nu(\lambda; B_N(n))$ giving the ratio of the number of eigenvalues of $B_N(n)$ lying to the left of $\lambda$ to the dimension of space.

$\nu(\lambda; B_N(n))$ is called the normalized spectral function of the operator $B_N(n)$. In the paper, on p. 458, there follow four assumptions I-IV, among them:

I. The limit $\lim_{N\rightarrow \infty} n/N = c$, which for brevity we call the concentration, exists.

One is now interested in the case of very large $N$ and $n$, and in the properties of the operator that ensure convergence in probability of $\nu(\lambda; B_N(n))$ to a nonrandom limit $\nu(\lambda; c)$.

The main result is then contained in Theorem 1 on p. 460, where the normalized spectral function is given in terms of its Stieltjes transform. This is followed by three examples on p. 461 in which the function can even be given explicitly; I assume the first one applies in my case:

1) The sum of random independent and equally probable projections. Let $B_N(n) = \tau\sum_{i=1}^n P_i$, where each $P_i$ is the projection operator onto the random vector $q^{(i)}$, the $q^{(i)}$ are independent and uniformly distributed on the unit sphere, and $\tau$ is a nonrandom number.

($\ldots$)

We find that $\nu(\lambda; c) = \nu_1(\lambda; c) + \nu_2(\lambda; c)$, where $$ \frac{d\nu_1(\lambda; c)}{d\lambda} =\begin{cases} (1-c)\delta(\lambda) & \text{for } 0 \leq c \leq 1,\\ 0 & \text{for } c > 1, \end{cases} $$ $$ \frac{d\nu_2(\lambda; c)}{d\lambda} =\begin{cases} \frac{\sqrt{4c\tau^2 - (\lambda -c\tau -\tau)^2}}{2\pi\tau\lambda} & \text{for } (\lambda-c\tau-\tau)^2 \leq 4c\tau^2,\\ 0 & \text{for } (\lambda-c\tau-\tau)^2 > 4c\tau^2, \end{cases} $$
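This example is also easy to simulate. Here is a minimal sketch of my own (I build the $P_i$ as $q^{(i)}(q^{(i)})^T$ with $q^{(i)}$ normalized Gaussian vectors, which are uniform on the unit sphere):

```python
import numpy as np

# Simulation of example 1): B = tau * sum_i P_i, where P_i projects
# onto a random vector q^(i) uniform on the unit sphere of R^N.
# (Normalized Gaussian vectors are uniform on the sphere.)
rng = np.random.default_rng(1)
N, n, tau = 400, 200, 1.0
c = n / N

Q = rng.normal(size=(N, n))
Q /= np.linalg.norm(Q, axis=0)   # columns are the unit vectors q^(i)
B = tau * (Q @ Q.T)              # tau * sum_i q^(i) (q^(i))^T
eigs = np.linalg.eigvalsh(B)

# Each P_i has trace 1, so trace(B)/N = tau * n/N = tau * c exactly.
mean_eig = eigs.mean()

# For c < 1, nu_1 puts mass (1 - c) at 0: B has rank n < N, so a
# fraction (N - n)/N = 1 - c of the eigenvalues is exactly zero.
frac_zero = np.mean(eigs < 1e-8)
print(mean_eig, frac_zero)       # both should equal 0.5 here
```

The remaining nonzero eigenvalues fill the interval $[\tau(1-\sqrt{c})^2, \tau(1+\sqrt{c})^2]$ predicted by $\nu_2$.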

My question now is how to relate this to the statement in the Wikipedia article, i.e. how to view the sample covariance matrix in the form $B_N(n) = \tau\sum_{i=1}^n P_i$, and how the two limiting distributions agree. Moreover, the ratio $\lambda$ in the Wikipedia article seems to be the reciprocal of the concentration $c$ in the paper?
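To make my guess concrete, here is a quick numerical comparison of the two densities under the identifications $c = 1/\lambda$ and $\tau = \lambda\sigma^2$ (these identifications are my own assumption, not stated in either source):

```python
import numpy as np

# My tentative dictionary (an assumption, not from either source):
# the paper's dimension N plays the role of Wikipedia's M, the paper's
# n plays the role of Wikipedia's N, so c -> N/M = 1/lambda; and since
# each term x_i x_i^T / N has norm of order M sigma^2 / N, the scale
# of the projections should be tau = lambda * sigma^2.
sigma, lam = 1.0, 0.5
c, tau = 1.0 / lam, lam * sigma**2

lam_minus = sigma**2 * (1 - np.sqrt(lam))**2
lam_plus = sigma**2 * (1 + np.sqrt(lam))**2
x = np.linspace(lam_minus + 1e-6, lam_plus - 1e-6, 1000)

# Wikipedia's density d(nu)/dx ...
wiki = np.sqrt((lam_plus - x) * (x - lam_minus)) / (2 * np.pi * sigma**2 * lam * x)
# ... versus the paper's density d(nu_2)/d(lambda) from example 1)
paper = np.sqrt(4 * c * tau**2 - (x - c * tau - tau)**2) / (2 * np.pi * tau * x)

print(np.max(np.abs(wiki - paper)))  # ~ 0: the densities coincide
```

On this grid the two formulas agree to floating-point precision, which supports the reciprocal relation, but I would appreciate a derivation.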

Any help is appreciated!

Many thanks, P. Diaz