Empirical distribution of sorted Gaussian numbers

I wrote a small program that does the following:

  1. Draw $N$ independent standard Gaussian numbers (mean 0, standard deviation 1). Call that list $L=\{y_1, \ldots, y_N\}$.
  2. Sort that list in increasing order: $\tilde{L}=\mathrm{sort}(L)$.
  3. Plot the sorted values against a uniform grid on the interval $[-1,1]$, $x_i=-1+\frac{2i}{N-1}$ with $i=0,\ldots,N-1$, so that the $i$-th smallest value is plotted at $x_i$.
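
For reference, the three steps above can be sketched as follows. This is a minimal standard-library Python sketch, not the original program; the function name `sorted_gaussian_sample` and the `seed` parameter are assumptions made for illustration, and the plotting call is left out.

```python
import random


def sorted_gaussian_sample(n, seed=0):
    """Return the grid x_i on [-1, 1] and the sorted Gaussian sample.

    Hypothetical helper: the name and the seed argument are illustrative.
    """
    rng = random.Random(seed)
    # Steps 1 and 2: draw n standard Gaussian numbers and sort them.
    ys = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    # Step 3: uniform grid x_i = -1 + 2 i / (n - 1), i = 0, ..., n - 1.
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return xs, ys


xs, ys = sorted_gaussian_sample(10_000)
# The points (xs[i], ys[i]) are what gets plotted.
print(xs[0], xs[-1], ys[0], ys[-1])
```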

I found that the plot was similar to that of the inverse error function, only differing by a multiplicative factor $a>0$.

[Figure: inverse error function and sorted Gaussian numbers, with $N=10^5$]

A linear regression gives an approximate value of $1.42104$ for $a$. With that factor, the two curves are very close for $N=10^5$:

[Figure: fitted inverse error function and sorted Gaussian numbers, with $N=10^5$]

I have two questions:

  1. What is the exact value of $a$?
  2. How can one prove that the limit function is indeed $a\cdot\mathrm{inverf}$ as $N\to \infty$?

There is 1 best solution below

What you observe is the convergence of the empirical distribution function of the sample to the cumulative distribution function (cdf) of the underlying distribution. More precisely, it is the convergence of the empirical quantiles to the theoretical quantiles, i.e. to the values of the inverse cdf.

Specifically, for a continuous increasing cdf $F$, the theoretical quantiles are given by $$ x_q = F^{-1}(q) = \sup\{x\in\mathbb R: F(x)<q\}, \quad q\in(0,1). $$ The definition of empirical quantiles varies. For a sample $X_1,\dots,X_n$ of iid variables they can, for example, be defined by $$ \hat x_q = X_{(\lfloor nq\rfloor +1)}, \quad q\in (0,1), $$ where $X_{(1)}\le \dots\le X_{(n)}$ is the sorted sample. It is known that whenever the cdf $F$ is strictly increasing, $\hat x_q \to x_q$ for all $q\in (0,1)$ with probability $1$ as the sample size $n\to\infty$.
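
To see this convergence numerically, here is a small sketch using only the Python standard library. It uses the definition $\hat x_q = X_{(\lfloor nq\rfloor+1)}$ from above and compares it with $\Phi^{-1}(q)$ from `statistics.NormalDist`; the helper names, the chosen levels $q$ and the seed are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(sample_sorted, q):
    """Empirical quantile X_(floor(n q) + 1) of a sorted sample.

    The order statistic with 1-based index floor(n q) + 1 is the
    element with 0-based index floor(n q).
    """
    n = len(sample_sorted)
    return sample_sorted[math.floor(n * q)]


def max_quantile_error(n, seed=0, levels=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Largest gap between empirical and theoretical normal quantiles."""
    rng = random.Random(seed)
    sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return max(
        abs(empirical_quantile(sample, q) - std_normal.inv_cdf(q))
        for q in levels
    )


for n in (100, 10_000, 100_000):
    print(n, max_quantile_error(n, seed=1))
```

The printed gap should typically shrink as $n$ grows, although for a single random sample it need not decrease monotonically.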

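In your case, $F(x) = \Phi(x)$ is the standard normal cdf, which is related to the error function by $$\Phi(x) = \frac{1+\operatorname{Erf}(x/\sqrt{2})}{2},$$ so $$ \Phi^{-1}(y) = \sqrt{2}\operatorname{Erf}^{-1}(2y-1), \quad y\in (0,1). $$ Your grid point $x_i = -1+\frac{2i}{N-1}$ corresponds to the level $q_i = \frac{x_i+1}{2} = \frac{i}{N-1}$, and the value plotted at $x_i$ is essentially the empirical quantile of level $q_i$. The plotted curve therefore converges to $\Phi^{-1}\left(\frac{x+1}{2}\right) = \sqrt{2}\operatorname{Erf}^{-1}(x)$. Hence $a = \sqrt{2} \approx 1.4142$, which is consistent with your fitted value of $1.42104$.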
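
As a rough numerical check of the factor $\sqrt{2}$, the sketch below fits the sorted sample against $\operatorname{Erf}^{-1}$ by least squares through the origin. It is only an illustration under several assumptions: the asker's exact regression method is not known, the Python standard library has no inverse error function, so `erfinv` is a hypothetical helper built from the identity above via `statistics.NormalDist`, and the two grid endpoints $x=\pm 1$ are skipped because $\operatorname{Erf}^{-1}$ is infinite there.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def erfinv(x):
    """Inverse error function on (-1, 1).

    Hypothetical helper, obtained by rearranging
    Phi^{-1}(y) = sqrt(2) * erfinv(2 y - 1) with y = (x + 1) / 2.
    """
    return STD_NORMAL.inv_cdf((x + 1.0) / 2.0) / math.sqrt(2.0)


def fit_scale(n, seed=0):
    """Least-squares slope a (no intercept) in sorted_y ~ a * erfinv(x)."""
    rng = random.Random(seed)
    ys = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    num = 0.0
    den = 0.0
    # Skip i = 0 and i = n - 1, where x = -1 or 1 and erfinv diverges.
    for i in range(1, n - 1):
        x = -1.0 + 2.0 * i / (n - 1)
        e = erfinv(x)
        num += ys[i] * e
        den += e * e
    return num / den


a_hat = fit_scale(50_000)
print("fitted a:", a_hat, "sqrt(2):", math.sqrt(2.0))
```

The fitted slope fluctuates from sample to sample, so a value such as $1.42104$ rather than exactly $1.41421$ is not unexpected for a single run.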