$QQ$-plot - Why do we choose the empirical distribution $F_n(x) = \frac {\#\{y \in S \mid y \le x\}} n$, where $S$ is the sample, for comparison with the normal distribution?

Question


130 Views Asked by Bumbble Comm At 07 Apr 2026 - 10:59


Let $S$ be our sample of size $n$. Then we form the empirical distribution $F_n$ as defined above. We then use a $QQ$-plot to compare $F_n$ to $N(0,1)$ to see if there might be a linear relationship.

Why do we choose $F_n$ as the empirical distribution for our sample?
Could we get other results if we did not choose $F_n$ as the empirical distribution?
For the fractile of $p \in (0,1)$ we choose the midpoint $x$ of the interval corresponding to $p$. Why do we choose the midpoint?

I would appreciate your help.


There are 2 best solutions below

user76844 On 23 Jan 2014 - 8:30

The empirical distribution function is the (nonparametric) maximum likelihood estimator of the underlying distribution. Choosing anything else will distort your data. The midpoint is chosen essentially by convention, since the empirical distribution is a step function. A simple example is the estimator for the median: if $n$ is even, you select the midpoint of the two middle observations. Depending on the underlying distribution (if you knew it), it might make more sense to take something other than the midpoint, but you will normally not have a good reason for doing so.
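To make the midpoint convention concrete: $F_n$ jumps from $(i-1)/n$ to $i/n$ at the $i$-th smallest observation, so the midpoint of that interval is $p_i = (i - 0.5)/n$. The following is only a minimal sketch under that assumption; the function names `midpoint_positions` and `qq_points` and the sample values are made up for illustration. One practical side effect of the midpoint is that every $p_i$ lies strictly inside $(0,1)$, whereas using $i/n$ would pair the largest observation with $\Phi^{-1}(1) = +\infty$.

```python
from statistics import NormalDist


def midpoint_positions(n):
    """Midpoints (i - 0.5)/n of the ECDF jump intervals, for i = 1..n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def qq_points(sample):
    """Pair each order statistic with the N(0,1) quantile at its midpoint level."""
    xs = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf(p) for p in midpoint_positions(len(xs))]
    # Each pair is (theoretical quantile, observed value): one point of the QQ-plot.
    return list(zip(theoretical, xs))


# Made-up illustrative data, not taken from the question.
sample = [2.1, -0.3, 0.8, 1.5, -1.2]
points = qq_points(sample)
for q, x in points:
    print(f"{q:8.4f}  {x:6.2f}")
```

If the sample is roughly normal, the printed pairs fall close to a straight line whose intercept and slope estimate the mean and standard deviation.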

**Bumbble Comm** · Accepted Answer

Suppose $X_1,X_2,\ldots$ is an i.i.d. sequence of $N(0,1)$-distributed random variables. If $$ F_n(x)=\frac{\#\{1\leq i\leq n\mid X_i\leq x\}}{n}=\frac1n \sum_{i=1}^n \mathbf{1}_{\{X_i\leq x\}} $$ denotes the empirical distribution function then $$ F_n(x)\to \Phi(x) \quad\text{almost surely as}\;n\to\infty, $$ for all $x$, where $\Phi$ is the CDF of an $N(0,1)$-distribution.

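This means that if you have an i.i.d. sample following an $N(0,1)$-distribution and $n$ is large enough, then $F_n$ must be "close" to $\Phi$. This is checked by plotting $F_n(x)$ against $\Phi(x)$ and checking whether the points lie on the identity line. (A $QQ$-plot makes the same comparison on the quantile scale: the ordered sample values are plotted against the corresponding $N(0,1)$ quantiles.)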
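The convergence statement can be illustrated numerically. The sketch below is an illustration and not part of the original answer: it simulates $N(0,1)$ samples with a fixed seed, builds $F_n$ exactly as defined above, and measures the largest gap between $F_n$ and $\Phi$. The helper names `ecdf` and `max_gap` and the sample sizes are arbitrary choices.

```python
import random
from bisect import bisect_right
from statistics import NormalDist


def ecdf(sample):
    """Return F_n, where F_n(x) = #{i : X_i <= x} / n."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(xs, x) / n

    return F_n


def max_gap(sample):
    """Largest distance between F_n and the N(0,1) CDF Phi over all x.

    F_n is a step function, so the supremum is attained at a jump:
    just before or at one of the order statistics.
    """
    xs = sorted(sample)
    n = len(xs)
    phi = NormalDist().cdf
    return max(
        max(i / n - phi(x), phi(x) - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )


random.seed(0)  # fixed seed so the run is reproducible
small = [random.gauss(0, 1) for _ in range(100)]
large = [random.gauss(0, 1) for _ in range(10_000)]
gap_small = max_gap(small)
gap_large = max_gap(large)
print(gap_small, gap_large)
```

For normal data the gap typically shrinks as $n$ grows, which is what "close to $\Phi$" means here; for clearly non-normal data it stays bounded away from zero.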
