Comparing Percentiles of 2 Samples Drawn from the Same Distribution

Suppose I have two sets of numbers: $A=\{a_1,a_2,\dots,a_{N_1}\}$ and $B=\{b_1,b_2,\dots,b_{N_2}\}$ with $N_1<N_2$. WLOG assume that $a_i<a_j$ for all $i<j$, and similarly for $b_i$ and $b_j$. Further suppose that the $a_i$ and $b_j$ are drawn (independently) from the same normal distribution: $a_i,b_j\sim N(0,\sigma^2)$ for all $i,j$.

Let $a_{95}$ and $b_{95}$ represent the 95th percentile of the sets $A$ and $B$, respectively. The question is: is it the case that $E[a_{95}]>E[b_{95}]$? Is there anything we can say about these two values?

Simulations tell me the answer is yes, but not always. For example, if $N_1=1$ the answer is false, since in this case $E[a_{95}]=0$. I'm not sure how to begin so any help is appreciated.

Best answer

Let $\hat \xi_p$ denote the sample $p$-quantile and $\hat F_n$ denote the empirical distribution. Then

$$P\{\hat \xi_p\le x\}=P\{\hat F_n(x)\ge p\}=\sum_{k=\lceil np \rceil}^n \binom{n}{k}[F_X(x)]^{k}[1-F_X(x)]^{n-k}$$

because $n\hat F_n(x)\sim\text{Binomial}(n,F_X(x))$, where $F_X$ is the true CDF. Differentiating the RHS w.r.t. $x$ yields the pdf of $\hat \xi_p$, which is given by

$$f_{\hat\xi}(x)=\lceil np\rceil\binom{n}{\lceil np\rceil}[F_X(x)]^{\lceil np\rceil-1}[1-F_X(x)]^{n-\lceil np\rceil}f_X(x)$$
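For completeness, here is a sketch of why differentiating the binomial sum leaves a single term. The shorthand $m=\lceil np\rceil$ and $F=F_X(x)$ is introduced here only for readability, and $F_X$ is assumed to have a density $f_X$:

$$\frac{d}{dx}\sum_{k=m}^n\binom{n}{k}F^{k}(1-F)^{n-k}=f_X(x)\sum_{k=m}^n\left[k\binom{n}{k}F^{k-1}(1-F)^{n-k}-(n-k)\binom{n}{k}F^{k}(1-F)^{n-k-1}\right]$$

Since $(n-k)\binom{n}{k}=(k+1)\binom{n}{k+1}$, the negative part of the $k$-th term cancels the positive part of the $(k+1)$-th term, and the negative part is zero at $k=n$. The sum telescopes, and only the positive part of the $k=m$ term survives: $m\binom{n}{m}F^{m-1}(1-F)^{n-m}f_X(x)$.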

So both $f_{\hat\xi}$ and $\mathbb{E}\hat\xi_p$ depend on $n$ in a non-monotonic way because of the ceiling $\lceil np\rceil$. Here are the results of numerical integration for $\mathbb{E}\hat\xi_{0.95}$ with $F_X(\cdot)=\Phi(\cdot)$ (standard normal with $\xi_{0.95}\approx 1.645$):

[Figure: numerically integrated values of $\mathbb{E}\hat\xi_{0.95}$; image not available]
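The numerical integration can be reproduced with a short script. The sketch below is one possible way to do it, not necessarily how the original figure was produced: it integrates $x\,f_{\hat\xi}(x)$ with the trapezoidal rule on $[-10,10]$, using only the Python standard library. The function name `expected_sample_quantile`, the integration limits, the step count, and the small tolerance inside the ceiling are all choices made for this illustration.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT_2PI = math.sqrt(2.0 * math.pi)


def expected_sample_quantile(n, p, lo=-10.0, hi=10.0, steps=20000):
    """Approximate E[xi_hat_p] for n i.i.d. N(0,1) draws, 0 < p <= 1.

    xi_hat_p is taken to be the k-th smallest observation, k = ceil(n*p),
    and the pdf used is the one derived above.
    """
    # Small tolerance so float noise in n*p (e.g. 57.00000000000001)
    # does not push the ceiling up by one.
    k = math.ceil(n * p - 1e-9)
    # log of k * C(n, k), computed with lgamma to avoid overflow for large n.
    log_c = (math.log(k) + math.lgamma(n + 1)
             - math.lgamma(k + 1) - math.lgamma(n - k + 1))
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        # erfc keeps both tails strictly positive on [-10, 10].
        cdf = 0.5 * math.erfc(-x / SQRT2)   # Phi(x)
        sf = 0.5 * math.erfc(x / SQRT2)     # 1 - Phi(x)
        pdf = math.exp(-0.5 * x * x) / SQRT_2PI
        log_density = log_c + (k - 1) * math.log(cdf) + (n - k) * math.log(sf)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end weights
        total += weight * x * math.exp(log_density) * pdf
    return total * h


if __name__ == "__main__":
    for n in (1, 2, 19, 20, 21, 100, 1000):
        print(n, round(expected_sample_quantile(n, 0.95), 4))
```

A quick sanity check is $n=2$, where $\hat\xi_{0.95}$ is the maximum of two standard normals and the exact mean is $1/\sqrt{\pi}\approx 0.5642$. The jump caused by the ceiling is visible around $n=20$: for $n=19$ the estimator is the sample maximum, while for $n=20$ it is the second-largest value, so the expectation drops when going from 19 to 20 and rises again at 21. For $N(0,\sigma^2)$ data the values simply scale by $\sigma$.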

Also note that (assuming that $F_X$ is differentiable at $\xi_p$ with $f_X(\xi_p)>0$)

$$\sqrt{n}(\hat \xi_p-\xi_p)=Z_p+o_p(1)\text{ where } Z_p\sim N\left(0,\frac{p(1-p)}{f_X^2(\xi_p)}\right)$$

So for large $n$, the distribution of $\hat \xi_p$ is approximately normal, centered at the true $\xi_p$ with variance $p(1-p)/\big(n f_X^2(\xi_p)\big)$.
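The asymptotic statement can be checked by simulation. The following is a minimal Monte Carlo sketch under stated assumptions: standard normal data, $\hat\xi_p$ taken as the $\lceil np\rceil$-th order statistic, and arbitrary illustrative values for `n`, `reps`, and `seed`. The function names are hypothetical and chosen only for this example.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance


def simulate_scaled_quantile_errors(n, p, reps, seed=0):
    """Return reps draws of sqrt(n) * (xi_hat_p - xi_p) for N(0,1) samples."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    k = math.ceil(n * p - 1e-9)        # tolerance guards against float noise
    xi_p = NormalDist().inv_cdf(p)     # true p-quantile of N(0,1)
    errors = []
    for _ in range(reps):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        errors.append(math.sqrt(n) * (sample[k - 1] - xi_p))
    return errors


def asymptotic_variance(p):
    """p(1-p) / f_X(xi_p)^2 for the standard normal."""
    std = NormalDist()
    return p * (1 - p) / std.pdf(std.inv_cdf(p)) ** 2


if __name__ == "__main__":
    errs = simulate_scaled_quantile_errors(n=400, p=0.95, reps=2000)
    print("empirical mean:    ", fmean(errs))
    print("empirical variance:", pvariance(errs))
    print("asymptotic variance:", asymptotic_variance(0.95))
```

The empirical variance should land near the asymptotic value, though at moderate $n$ some finite-sample bias in both the mean and the variance is to be expected, since the limit result only describes the leading-order behaviour.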