Quartiles of Sample are Good Estimators of Quartiles of Original Data?

30 Views Asked by At

I am trying to argue that if I have $n$ numbers, and I sample $cp\log p$ numbers of them, then find the 4 quartiles of them $S_1, S_2, S_3, S_4$, then use them to the estimate the real quartiles $Q_1, Q_2, Q_3, Q_4$, then each quartile will have at most $\frac{n}{3}$ elements with a high probability . I am trying to use Chernoff Inequality but I am stuck, any help?

1

There are 1 best solutions below

0
On

Hint:

Assume the CDF $F$ is continuous (so all quantiles exist in the strict sense). It's actually sufficient to assume that all quartiles exist in the strict sense (i.e. $1/4,1/2,3/4$ are all in the range of $F$). If they don't then you need to be more careful about discreteness issues.

Anyway, you want a way to estimate the probability that there exists $k \in \{ 1,2,3,4 \}$ such that the number of elements of the sample in $(Q_{k-1},Q_k]$ exceeds $n/3$, where $Q_0=-\infty$.

The distribution of the number of elements of the sample lying in $(Q_{k-1},Q_k)$ for a fixed $k$ is Binomial(n,1/4). Call that $X_k$. The $X_k$ are not independent because they are constrained to sum to $n$, but they are independent upon conditioning on that event. That is, the joint PMF is of the form

$$p_{(X_1,X_2,X_3,X_4)}(x_1,x_2,x_3,x_4)=\begin{cases} Z^{-1} \prod_{i=1}^4 p_X(x_i) & x_i \in \{ 0,1,\dots,n \},\sum_{i=1}^4 x_i=n \\ 0 & \text{otherwise} \end{cases}$$

where $Z$ is a normalization constant (a physicist might call it a "partition function") and $p_X$ is the Binomial(n,1/4) PMF.

You are left with some inclusion-exclusion problem to handle. The choice of $n/3$ conveniently ensures that the relevant triple and quadruple intersections are empty, so

$$P(\exists k : X_k>n/3)=\sum_k P(X_k>n/3)-\sum_{j<k} P(X_j>n/3,X_k>n/3).$$

So you need to know the joint distribution of the pairs $(X_j,X_k)$ in order to complete the problem. Of course, for each sum, all the summands in that sum are the same number, so this is of the form

$$4p-6q$$

for some $p,q$ that you need to find. $p$ is easy enough to find, it is just $1-F_X(n/3)$ where $F_X$ is the Binomial(n,1/4) CDF. Finding $q$ will require you to think about the joint PMF I gave above a little bit.