Estimating certain order statistics of a set.

112 Views Asked by At

I am given $n$ numbers (say between 0 and 1) that I partition uniformly at random into $n/k$ sets $S_1, \ldots, S_{n/k}$ each of size $k$. Now I compute the following quantity:

$$ Y = (k/n) \sum_{i=1}^{n/k} \min S_i $$

i.e $Y$ is the average of the minimum values in each set.

Is there any nontrivial relation between $E[Y]$ and order statistics of the set of numbers ? (here the expectation is over the space of partitions). Would the answer change (or become more interesting) if $Y$ was instead the median instead of the average ?

This question came to mind after reading an article on estimated cheapest house prices in a metropolitan area.

1

There are 1 best solutions below

0
On

Let $X_1\leq \dots\leq X_n$ be the numbers. Then $$\mathbb{E}[Y] = \frac{k}{n}\sum_{t=1}^{n-k+1} \frac{\binom{n-t}{k-1}}{\binom{n-1}{k-1}} X_t \approx \frac{k}{n}\sum_{t=1}^{n-k+1} e^{-k t/n} X_t.$$ (the approximate equality holds under some assumptions on $n$, $k$ and $X_t$).

Proof. Let us say that $X_i$ is important if $X_i$ is the smallest element in some set $S_j$ (whether $X_i$ is important or not is a random event that depends on the partition). Note that there are exactly $n/k$ important numbers $X_i$. We have,

$$\mathbb{E}[Y] = \frac{k}{n}\mathbb{E}\left[\sum_{i=1}^{n/k} \min(S_i)\right] = \frac{k}{n} \sum_{t=1}^{n} X_t \Pr(X_t\text{ is important}).$$ Now compute the probability that $X_t$ is important. Suppose that $X_t\in S_i$. Then $X_t$ is important if and only if $X_1,\dots, X_{t-1}$ are not in $S_i$. In other words, $X_t$ is important if and only if $S_i \setminus \{X_t\} \subseteq \{X_{t+1},\dots, X_n\}$. The probability of this event is $$\frac{\binom{n-t}{k-1}}{\binom{n-1}{k-1}}$$ (here we use that $|S_i \setminus \{X_t\}| = k-1$).