Estimator for ranking in sample

115 Views Asked by At

I have a very interesting topic regarding unbiased estimator for ranking in samples as I would need to do those kind of estimation in real world given limited sampling data, here is the question:

Assume the sample size is huge (total population > 1 billion), take a sample of size 100, X achieves No. 1; Take another random sampling of size 100, X achieves No.1 again, how to estimate the actual ranking of X?

To me, this is a Bayesian question and I believe it is related to lots of other topics here.

1

There are 1 best solutions below

0
On

I’ll assume that all ranks of $X$ are equiprobable a priori.

Let $m=100$ be the sample size, $n$ the population size and $k$ the rank of $X$. Then there are $\binom{n-k}{m-1}$ ways to choose $m-1$ other elements ranked lower than $X$ and $\binom nm$ ways in total to choose $m$ elements, so the probability for $X$ to come out first in a given sample is

$$ \frac{\binom{n-k}{m-1}}{\binom nm}\;. $$

Thus, the most likely rank of $X$ is $k=1$. If you only take a single sample and $X$ comes out first, the expected value of $k$ is

$$ \sum_{k=1}^n\frac{\binom{n-k}{m-1}}{\binom nm}k=\frac{\binom{n+1}{m+1}}{\binom nm}=\frac{n+1}{m+1}\;. $$

If you take two samples and $X$ comes out first both times, the expected value of $k$ is

$$ \frac{\sum_{k=1}^n\binom{n-k}{m-1}^2k}{\sum_{k=1}^n\binom{n-k}{m-1}^2}\;. $$

I doubt that this can be simplified. We can approximate it using

$$ \frac{\binom{n-k}{m-1}}{\binom{n-1}{m-1}}\approx\left(1-\frac{m-1}n\right)^{k-1} $$

for $m\ll n$. With

$$ \frac{\sum_{k=1}^\infty q^{k-1}k}{\sum_{k=1}^\infty q^{k-1}}=\frac1{1-q}\;, $$

this approximation yields an expected value

$$ \frac1{1-\left(1-\frac{m-1}n\right)}=\frac n{m-1} $$

in the case of one sample, reasonably close to the exact value $\frac{n+1}{m+1}$. For two samples, it yields

$$ \frac1{1-\left(1-\frac{m-1}n\right)^2}\approx\frac12\frac n{m-1}\;. $$

Thus, the expected rank of $X$ is roughly halved by having two samples instead of one in which it comes out first.