Probability distribution of the $n$th order statistic when sampling without replacement


I have been trying to understand the derivation of the UMVUE for the German Tank Problem. We have $n$ values sampled without replacement from a population $\{1, 2, \cdots, N\}$ of unknown size $N$, and we wish to estimate $N$. Say the samples we observe are $X_1, \cdots, X_n$, and the order statistics are $X_{(1)}, \cdots, X_{(n)}$.

Johnson (1994) states that the probability that the maximum of our observations equals $j$ is:

$$Pr[X_{(n)} = j] = \frac{{j-1}\choose{n-1}}{{N}\choose{n}}$$
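A brute-force check of the formula, assuming (as in the question) that every $n$-subset of $\{1, \cdots, N\}$ is equally likely: the sketch below enumerates all subsets for small cases and compares the exact frequency of each maximum with $\binom{j-1}{n-1}/\binom{N}{n}$. The function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def max_pmf_formula(j: int, n: int, N: int) -> Fraction:
    """Johnson's formula: Pr[X_(n) = j] = C(j-1, n-1) / C(N, n)."""
    # math.comb returns 0 when n-1 > j-1, so j < n correctly gets probability 0.
    return Fraction(comb(j - 1, n - 1), comb(N, n))


def max_pmf_bruteforce(j: int, n: int, N: int) -> Fraction:
    """Enumerate every equally likely n-subset of {1..N} and count those with max j."""
    subsets = list(combinations(range(1, N + 1), n))
    hits = sum(1 for s in subsets if max(s) == j)
    return Fraction(hits, len(subsets))


N, n = 8, 3
for j in range(1, N + 1):
    assert max_pmf_formula(j, n, N) == max_pmf_bruteforce(j, n, N)
print("formula matches enumeration for N=8, n=3")
```

Exact `Fraction` arithmetic is used so the comparison is an equality test rather than a floating-point tolerance check.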

However, I keep thinking that there is a missing factor of $n$.


My thought process:

Numerator: We have $n$ slots for our sample. We have $1\choose 1$ ways of pulling $j$ from the population, and we have ${n \choose 1} = n$ ways to place it in our $n$ slots. The remaining $n-1$ slots must be filled with elements of $\{1, \cdots, j-1\}$, and there are ${j-1 \choose n-1}$ ways to choose those values. All items are distinguishable, and there are ${n-1 \choose n-1} = 1$ ways to choose the remaining slots. So, omitting the factors of 1,

$$\text{numerator} = n \times {{j-1}\choose{n-1}}$$

Denominator: We have $n$ slots and can choose from any of our $N$ in the population. We have ${n \choose n} = 1$ choices for which slots we take. All items are distinguishable. So

$$\text{denominator} = {N \choose n}$$
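One quick sanity check on any candidate PMF is that it must sum to 1 over its support $j = n, \dots, N$. By the hockey-stick identity, $\sum_{j=n}^{N}\binom{j-1}{n-1} = \binom{N}{n}$, so the version with the extra factor of $n$ sums to $n$ rather than 1. The minimal sketch below checks this numerically; `asker_pmf` and `johnson_pmf` are hypothetical names for the two candidate formulas.

```python
from fractions import Fraction
from math import comb


def johnson_pmf(j: int, n: int, N: int) -> Fraction:
    """Pr[X_(n) = j] as given in Johnson (1994)."""
    return Fraction(comb(j - 1, n - 1), comb(N, n))


def asker_pmf(j: int, n: int, N: int) -> Fraction:
    """The hypothesized version with an extra factor of n (hypothetical helper)."""
    return Fraction(n * comb(j - 1, n - 1), comb(N, n))


N, n = 10, 4
print(sum(johnson_pmf(j, n, N) for j in range(n, N + 1)))  # prints 1
print(sum(asker_pmf(j, n, N) for j in range(n, N + 1)))    # prints 4
```

Since the second total equals $n$ for every valid $(N, n)$, the extra factor cannot be right: it inflates every probability by the same constant.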


What might I be misunderstanding? (I suspect it's something to do with "sequences vs sets" and/or the hypergeometric distribution, whose intuitions seem to often elude me...) Thank you in advance!

Best answer:

You’re sometimes distinguishing by order and sometimes not. You need to consistently distinguish by order either in both the numerator and the denominator or in neither.
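As a worked check, here is the count with order distinguished in both places. There are $n$ positions at which $j$ can appear, and $\frac{(j-1)!}{(j-n)!}$ ordered ways to fill the other $n-1$ positions with distinct values from $\{1, \cdots, j-1\}$; there are $\frac{N!}{(N-n)!}$ ordered sequences of $n$ distinct draws in total. Using $\frac{(j-1)!}{(j-n)!} = (n-1)!\binom{j-1}{n-1}$ and $\frac{N!}{(N-n)!} = n!\binom{N}{n}$:

$$\Pr[X_{(n)} = j] = \frac{n \cdot \frac{(j-1)!}{(j-n)!}}{\frac{N!}{(N-n)!}} = \frac{n \cdot (n-1)!\binom{j-1}{n-1}}{n!\binom{N}{n}} = \frac{\binom{j-1}{n-1}}{\binom{N}{n}}$$

The factor of $n$ does appear in the ordered numerator, but it is cancelled by the $n!$ orderings counted in the ordered denominator.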

Your denominator is correct if you don’t distinguish by order: there are $\binom Nn$ equiprobable unordered sets of observations you could have made. Your numerator is wrong because, although you don’t count orderings of the other $n-1$ elements, you do multiply by $n$ for the $n$ positions in the sequence of observations at which $j$ could have been observed. So that factor of $n$ shouldn’t be there.
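To see the ordered-versus-unordered bookkeeping concretely, the sketch below enumerates ordered draws (sequences) for a small case, assuming each ordered sequence of $n$ distinct values is equally likely. It confirms that counting order in both numerator and denominator gives the same probability as the unordered count, while the question's mixed numerator $n\binom{j-1}{n-1}$ matches neither count. `ordered_max_counts` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, perm


def ordered_max_counts(j: int, n: int, N: int) -> tuple[int, int]:
    """Return (#ordered draws with max j, #ordered draws) from {1..N}, size n."""
    seqs = list(permutations(range(1, N + 1), n))
    hits = sum(1 for s in seqs if max(s) == j)
    return hits, len(seqs)


N, n, j = 7, 3, 5
hits, total = ordered_max_counts(j, n, N)
assert hits == n * perm(j - 1, n - 1)  # n positions for j, then order the rest
assert total == perm(N, n)             # N!/(N-n)! ordered sequences
assert Fraction(hits, total) == Fraction(comb(j - 1, n - 1), comb(N, n))

# Ordered count, the question's mixed count, and the unordered count:
print(hits, n * comb(j - 1, n - 1), comb(j - 1, n - 1))  # prints: 36 18 6
```

The mixed count sits between the two consistent ones: it includes the $n$ positions for $j$ but omits the $(n-1)!$ orderings of the remaining values.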