I would like to understand whether the sample distributions of the following approaches are the same.
Setup: a population of size $N$ with a binomial distribution; the required sample size is $n$.
Approach 1 (direct sample): randomly sample $n$ samples from population $N$ without replacement.
Approach 2 (sample from samples): randomly sample $k$ samples from population $N$ ($N>k>n$) without replacement, and then sample $n$ samples from the selected $k$ samples without replacement.
Questions:
- Will approaches 1 and 2 produce the same sample distribution?
- Can I use approach 2 to estimate mathematical properties of population $N$? For example, given the accuracy measured on the sample of size $n$, can I estimate the accuracy range of the population $N$?
There's a bit of ambiguity in your language, so let me make sure I have your intended meaning down: when you say, "randomly sample n samples from population," I think you mean, "randomly select n objects from population" instead. The distinction matters; are we collecting $n$ samples, or one sample consisting of $n$ objects?
Presuming it is indeed just one sample, the two methods are equivalent. Either will produce a collection of $n$ objects from the population, and each collection of size $n$ has a $1 / \binom N n$ chance of being selected. For proof, you could use a symmetry argument, or you could compute directly that in the second case the probability of a particular subgroup of size $n$ being selected is the probability that the $k$ chosen objects contain it, times the probability that it is then the one picked from among those $k$: $$\frac{\binom{N-n}{k-n}}{\binom N k} \cdot \frac{1}{\binom k n} = \frac{(N-n)! \, k! \, (N-k)! \, n! \, (k-n)!}{(k-n)! \,(N-k)! \, N! \, k!} = \frac{1}{\binom N n}.$$
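As a sanity check on the computation above, here is a minimal sketch that enumerates approach 2 exactly for a small case. The function name `two_stage_distribution` and the values $N=6$, $k=4$, $n=2$ are my own choices for illustration, not anything from the question.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def two_stage_distribution(N, k, n):
    """Exact probability of each n-subset under approach 2 (hypothetical helper)."""
    p_superset = Fraction(1, comb(N, k))  # each k-subset is equally likely
    p_subset = Fraction(1, comb(k, n))    # each n-subset of it is equally likely
    probs = {}
    for superset in combinations(range(N), k):
        for subset in combinations(superset, n):
            probs[subset] = probs.get(subset, Fraction(0)) + p_superset * p_subset
    return probs

# Small illustrative case; any N > k > n works, but enumeration grows quickly.
N, k, n = 6, 4, 2
probs = two_stage_distribution(N, k, n)
uniform = Fraction(1, comb(N, n))

# Approach 2 matches approach 1 if every n-subset has probability 1 / C(N, n).
all_uniform = len(probs) == comb(N, n) and all(p == uniform for p in probs.values())
print(all_uniform)
```

Using `Fraction` keeps the arithmetic exact, so the comparison with $1/\binom N n$ does not depend on floating-point rounding.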
If you do mean that you're taking repeated samples, though, then whether it constitutes the same process as approach 1 depends on how often you select the superset of $k$ objects. If the superset is redrawn each time, then this process is indistinguishable from approach 1. But if the set of $k$ objects is fixed and repeated samples are taken from that smaller subset, then no, this process is not equivalent to approach 1: the $N-k$ objects outside the fixed set can never appear in any sample.
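To illustrate the repeated-sampling distinction, here is a rough simulation sketch. The helper `repeated_samples`, the seed, and the values $N=10$, $k=5$, $n=2$ with 2000 repetitions are assumptions chosen only for the demonstration.

```python
import random

def repeated_samples(N, k, n, repeats, redraw_superset, seed=0):
    """Return the set of objects that ever appear across repeated samples (hypothetical helper)."""
    rng = random.Random(seed)
    population = list(range(N))
    superset = rng.sample(population, k)  # initial superset of k objects
    seen = set()
    for _ in range(repeats):
        if redraw_superset:
            # Redraw the k objects before every sample.
            superset = rng.sample(population, k)
        seen.update(rng.sample(superset, n))
    return seen

N, k, n = 10, 5, 2
seen_fixed = repeated_samples(N, k, n, repeats=2000, redraw_superset=False)
seen_redrawn = repeated_samples(N, k, n, repeats=2000, redraw_superset=True)

# With a fixed superset, at most k distinct objects can ever show up.
print(len(seen_fixed), len(seen_redrawn))
```

With the superset held fixed, the first count can never exceed $k$, whereas redrawing the superset lets every member of the population appear eventually.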