Inferences about a changing but unknown finite population as you sample without replacement


Suppose you have a bucket with a mix of two ball colors (say red and blue). All you know is that there are N balls in the bucket to start. Suppose you sample one ball at a time without replacement, keeping track of each ball's color and the order in which it comes out (if the order matters at all, the draws being random). After n draws the bucket has N-n balls in it, but you've seen the sampling history. Is there a way to infer anything about the remaining population (rather than about the sample)? How does the remaining population's probability mass function change? How do you keep track of the new mean and probabilities?

Is this an absurd question? The way I see it, the statistics that I know says we can only talk about the sampling distribution itself, and at best we can infer about "infinite" (i.e. very large) populations or about sampling with replacement. But I'm not sure how to make inferences about a diminishing, non-replaced population as the sample grows.


There are 2 best solutions below

0
On

Partial answer. Let's just try to do some Bayesian inference on a single draw. Suppose you have a prior $p_k$, $0 \leq k \leq N$, for the probability that exactly $k$ of the $N$ balls are red. Then the posterior probability that $k$ of the $N-1$ remaining balls are red, for $0 \leq k \leq N-1$, would be

$$ q_k = \frac{\frac{N-k}{N}p_k}{\sum_{j=0}^N \frac{N-j}{N}p_j} = \frac{(N-k)p_k}{\sum_{j=0}^N (N-j)p_j} $$

$$ r_k = \frac{\frac{k+1}{N}p_{k+1}}{\sum_{j=0}^{N-1} \frac{j+1}{N}p_{j+1}} = \frac{(k+1)p_{k+1}}{\sum_{j=0}^{N-1} (j+1)p_{j+1}} $$

if we draw a blue ball or a red ball, respectively. I'm not certain there's a nice conjugate prior distribution for these likelihoods (given the off-by-one when you draw a red ball). I'll continue to think about it. (Also, let me know if I've made an error in my expressions.)
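
To make the update concrete, here is a minimal Python sketch that applies the two formulas above one draw at a time, so the output of one step becomes the prior for the next. The function names (`update`, `posterior_after`, `mean_red`) and the uniform prior in the example are my own choices for illustration, not part of the derivation; exact fractions are used only to keep the arithmetic easy to check.

```python
from fractions import Fraction


def update(prior, color):
    """Posterior over the number of red balls left after one draw.

    prior[k] is the probability that k of the N balls currently in the
    bucket are red (k = 0..N). The result covers k = 0..N-1.
    """
    n = len(prior) - 1  # N, the number of balls before this draw
    if n < 1:
        raise ValueError("the bucket is empty")
    if color == "blue":
        # q_k is proportional to (N - k) * p_k
        weights = [(n - k) * prior[k] for k in range(n)]
    elif color == "red":
        # r_k is proportional to (k + 1) * p_{k+1}
        weights = [(k + 1) * prior[k + 1] for k in range(n)]
    else:
        raise ValueError("color must be 'red' or 'blue'")
    total = sum(weights)
    if total == 0:
        raise ValueError("this draw has probability zero under the prior")
    return [w / total for w in weights]


def posterior_after(prior, draws):
    """Apply update() once per observed draw, in order."""
    dist = prior
    for color in draws:
        dist = update(dist, color)
    return dist


def mean_red(dist):
    """Expected number of red balls under a distribution over counts."""
    return sum(k * p for k, p in enumerate(dist))


# Example (assumed for illustration): N = 4 and a uniform prior on 0..4 reds.
prior = [Fraction(1, 5)] * 5
post = posterior_after(prior, ["red", "red", "blue"])
# post == [Fraction(2, 5), Fraction(3, 5)]: one ball is left,
# and it is red with probability 3/5.
```

In this sketch `mean_red(post)` gives the expected number of red balls still in the bucket, which is one way to track the "new mean" the question asks about. The result depends on the prior you start from; the uniform prior here is only an example.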

1
On

I don't quite see the issue here - your random sample is representative of the population you drew it from. You never need to infer anything about the sample itself, since you can observe it directly. If you observe that 50% of your sample consists of red balls, you can estimate that the population you drew it from (and the remaining population that you didn't sample) consists of about 50% red balls, and you can do so with varying degrees of certainty depending on your sample size.

You seem to be approaching the problem with the intuition that the balls you drew are somehow distributed differently from the balls you didn't draw. If your random samples are not representative of the population, you have a problem with your sampling. Drawing lots of red balls at the start of your sample doesn't mean that you're less likely to draw more red balls as you keep sampling - it means that you probably have lots of red balls in the population. In fact, every red ball you draw makes it more likely that the next ball will be red, since your sample estimate of the proportion of red balls goes up with each red ball you see. If, for example, you draw 99 red balls from a bucket of 100 balls, it doesn't mean that the last ball is blue just because you haven't seen one yet - the last ball is very likely red, since that's what your sample suggests so far.
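
The 99-out-of-100 example can be checked numerically. The sketch below is only an illustration under an assumption the text above does not state: a prior over the number of red balls in the bucket, taken to be uniform on 0..N unless another one is passed in. The function name `prob_next_red` is made up for this example.

```python
from fractions import Fraction
from math import comb


def prob_next_red(N, reds_seen, blues_seen, prior=None):
    """Probability that the next ball is red, given the counts drawn so far.

    prior[K] is the prior probability that K of the N balls are red.
    If no prior is given, a uniform prior on K = 0..N is assumed.
    """
    n = reds_seen + blues_seen
    if n >= N:
        raise ValueError("no balls left to draw")
    if prior is None:
        prior = [Fraction(1, N + 1)] * (N + 1)
    # Likelihood of the observed counts when K balls are red
    # (hypergeometric, up to a factor that does not depend on K).
    weights = [
        prior[K] * comb(K, reds_seen) * comb(N - K, blues_seen)
        for K in range(N + 1)
    ]
    total = sum(weights)
    # Given K, the chance the next ball is red is (K - reds_seen) / (N - n).
    # Terms with K < reds_seen have weight 0, so they contribute nothing.
    return sum(
        w * Fraction(K - reds_seen, N - n) for K, w in enumerate(weights)
    ) / total


# 99 red balls drawn from a bucket of 100, uniform prior assumed:
p = prob_next_red(100, 99, 0)
# p == Fraction(100, 101), so the last ball is very likely red.
```

Under the uniform prior the last ball is red with probability 100/101. The conclusion does depend on the prior, though: if you somehow knew in advance that the bucket held exactly 99 red balls (a prior concentrated on K = 99), the same function returns 0.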