How to test whether a collection of samples was drawn with replacement or not?


A box is full of balls of $m$ different colors, with $n$ balls of each color, so the total number of balls is $mn$. Note that $m$ is unknown, $n$ is known, and balls can be distinguished only by their color.

Now, if one draws $N$ samples from the box, either with replacement or without replacement, how can I tell which of the two was used? Is there a hypothesis test for this?

Thanks!


To make the question clearer:

What I am interested in is how to measure the degree to which the current samples look like they were drawn with replacement. It might be a function $f$ that takes the samples $\{x\}$, $n$, and $m$ as inputs. If $f(\{x\}, n, m) = 1$, then the sampling process is sampling with replacement; if $f(\{x\}, n, m) = 0$, then it is sampling without replacement. Values between 0 and 1 describe the likelihood that the sampling process is with replacement.

Best answer:

You can compute the expected distribution of colors in the two cases. Intuitively, the distribution will be more even if the draws are without replacement, because a color that has had many balls drawn so far has fewer balls left in the box and will tend to be drawn less often afterwards. If you have only a few draws you won't be able to tell the difference. If you have lots of draws you can be confident that you know $m$: it is the number of different colors you have seen.
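One way to make the "more even" intuition concrete is to compare the variance of a single color's count under the two schemes. This is only a sketch: $X_i$ (the number of balls of color $i$ among the $N$ draws) is notation introduced here, and $m$ is treated as known.

$$\operatorname{Var}(X_i)_{\text{with}} = N\,\frac{1}{m}\left(1-\frac{1}{m}\right), \qquad \operatorname{Var}(X_i)_{\text{without}} = N\,\frac{1}{m}\left(1-\frac{1}{m}\right)\frac{mn-N}{mn-1}.$$

The extra factor $\frac{mn-N}{mn-1}$ is the usual finite population correction. It is close to $1$ when $N$ is small compared with $mn$, which is why the two schemes are hard to tell apart from only a few draws.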

To be formal about it, you have lots of hypotheses available: if you have seen $k$ different colors so far, you can assume there are $k, k+1, k+2, \dots$ colors, each with or without replacement. You can compute the chance of your observed draw under each of these hypotheses. If you have enough draws, one will stand out. I suspect that will only happen when you have enough draws to believe you have seen all the colors, probably when the smallest count of any one color is three (maybe two). Even so, if $n$ is large it will be hard to tell whether there is replacement until the number of draws of each color is a reasonable fraction of $n$.
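As a sketch of what "the chance of your observed draw" looks like, write $c_1, \dots, c_k$ for the observed counts of the $k$ colors seen (notation introduced here), with $N = c_1 + \dots + c_k$, and suppose there are $m \ge k$ colors in the box. For a fixed assignment of the observed colors, the standard multinomial and multivariate hypergeometric formulas give

$$P_{\text{with}}(c \mid m) = \frac{N!}{c_1!\cdots c_k!}\, m^{-N}, \qquad P_{\text{without}}(c \mid m) = \frac{\prod_{i=1}^{k}\binom{n}{c_i}}{\binom{mn}{N}}.$$

When comparing different values of $m$, it seems reasonable to also multiply both by $m!/(m-k)!$, the number of ways the $k$ observed colors could sit among the $m$ colors. That factor cancels when comparing with and without replacement at the same $m$.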

Added in response to comment: That is exactly what I was trying to answer. Suppose you draw $30$ balls and find $10$ different colors. You can frame a number of hypotheses:

  1. 10 colors, draw with replacement
  2. 10 colors, draw without replacement
  3. 11 colors, draw with replacement
  4. 11 colors, draw without replacement
  5. and so on

You can compute the chance of your observed draw under each of these. If you have a few of each color you have seen, the chance that you have missed a color entirely is low, so you won't need to go down the list very far. Number 1 will be more probable than number 2 if the distribution among the colors is rather uneven. Number 2 will be favored if the distribution is even.
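The comparison of hypotheses can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive test: every hypothesis (number of colors $m$, with or without replacement) gets equal prior weight, the list is cut off at a hypothetical `extra_colors` unseen colors, and the function names are made up for this example. The returned score plays the role of the $f$ asked for in the question, except that it sums over candidate values of $m$ instead of taking $m$ as input.

```python
from math import exp, lgamma, log

NEG_INF = float("-inf")


def log_comb(a, b):
    """Log of the binomial coefficient C(a, b); -inf when the coefficient is zero."""
    if b < 0 or b > a:
        return NEG_INF
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def log_likelihood(counts, m, n, with_replacement):
    """Log-probability of the observed color counts given m colors, n balls each.

    counts holds one positive count per color actually seen.
    """
    k, total = len(counts), sum(counts)
    if m < k:
        return NEG_INF  # cannot have seen more colors than exist
    if not with_replacement and any(c > n for c in counts):
        return NEG_INF  # cannot draw more than n balls of one color
    # Ways to place the k observed colors among the m colors (assumed factor).
    log_labels = lgamma(m + 1) - lgamma(m - k + 1)
    if with_replacement:
        # Multinomial with equal probability 1/m per color.
        log_p = lgamma(total + 1) - sum(lgamma(c + 1) for c in counts) - total * log(m)
    else:
        # Multivariate hypergeometric: n balls of each of m colors.
        log_p = sum(log_comb(n, c) for c in counts) - log_comb(m * n, total)
    return log_labels + log_p


def replacement_score(counts, n, extra_colors=20):
    """Posterior weight of 'with replacement' under a uniform prior (an assumption)."""
    k = len(counts)
    hyps = [(m, r) for m in range(k, k + extra_colors + 1) for r in (True, False)]
    logs = [log_likelihood(counts, m, n, r) for m, r in hyps]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [exp(x - top) for x in logs]
    with_weight = sum(w for w, (_, r) in zip(weights, hyps) if r)
    return with_weight / sum(weights)


if __name__ == "__main__":
    # 30 draws, 10 colors seen, n = 5 balls per color.
    print(replacement_score([3] * 10, n=5))            # even counts
    print(replacement_score([5] * 5 + [1] * 5, n=5))   # uneven counts
```

A score near 1 favors replacement and a score near 0 favors no replacement. A single count larger than $n$ rules out drawing without replacement entirely, so the score is 1 in that case.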