I saw this example on a website:
Suppose there is a jar containing many gumballs, each with a number on it. The numbers range from 0 to 32 and there is an equal number of gumballs with each number. A student runs an experiment with the following procedure: pick five gumballs from the jar, calculate the mean of the numbers on the gumballs, write down the result on a piece of paper, and put the gumballs back into the jar. Repeat the process 499 more times so that altogether 500 means are recorded.
How does this compare with taking just one sample, without replacement, in which he picks 2500 gumballs at once? Wouldn't that give a better estimate of the mean?
In general, is picking N/m samples of size m better than picking a single sample of size N when estimating a population mean? In which case will the variance be higher?
Let's first clearly define some terminology here.
As the scenario is described, the draws are without replacement but the samples are with replacement. However, because we are told that there are "many" balls, and the proportion of balls that are labeled with a given number (from 0 to 32) is equal, we can assume that the sampling distribution of individual draws without replacement is approximately the same as the sampling distribution of draws with replacement; that is to say, because there are many gumballs, individual draws are assumed to be independent and identically distributed.
Now, under the assumption that samples are taken with replacement (as described in the given scenario), each sample is also independent and identically distributed. So the sampling distribution of the sample mean takes on a simple form that does not depend on the number of gumballs in the jar, only the number of samples taken (500) and the number of draws in each sample (5).
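To make that "simple form" concrete, here is a minimal sketch under the i.i.d. assumption above: every draw is treated as equally likely to show any label from 0 to 32. The variable names are my own, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

# Assumption: each single draw is uniform over the labels 0..32 (i.i.d. draws).
labels = range(33)
pop_mean = Fraction(sum(labels), len(labels))
pop_var = sum((Fraction(x) - pop_mean) ** 2 for x in labels) / len(labels)

draws_per_sample = 5
num_samples = 500

# Variance of one recorded mean of 5 draws.
var_one_sample_mean = pop_var / draws_per_sample
# Variance of the average of the 500 independent recorded means.
var_grand_mean = var_one_sample_mean / num_samples

print(pop_mean)        # 16
print(pop_var)         # 272/3
print(var_grand_mean)  # 68/1875
```

Note that the last line equals pop_var / 2500: under independence, only the total number of draws (500 × 5) matters, not how they are grouped into samples.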
If we instead take samples without replacement, as you propose, then the samples are no longer independent and identically distributed, because the removal of previous samples means that subsequent samples are not taken from the same population. This makes the computation of the sampling distribution dependent on the total number of balls and their type.
Now, your question is whether sampling with replacement or without replacement gives a better estimate of the true population mean of the values on all the balls. The truth is, it depends on how many balls are in the jar and how many samples you take. If you take as many samples without replacement as are possible for the number of balls in the jar (e.g., the jar has 500 balls and you take 100 samples of 5 draws each), that resultant sample mean will certainly be best, because you've observed all of the balls and the sample mean equals the population mean exactly. But now suppose there are infinitely many balls in the jar. Then it makes absolutely no difference: both will estimate the true population mean equally well, so long as you take the same total number of draws in either scenario.
As you might expect, a single sample of 2500 draws without replacement, as opposed to 500 samples of 5 draws each (drawn without replacement within a sample, with the gumballs returned to the jar between samples), will be better at estimating the true population mean if the number of balls in the jar is at least 2500, but finite. The advantage shrinks as the jar grows relative to 2500. But the sampling distribution is much more complicated to express in the non-replacement scenario.
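Although the full sampling distribution is complicated, its variance is not. The sketch below compares the two designs using the standard finite population correction, Var(mean) = (sigma^2 / n) * (N - n) / (N - 1), where N is the number of balls in the jar and sigma^2 is the population variance of the labels. The function names and the jar sizes are illustrative assumptions of mine; the jar sizes are multiples of 33 so that every label can appear equally often.

```python
from fractions import Fraction

# Population variance of the labels 0..32 when every label is equally common.
POP_VAR = Fraction(272, 3)

def var_mean_without_replacement(jar_size, n):
    """Variance of the mean of n draws made without replacement from a jar
    of jar_size balls (standard finite population correction)."""
    return POP_VAR / n * Fraction(jar_size - n, jar_size - 1)

def var_single_big_sample(jar_size, total=2500):
    """One sample of `total` draws, all without replacement."""
    return var_mean_without_replacement(jar_size, total)

def var_grand_mean(jar_size, per_sample=5, num_samples=500):
    """Average of num_samples means; balls go back between samples, so the
    samples are independent and their variances simply average."""
    return var_mean_without_replacement(jar_size, per_sample) / num_samples

# Hypothetical jar sizes, chosen only for illustration.
for jar_size in (2508, 3300, 33000, 3300000):
    ratio = var_single_big_sample(jar_size) / var_grand_mean(jar_size)
    print(jar_size, float(ratio))
```

The ratio simplifies to (N - 2500) / (N - 5), which is below 1 for any finite jar holding at least 2500 balls and approaches 1 as the jar grows, matching the claim that the two designs become equivalent for an infinite jar.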