Estimator of $\mu - \mu^2$ when sampling without replacement

Question

Estimator of $\mu - \mu^2$ when sampling without replacement

360 Views Asked by Bumbble Comm At 31 Mar 2026 - 7:56

Question

Consider a population of known size $N$, from which we sample $n$ individuals without replacement and measure their trait value $X_i$. All traits values are bounded between $[0,1]$. Let $\mu = \frac{\sum_i^N x_i}{N}$ be the average trait value in the population. If it makes things easier, I am happy to consider that $X$ is boolean ($0$ or $1$).

What is the unbiased estimator for $\mu - \mu^2$?

Attempt

$$ \begin{align} \mu - \mu^2 &= \operatorname{E}[\bar X] - \operatorname{E}[\bar X]^2\\ &= \operatorname{E}[\bar X] - \operatorname{E}[\bar X]^2 - \operatorname{var}[\bar X] + \operatorname{var}[\bar X] \\ &= \operatorname{E}[\bar X] - \operatorname{E}[\bar X]^2 - \operatorname{var}[\bar X] + \frac{1}{n} \sigma^2 \\ &= \operatorname{E}[\bar X] - \operatorname{E}[\bar X^2] + \frac{1}{n} \operatorname{E}[\hat \sigma^2], \end{align}$$

where $\hat \sigma^2$ is the estimator for the variance. Looking at this post,

$$\hat\sigma^2 = \frac{N-1}{N(n-1)} \sum_{i=1}^n (X_i-\bar X)^2$$

Hence,

$$\mu - \mu^2 = \operatorname{E}\left[ \bar X - \bar X^2 + \frac{1}{n} \frac{N-1}{N(n-1)} \sum_{i=1}^n (X_i-\bar X)^2\right] $$

Test

Looked good to me so I tried it out (in R)

nbtrials = 5000      # number of independent samples
N = 20               # Number of individuals in the population
pop = rep(0:1, N/2)  # Create a boolean population with $\mu = 0.5$
n = 17               # Sample size

out = numeric(nbtrials)  # out will collect the estimates of $\mu - \mu^2$
for (trial in 1:nbtrials)
{
    s = sample(pop,size=n, replace=FALSE)   # Take a sample without replacement
    xbar = sum(s) / n                       # Compute the average frequency in the sample
    out[trial] = xbar - xbar^2 + 1/n * (N - 1) / (N*(n-1)) * sum((s - mean(s))^2)   # Compute the estimator
}

# Now compare the population with the average of estimations taken from the samples
trueMean=sum(pop) / length(pop)
print(paste("True value  = ",trueMean *(1-trueMean)))
print(paste("Average estimated value = ",mean(out)))

which outputs something like

"True value  =  0.25"
"Average estimated value =  0.262343771626298"

showing a clear systematic overestimation. Where did I go wrong?

Where does the mistake lie?

From @EinarRødland's comment, it seems likely that my mistake is in the equation $var[\bar X] = \frac{\sigma^2}{n}$, which is wrong due to the fact that sampling occurs without replacement from a finite population.

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

The variance of the sampling distribution is not equal to the sample variance over the sample size when sampling is with replacement. From ma.utexas, the variance of the sampling distribution when sampling without replacement is

$$\operatorname{var}[\bar X] = \frac{s^2}{n} \left(1-\frac {n-1} {N-1} \right) = \frac{\sum (X_i - \bar X)^2}{n(n-1)} \left(1-\frac {n-1} {N-1} \right)$$

Therefore,

$$\mu - \mu^2 = \operatorname E\left[\bar X - \bar X^2 + \frac{\sum (X_i - \bar X)^2}{n(n-1)} \left(1-\frac {n-1} {N-1} \right)\right]$$

Thanks to @EinarRødland for pointing out where the mistake was.

Estimator of $\mu - \mu^2$ when sampling without replacement

Question

Attempt

Test

Where does the mistake lie?

There are 1 best solutions below

Related Questions in PROBABILITY

Related Questions in STATISTICS

Related Questions in ESTIMATION

Related Questions in SAMPLING

Related Questions in VARIANCE

Trending Questions

Popular # Hahtags

Popular Questions