Let's consider a population of boolean values [0,1]. In the population, the mean (or frequency of 1) is $\mu$. We take a sample of size $n$, which mean $\bar x$ is
$$\bar x = \frac{\sum_i^n x_i}{n}$$
and sample variance
$$s^2 = \frac{\sum_i^n (x_i - \bar x)^2}{n-1}$$
I would like to estimate the parameter $D=\mu (1-\mu)$. It appears from the below small simulations (coded in R) that the unbiased estimator of $D$ is
$$\hat D = \bar x(1-\bar x) + \frac{s^2}{n}$$
Can you help me to figure out why this is true?
nbtrials = 5000
popSize = 200
pop = 0:1
sampleSize = 10
out = numeric(nbtrials)
for (trial in 1:nbtrials)
{
s = sample(pop,size=sampleSize, replace=TRUE)
xbar = sum(s) / sampleSize
out[trial] = xbar * (1-xbar) + var(s) / sampleSize
}
xbar=sum(pop) / length(pop)
print(paste("True value of D = ",xbar *(1-xbar)))
print(paste("Average estimated value of D = ",mean(out)))
Several facts you need to use: $$ E[X_1] = \mu, Var[X_1] = \mu(1 - \mu), E[\bar{X}] = E[X_1], Var[\bar{X}] = \frac {1} {n} Var[X_1], E[S^2] = Var[X_1]$$
Then we have $$ \begin{align} E[\hat{D}] &= E[\bar{X}] - E[\bar{X}^2]+\frac {1} {n} E[S^2] \\ &= \mu - Var[\bar{X}]-E[\bar{X}]^2 + \frac {1} {n}\mu(1 - \mu) \\ &= \mu - \frac {1} {n} \mu(1-\mu)-\mu^2+\frac {1} {n}\mu(1 - \mu) \\ & = \mu - \mu^2 \\ & = \mu(1 - \mu) \end{align}$$
Therefore this is an unbiased estimator of $\mu(1 - \mu)$