Confidence intervals for proportions - why isn't the Bessel correction used in estimating the standard deviation?


When calculating a confidence interval for the mean of a population whose standard deviation σ is unknown, σ is estimated by the sample standard deviation S, which uses the Bessel correction (dividing by $n-1$ rather than $n$) so that $S^2$ is an unbiased estimator of $\sigma^2$.

But suppose the population X is a Bernoulli variable. Since X is binary, $x_i^2 = x_i$, so $ \sum_{i=1}^n (x_i - \bar x)^2 = \sum_{i=1}^n x_i^2 - n\bar x^2 = n\bar x (1 - \bar x) $ (as shown in an answer to this question). The sample standard deviation would therefore be $ S = \sqrt {\frac {n} {n-1} \bar x (1 - \bar x)}$.
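The identity can be checked with exact rational arithmetic. This is a minimal Python sketch: the sample `xs` is made up for illustration, and `sum_sq_dev` and `bessel_variance` are hypothetical helper names, not from any library.

```python
from fractions import Fraction


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean, computed exactly."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs)


def bessel_variance(xs):
    """Sample variance S^2 with the Bessel correction (divide by n - 1)."""
    return sum_sq_dev(xs) / (len(xs) - 1)


# Hypothetical sample of 0/1 outcomes; any binary list works.
xs = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]
n = len(xs)
xbar = Fraction(sum(xs), n)

# For binary data: sum (x_i - xbar)^2 == n * xbar * (1 - xbar)
assert sum_sq_dev(xs) == n * xbar * (1 - xbar)
# Hence S^2 == n/(n-1) * xbar * (1 - xbar)
assert bessel_variance(xs) == Fraction(n, n - 1) * xbar * (1 - xbar)
print(xbar, bessel_variance(xs))
```

Using `Fraction` rather than floats makes the equalities hold exactly instead of up to rounding.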

But in every resource I have read on confidence intervals for proportions, when the population proportion p is unknown, the population standard deviation is estimated by $ \sqrt{\bar p (1-\bar p)} $, where $\bar p$ is the sample proportion. This estimate does not use the Bessel correction; if it did, σ would be estimated by $ \sqrt{\frac {n} {n-1} \bar p (1-\bar p)} $.

I understand that $ \bar p (1-\bar p)$ is a consistent estimator of $p(1-p)$, but wouldn't $\frac {n} {n-1} \bar p (1-\bar p)$ be both consistent and unbiased, and thus a better estimator?
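The unbiasedness claim follows from $E[\bar p] = p$ and $\operatorname{Var}(\bar p) = p(1-p)/n$ for $n$ independent Bernoulli($p$) observations:

$$
\begin{aligned}
E[\bar p(1-\bar p)] &= E[\bar p] - E[\bar p^{2}] \\
&= p - \left(\operatorname{Var}(\bar p) + p^{2}\right) \\
&= p - \frac{p(1-p)}{n} - p^{2} \\
&= \frac{n-1}{n}\,p(1-p),
\end{aligned}
$$

so $E\left[\frac{n}{n-1}\bar p(1-\bar p)\right] = p(1-p)$. This makes the variance estimate unbiased, not the standard deviation estimate: by Jensen's inequality, $E[\sqrt{V}] \le \sqrt{E[V]}$ for any nonnegative estimator $V$, so taking the square root reintroduces a (downward) bias either way.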


There may be two distinct underlying questions here:

  • for a Bernoulli random variable, $\bar{x}$ is an unbiased estimator of the population proportion $p$, while $\frac{n}{n-1}\bar{x}(1-\bar{x})$ would be an unbiased estimator of $p(1-p)$. Usually the proportion is the parameter of interest, and the other properties of the distribution, such as its variance $p(1-p)$, are derived from it.

  • there are a surprisingly large number of different ways of producing a confidence interval for the proportion: Wikipedia lists several, though there are more, such as the Blyth-Still-Casella interval. The confidence interval method related to the Wald test is widely taught as something like $\bar{x}\pm 1.96 \sqrt{\frac{\bar{x}(1-\bar{x})}{n}}$ but is generally regarded as less satisfactory than the other methods; its results are not unreasonable when $n$ is large and $\bar{x}$ is not close to $0$ or $1$, but those are not the interesting cases. In the more interesting cases, concerns such as the discreteness of the binomial distribution and the fact that $p$ must lie between $0$ and $1$ are more substantial issues than the bias of the variance estimate, and other methods address some of these concerns.
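To see how the Bessel factor compares with those other issues, here is a minimal Python sketch. It computes the Wald interval with and without the $\frac{n}{n-1}$ factor, the Wilson score interval (one of the alternatives on the Wikipedia list), and the exact coverage probability of each by summing binomial probabilities. The names `wald`, `bessel_wald`, `wilson` and `coverage` are hypothetical helpers, and the values $n = 20$, $p = 0.05$ are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value (about 1.96).
Z = NormalDist().inv_cdf(0.975)


def wald(k, n, z=Z, bessel=False):
    """Wald interval xbar +/- z*sqrt(v/n), with v = xbar(1 - xbar).

    If bessel is True, v is multiplied by n/(n - 1) (requires n >= 2).
    """
    xbar = k / n
    v = xbar * (1 - xbar)
    if bessel:
        v *= n / (n - 1)
    half = z * math.sqrt(v / n)
    return xbar - half, xbar + half


def bessel_wald(k, n):
    """Wald interval using the Bessel-corrected variance estimate."""
    return wald(k, n, bessel=True)


def wilson(k, n, z=Z):
    """Wilson score interval for k successes in n trials."""
    xbar = k / n
    denom = 1 + z * z / n
    centre = (xbar + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(xbar * (1 - xbar) / n + z * z / (4 * n * n))
    # Clamp floating-point overshoot; the exact bounds lie in [0, 1].
    return max(0.0, centre - half), min(1.0, centre + half)


def coverage(method, n, p):
    """Exact probability that the interval from `method` contains p,
    summing Binomial(n, p) probabilities over outcomes k = 0..n."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = method(k, n)
        if lo <= p <= hi:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


n = 20
print("k = 0, Wald:  ", wald(0, n))
print("k = 0, Wilson:", wilson(0, n))
for p in (0.5, 0.05):
    print(f"p = {p}: Wald {coverage(wald, n, p):.3f}, "
          f"Bessel-Wald {coverage(bessel_wald, n, p):.3f}, "
          f"Wilson {coverage(wilson, n, p):.3f}")
```

When $\bar x = 0$, the Wald interval collapses to the single point $0$ with or without the correction, since $\bar x(1-\bar x) = 0$. The Bessel factor only widens a nonzero interval by $\sqrt{n/(n-1)}$, so in this sketch it does little about the undercoverage at small $p$, which comes from discreteness and the boundary at $0$.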