The relationship between sample variance and proportion variance?

Question

1.2k Views Asked by Bumbble Comm At 02 Apr 2026 - 7:58

I'm trying to understand the relationship between the sample variance formula

$\sum(X_i- \bar X)^2/(n-1)$ and the variance estimate $\bar X(1-\bar X)$ in the case of binary samples.

Do the two give the same output? If not, what is the relationship between them?

I've been trying to prove the relationship, but I'm finding it quite challenging.

Please help!

There are 2 best solutions below

**Bumbble Comm** · Answer 1 · 2015-12-14 03:03:33

I suppose your question is whether the two formulas give the same answer for binary data. Here is an example to illustrate that they are almost the same, but not exactly.

Suppose I have a sample of a thousand zeros and ones in which there are 283 ones. Then $\bar X = 283/1000 = 0.283.$ Thus, $\bar X(1-\bar X) = 0.283(1 - 0.283) = 0.202911.$

An alternate general formula for the sample variance of values $X_i$ is

$$S^2 = \frac{\sum_{i=1}^n X_i^2 - n \bar X^2}{n-1}.$$

In a binary sample $\sum_{i=1}^n X_i^2 = \sum_{i=1}^n X_i$, because $0^2 = 0$ and $1^2 = 1.$
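To make the general relationship explicit, here is a short derivation under the same assumption (every $X_i$ is 0 or 1, so $\sum_{i=1}^n X_i^2 = \sum_{i=1}^n X_i = n\bar X$):

$$S^2 = \frac{\sum_{i=1}^n X_i^2 - n \bar X^2}{n-1} = \frac{n\bar X - n \bar X^2}{n-1} = \frac{n}{n-1}\,\bar X(1-\bar X).$$

So for binary data the two quantities differ only by the factor $n/(n-1)$, which tends to 1 as $n$ grows.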

Thus, the general formula gives $S^2 = \frac{283 - 1000(0.283)^2}{999} = 0.2031141.$ If (as in the Comment by @A.S) the denominator were $n = 1000$ instead of $n-1=999,$ this would simplify to $$S^2 = 0.283 - 0.283^2 = 0.283(1 - 0.283) = \bar X(1- \bar X).$$

The formula for the population variance is often written with the population size $n$ in the denominator.
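The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the sample is rebuilt from the stated counts (283 ones out of 1000) rather than taken from any real data set.

```python
# Worked example: 1000 binary values, 283 of them ones.
n, ones = 1000, 283
sample = [1] * ones + [0] * (n - ones)

x_bar = sum(sample) / n

# Sample variance with n - 1 in the denominator.
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Proportion formula, equivalent to using n in the denominator.
prop_var = x_bar * (1 - x_bar)

print(round(prop_var, 6))  # 0.202911
print(round(s2, 7))        # 0.2031141

# The two differ exactly by the factor n / (n - 1), up to float rounding.
print(abs(s2 - n / (n - 1) * prop_var) < 1e-12)  # True
```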

**Bumbble Comm** · Answer 2 · 2015-12-14 04:20:10

The first quantity is the standard variance estimator, which is unbiased for i.i.d. samples from any distribution with finite variance.

The second quantity is a simplified formula (the simplification being valid only for 0-1 binary data) for calculating exactly, not estimating, the variance of the sample.

Using the second instead of the first to estimate the distribution variance will, on average, lead to slight underestimates. This is equivalent to the use of $n$ instead of $n-1$ in the denominator of the estimator.
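The size of the underestimate can be computed exactly for small cases. The sketch below assumes i.i.d. Bernoulli($p$) data, so the number of ones is Binomial($n$, $p$), and uses the fact that for 0-1 data the $n-1$ estimator equals $\frac{n}{n-1}\bar X(1-\bar X)$. The function name `expected_values` and the values $n = 10$, $p = 0.3$ are illustrative choices, not part of the original question.

```python
from math import comb


def expected_values(n, p):
    """Exact expectations of both estimators over the Binomial(n, p) count of ones."""
    e_prop = 0.0  # E[ x_bar * (1 - x_bar) ], i.e. denominator n
    e_s2 = 0.0    # E[ S^2 ], i.e. denominator n - 1
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)
        x_bar = k / n
        prop = x_bar * (1 - x_bar)
        e_prop += prob * prop
        # For binary data, S^2 = n / (n - 1) * x_bar * (1 - x_bar).
        e_s2 += prob * prop * n / (n - 1)
    return e_prop, e_s2


n, p = 10, 0.3  # illustrative values
e_prop, e_s2 = expected_values(n, p)
true_var = p * (1 - p)

print(e_prop)    # close to (n - 1) / n * p * (1 - p)
print(e_s2)      # close to p * (1 - p)
print(true_var)
```

With these values the proportion formula averages to $(n-1)/n$ times the true variance $p(1-p)$, while the $n-1$ version averages to $p(1-p)$ itself.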
