I came across this question that kind of motivated comparing this $\sigma^2$ estimator $$S^2 = \frac {\sum_{i=1}^n (Y_i - \bar{Y})^{2}}{n-1}$$ with this other $\sigma^2$ estimator $$S'^2 = \frac {\sum_{i=1}^n (Y_i - \bar{Y})^{2}}{n}$$
where $Y_1, \ldots, Y_n$ is a random sample with mean $\mu$ and variance $\sigma^2$.
First, it asked for $\operatorname{E}(S^2)$ and $\operatorname{E}(S'^2)$, and then it asked to compare $\operatorname{Var}(S^2)$ and $\operatorname{Var}(S'^2)$.
To find the expectations, I feel like it can go two ways here:
One potential way is using that $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$ and somehow deriving a distribution for $S^2$ and then the expectation. But I feel like this won't work for $S'^2$.
The other potential way is to just apply
$$\operatorname{E}(S'^2) = \operatorname{E}\Biggl(\frac {\sum_{i=1}^n (Y_i - \bar{Y})^{2}}{n}\Biggr) = \frac{1}{n} \operatorname{E} \biggl(\sum_{i=1}^n (Y_i - \bar{Y})^{2}\biggr) = \frac{1}{n} \sum_{i=1}^n \operatorname{E} \bigl((Y_i - \bar{Y})^{2}\bigr) = \frac{1}{n} \sum_{i=1}^n \bigl(\operatorname{E}(Y_i^2) - 2 \bar{Y} \mu + \bar{Y}^2\bigr)$$
If I let $n$ be large, I can simplify it further by approximating $\bar{Y} \approx \mu$:
$$= \frac{1}{n} \sum_{i=1}^n \bigl(\operatorname{E}(Y_i^2) - 2\mu^2 + \mu^2\bigr) = \frac{1}{n} \sum_{i=1}^n \bigl(\operatorname{E}(Y_i^2) - \mu^2\bigr) = \frac{1}{n} \sum_{i=1}^n \operatorname{Var}(Y_i) = \frac{1}{n} \sum_{i=1}^n \sigma^2 = \frac{n\sigma^2}{n} = \sigma^2$$
I am not sure about this derivation though, because it does not really work for $S^2$.
Any help is appreciated.
You might want to consider this:
$$\begin{aligned}
\sum_{i=1}^n(Y_i-\overline Y)^2&=\sum_{i=1}^n(Y_i-\mu+\mu-\overline Y)^2\\
&=\sum_{i=1}^n\bigl[(Y_i-\mu)-(\overline Y-\mu)\bigr]^2\\
&=\sum_{i=1}^n\bigl[(Y_i-\mu)^2-2(Y_i-\mu)(\overline Y-\mu)+(\overline Y-\mu)^2\bigr]\\
&=\sum_{i=1}^n(Y_i-\mu)^2-\sum_{i=1}^n2(Y_i-\mu)(\overline Y-\mu)+\sum_{i=1}^n(\overline Y-\mu)^2\\
&=\sum_{i=1}^n(Y_i-\mu)^2-2(\overline Y-\mu)\sum_{i=1}^n(Y_i-\mu)+n(\overline Y-\mu)^2\\
&=\sum_{i=1}^n(Y_i-\mu)^2-2(\overline Y-\mu)(n\overline Y-n\mu)+n(\overline Y-\mu)^2\\
&=\sum_{i=1}^n(Y_i-\mu)^2-2n(\overline Y-\mu)^2+n(\overline Y-\mu)^2\\
&=\sum_{i=1}^n(Y_i-\mu)^2-n(\overline Y-\mu)^2\qquad(*)
\end{aligned}$$
Now we know that $Var(Y_i)=E[(Y_i-\mu)^2]=\sigma^2$ by the definition of variance, and, because the $Y_i$ are independent with common variance $\sigma^2$, we also have $Var(\overline Y)=E[(\overline Y-\mu)^2]=\frac {\sigma^2}{n}$.
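To spell out the second fact, which uses independence of the $Y_i$ rather than the definition of variance alone:

$$Var(\overline Y)=Var\Biggl(\frac{1}{n}\sum_{i=1}^n Y_i\Biggr)=\frac{1}{n^2}\sum_{i=1}^n Var(Y_i)=\frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}.$$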
Taking the expected value of $(*)$,
$$\begin{aligned}
E\Bigl[\sum_{i=1}^n(Y_i-\overline Y)^2\Bigr]&=E\Bigl[\sum_{i=1}^n(Y_i-\mu)^2-n(\overline Y-\mu)^2\Bigr]\\
&=E\Bigl[\sum_{i=1}^n(Y_i-\mu)^2\Bigr]-E\bigl[n(\overline Y-\mu)^2\bigr]\\
&=\sum_{i=1}^nE\bigl[(Y_i-\mu)^2\bigr]-nE\bigl[(\overline Y-\mu)^2\bigr]\\
&=n\sigma^2-n\cdot\frac {\sigma^2}{n}\\
&=(n-1)\sigma^2
\end{aligned}$$
So clearly, $E(S^2)=E[\frac {\sum_{i=1}^n(Y_i-\overline Y)^2}{n-1}]=\frac {(n-1)\sigma^2}{n-1}=\sigma^2$
This explains why the first formula is the unbiased estimator of $\sigma^2$. Had you used the second formula, the expected value would be $E(S'^2)=\frac {(n-1)\sigma^2}{n}$, which shows that $S'^2$ underestimates $\sigma^2$ on average.
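As a quick numerical sanity check (not part of the derivation above), a short simulation in Python, with the arbitrary choices $n=5$, $\sigma^2=1$, and 100,000 replications, reproduces both expectations:

```python
import random
import statistics

random.seed(0)
n, reps = 5, 100_000   # sample size and number of replications (arbitrary)
sigma2 = 1.0           # true variance of the standard normal draws

s2_vals, sp2_vals = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    s2_vals.append(statistics.variance(sample))    # divides by n - 1  (S^2)
    sp2_vals.append(statistics.pvariance(sample))  # divides by n      (S'^2)

print(sum(s2_vals) / reps)   # close to sigma2 = 1.0
print(sum(sp2_vals) / reps)  # close to (n-1)/n * sigma2 = 0.8
```

The empirical mean of $S^2$ settles near $\sigma^2$, while the mean of $S'^2$ settles near $\frac{n-1}{n}\sigma^2$, matching the bias computed above.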