How Are the Standard Error and the Unbiased Estimate of the Variance Related?


I am going through a statistics textbook and there are two similar formulas that I cannot seem to connect, one under the "Sampling Error" section and the other under the "Unbiased Estimator" section.

One says that ${ \sigma }_{ x }=\frac { \sigma }{ \sqrt { n } }$. The other one says that ${ { S }_{ n-1 } }^{ 2 }=\left( \frac { n }{ n-1 } \right) { { S }_{ n } }^{ 2 }$. Is ${ \sigma }_{ x }$ somehow related to ${ { S }_{ n-1 } }^{ 2 }$ or are they totally unrelated concepts?

Best answer:

The short answer is that $$E[S_{n-1}^2] = n\sigma_{x}^2$$


A slightly longer answer involves considering a sample $X_1,X_2,\ldots,X_n$ drawn from a population with mean $\mu$ and variance $\sigma^2$. In most sampling situations both $\mu$ and $\sigma^2$ are unknown, and they are typically estimated by $\bar X$ and $S_{n-1}^2$ respectively, where $$\bar X = \frac1n \sum_1^nX_i$$ and $$S_{n-1}^2 = \frac 1{n-1} \sum_1^n(X_i - \bar X)^2$$ $\bar X$ and $S_{n-1}^2$ are chosen as estimators of $\mu$ and $\sigma^2$ because they are unbiased, that is, they have expected values equal to the quantities they estimate. In other words, $E[\bar X] = \mu$ and $E[S_{n-1}^2] = \sigma^2$.
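Unbiasedness of $S_{n-1}^2$ can also be checked empirically. The following sketch (with an arbitrarily chosen $\mu = 5$ and $\sigma = 2$) averages the sample variance over many repeated samples; Python's `statistics.variance` uses the $n-1$ divisor, so the average settles near $\sigma^2 = 4$:

```python
import random
import statistics

random.seed(0)

mu, sigma = 5.0, 2.0   # population parameters (chosen arbitrarily for illustration)
n = 10                 # sample size
trials = 20000         # number of repeated samples

total = 0.0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    total += statistics.variance(sample)  # n-1 divisor, i.e. S_{n-1}^2

avg_s2 = total / trials
print(avg_s2)  # close to sigma**2 = 4.0
```

With a biased estimator the average would drift away from $\sigma^2$ no matter how many trials were run; here it converges to it.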

Although $\bar X$ (the sample mean) is an unbiased estimator of $\mu$, any estimate derived from a sample mean is still only an estimate: had a different sample been drawn, a different estimate would have been obtained. Because of this, it is reasonable to ask how much variability there is in estimates based on $\bar X$, and one measure of this is provided by $Var[\bar X]$, the variance of $\bar X$, which is given by $$Var[\bar X] = \frac{\sigma^2}n$$

The left hand side of this expression is, in the notation of your textbook, simply $\sigma_x^2$ so, taking square roots, the expression leads directly to your text's result that $\sigma_x = \sigma/\sqrt{n}$. Alternatively this can be re-written as $n\sigma_x^2 = \sigma^2$ and, as already noted, this latter has a right hand side equal to $E[S_{n-1}^2]$ leading to the result stated as the short answer.
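The result $Var[\bar X] = \sigma^2/n$ can be checked the same way: simulate many sample means and compare their spread with $\sigma/\sqrt n$ (the values $\sigma = 3$ and $n = 25$ below are arbitrary, chosen so that $\sigma/\sqrt n = 0.6$):

```python
import random
import statistics

random.seed(1)

mu, sigma = 0.0, 3.0   # arbitrary population parameters
n = 25                 # sample size, so sigma / sqrt(n) = 0.6
trials = 20000

# Collect many independent sample means and measure their spread
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(trials)]
se_observed = statistics.pstdev(means)
se_theory = sigma / n ** 0.5

print(se_observed, se_theory)  # both close to 0.6
```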

To answer your question in words: $\sigma_x$ and $S_{n-1}^2$ are related, and they are related through the unknown variance $\sigma^2$.


Although not directly related to your question, you also mention $S_n^2$. This is given by $$S_n^2 = \frac 1n \sum_1^n(X_i - \bar X)^2$$ and it should be fairly apparent how a comparison of the expressions for $S_{n-1}^2$ and $S_n^2$ leads to your text's result that $S_{n-1}^2 = \frac n{n-1}S_n^2$.
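The identity $S_{n-1}^2 = \frac n{n-1}S_n^2$ is purely algebraic (both statistics share the same sum of squares), so it holds exactly for any sample, not just on average. A quick check using the two variance functions in Python's standard library:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # any sample will do
n = len(data)

s_n2 = statistics.pvariance(data)    # divisor n:   S_n^2
s_nm12 = statistics.variance(data)   # divisor n-1: S_{n-1}^2

# The textbook identity S_{n-1}^2 = n/(n-1) * S_n^2 holds exactly:
print(s_nm12, n / (n - 1) * s_n2)
```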

Intuitively, perhaps, $S_n^2$ might seem to be a better choice as an estimator of $\sigma^2$ than $S_{n-1}^2$, which, with its $n-1$ divisor, seems a bit counter-intuitive at first sight. $S_{n-1}^2$ turns out to be the better choice because $\mu$ is unknown. If $\mu$ were known, it would be possible to use sample statistics based on $\sum(X_i - \mu)^2$. However, because (generally) $\mu$ is unknown, it is replaced by $\bar X$ and sample statistics for estimating $\sigma^2$ are instead based on $\sum(X_i - \bar X)^2$. The expected values of these two statistics are subtly different. For the former $E[\sum_1^n(X_i - \mu)^2] = n\sigma^2$ whilst for the latter $E[\sum_1^n(X_i - \bar X)^2] =(n-1)\sigma^2$.

It is the latter result that leads to $S_{n-1}^2$ being an unbiased estimator for the unknown variance $\sigma^2$ (and $S_n^2$ being biased). So the choice of the statistic with the $n-1$ divisor (i.e. $S_{n-1}^2$) being favoured over the $n$ divisor version (i.e. $S_n^2$) depends on $\mu$ being unknown. If $\mu$ were known then it would be possible to construct alternative statistics, say $T_{n-1}^2$ and $T_n^2$ where $$T_{n-1}^2 = \frac 1{n-1}\sum_1^n(X_i - \mu)^2$$ and $$T_n^2 = \frac 1n\sum_1^n(X_i - \mu)^2$$ and, in this case, it would be $T_n^2$ (the statistic with a divisor of n) that would provide the unbiased estimator of $\sigma^2$ and so would be the preferred choice.
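A small simulation makes the contrast concrete: with $\mu$ known, the divisor-$n$ statistic $T_n^2$ is unbiased, while $S_n^2$ (which replaces $\mu$ by $\bar X$ but keeps the $n$ divisor) systematically underestimates $\sigma^2$ by the factor $(n-1)/n$. The sample size $n = 5$ below is deliberately small so the bias is easy to see:

```python
import random

random.seed(2)

mu, sigma = 0.0, 1.0   # arbitrary population parameters
n = 5                  # small n makes the bias of S_n^2 visible
trials = 40000

sum_tn2 = sum_sn2 = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    sum_tn2 += sum((x - mu) ** 2 for x in xs) / n    # T_n^2: known mu, divisor n
    sum_sn2 += sum((x - xbar) ** 2 for x in xs) / n  # S_n^2: sample mean, divisor n

avg_tn2 = sum_tn2 / trials  # close to sigma^2 = 1.0 (unbiased)
avg_sn2 = sum_sn2 / trials  # close to (n-1)/n * sigma^2 = 0.8 (biased low)
print(avg_tn2, avg_sn2)
```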

Calculating the various expected values noted or implied in the sections above is not onerous but it does require some knowledge of the various "rules" for expected values such as $E[cX] = cE[X]$, $E[X + Y] = E[X] + E[Y]$, $E[XY] = E[X]E[Y]$ (for independent $X$,$Y$) and, of course, $Var[X] = E[(X - E[X])^2]$.
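As an example of those rules in action, here is a sketch of the second of the two expected-value results above. Expanding the sum around $\mu$ gives the standard identity $$\sum_1^n(X_i - \bar X)^2 = \sum_1^n(X_i - \mu)^2 - n(\bar X - \mu)^2$$ Taking expectations and using $E[(X_i - \mu)^2] = \sigma^2$ together with $E[(\bar X - \mu)^2] = Var[\bar X] = \sigma^2/n$ yields $$E\left[\sum_1^n(X_i - \bar X)^2\right] = n\sigma^2 - n\cdot\frac{\sigma^2}n = (n-1)\sigma^2$$ which is exactly the result that makes $S_{n-1}^2$ unbiased.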