difference between $S^2$, $\sigma_x^2$, and $\sigma^2$?


On $(5),$ $(6),$ and $(7),$ what's the difference between $S^2$ and $\sigma_x^2$?

Also, why does:

$$\sigma_X^2 = \sum \limits_{i=1}^{n} \frac{1}{n^2} \sigma^2 = \frac{\sigma^2}{n}$$

?

I'm assuming $\sigma^2$ is the population variance.

It seems like $S$ is a random variable, since I can take its expectation, but $\sigma_x$ is the same thing except not a random variable?


Let $(X_1, \cdots, X_n)$ be a random sample of $X$ having unknown mean $\mu$ and variance $\sigma_x^2$.

\begin{align} S^2 &= \frac{1}{n} \sum (X_i - \bar{X})^2 \tag{0}\\[4ex] E[S^2] &= E\Big[\frac{1}{n} \sum (X_i - \bar{X})^2 \Big]\tag{1}\\[2ex] &= E\Bigg[\frac{1}{n} \sum \limits_{i=1}^{n}\big[(X_i - \mu)-(\bar{X}-\mu)\big]^2~\Bigg]\tag{2}\\[2ex] &= E\Bigg[ \frac{1}{n} \sum \limits_{i=1}^{n} \Big[~(X_i-\mu)^2-2(X_i-\mu)(\bar{X}-\mu)+(\bar{X}-\mu)^2~\Big] ~\Bigg]\tag{3}\\[2ex] &= E\Bigg[~\frac{1}{n} \Big[~\sum \limits_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 \Big]~\Bigg]\tag{4}\\[2ex] &= \frac{1}{n} \sum \limits_{i=1}^{n} E\big[(X_i-\mu)^2\big] - E\big[(\bar{X}-\mu)^2\big]\tag{5}\\[2ex] &= \sigma^2 - \sigma_X^2\tag{6}\\[2ex] &= \sigma^2 - \frac{1}{n}\sigma^2\tag{7}\\[2ex] &= \frac{n-1}{n}\sigma^2\tag{8} \end{align}

Equation (8) shows that $S^2$ is a biased estimator of $\sigma^2$
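Equation (8) can also be checked empirically. The sketch below (my illustration, not part of the question; the normal population and the choice $\sigma^2 = 4$ are arbitrary assumptions) estimates $E[S^2]$ by Monte Carlo and compares it with $\frac{n-1}{n}\sigma^2$:

```python
import numpy as np

# Monte Carlo check of equation (8): E[S^2] = (n-1)/n * sigma^2,
# using an arbitrary normal population with sigma^2 = 4.
rng = np.random.default_rng(0)
n, sigma2, trials = 5, 4.0, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))
# S^2 as defined in (0): divide by n, not n - 1 (ddof=0).
s2 = samples.var(axis=1, ddof=0)

print(s2.mean())  # close to (n-1)/n * sigma2 = 3.2, not sigma2 = 4
```

With $n = 5$ the bias is large ($3.2$ versus $4$), which is why the factor $\frac{n}{n-1}$ matters for small samples.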


There are 2 answers below.


$\sigma$ is the population standard deviation of the random variable $X$.

$X_i$ represents the value of the $i$-th sample. If you draw $n$ such samples, the standard deviation computed from those $n$ values is the random variable $S$.

The average of the $n$ samples is the random variable $\bar{X}$. This variable has a standard deviation $\sigma_X$.
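The point that $S$ and $\bar{X}$ vary from sample to sample can be seen in a short simulation (my illustration, not part of the answer; the population parameters are arbitrary assumptions):

```python
import numpy as np

# Draw many independent samples of size n; each sample yields one value
# of S and one value of X-bar, so both vary across samples -- i.e. both
# are random variables.
rng = np.random.default_rng(1)
sigma, n, trials = 3.0, 25, 100_000

samples = rng.normal(0.0, sigma, size=(trials, n))
s = samples.std(axis=1, ddof=0)   # one S per sample
xbar = samples.mean(axis=1)       # one X-bar per sample

print(s.std())     # strictly positive: S is not a constant
print(xbar.std())  # close to sigma / sqrt(n) = 0.6
```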


You may find your answers in the sample variance subheading of the variance entry in Wikipedia, or possibly in this answer on Stack Exchange CV.

Difference between $\sigma_X^2$ and $S^2:$

The variance of the sampling distribution of the sample mean (the square of the standard error of the mean) is

$$\sigma_{\bar X}^2 = E\left[\left( \bar X - \mu \right)^2\right]$$

and it carries a bar on top of the subscripted random variable $\sigma_{\color{red}{\bar X}}^2.$

The sample variance is

$$\sigma_X^2=\frac 1 n \sum_{i=1}^n (X_i - \bar X)^2$$

while the unbiased sample variance (Bessel's correction) is

$$S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2$$

although in practice the notation $S^2$ is commonly applied to both of the above.
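As a concrete aside (not from the answer), NumPy exposes exactly this choice of divisor through the `ddof` ("delta degrees of freedom") parameter of `numpy.var`:

```python
import numpy as np

# The two formulas above differ only in the divisor: n versus n - 1.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

biased = x.var(ddof=0)    # divides by n: the sigma_X^2 formula
unbiased = x.var(ddof=1)  # divides by n - 1: Bessel's correction

print(biased, unbiased)   # -> 4.0 4.571428571428571
```

For this data set the sum of squared deviations is $32$, so the two estimates are $32/8 = 4$ and $32/7 \approx 4.571$.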

Why $\sigma_X^2 = \sum \limits_{i=1}^{n} \frac{1}{n^2} \sigma^2 = \frac{\sigma^2}{n}$?

The right expression should probably carry the above bar on top of the $X$:

$$\sigma_{\bar X}^2 = \frac{\sigma^2}{n}$$

with $\sigma^2$ representing the population variance.
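The chain of equalities in the question can be filled in using the independence of the $X_i$ and the rule $\operatorname{Var}(aY) = a^2\operatorname{Var}(Y)$:

$$\sigma_{\bar X}^2 = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \sum_{i=1}^{n}\frac{1}{n^2}\,\sigma^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

where the second equality requires the $X_i$ to be independent (otherwise covariance terms appear).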

It seems like $S$ is a random variable, since I can take its expectation, but $\sigma_x$ is the same thing except not a random variable?

A statistic is an observable random variable: a quantity computed from a sample. Both $S$ and $\sigma_X$ (as defined above) would be random variables.


Re-stating the equations in the OP with the caveats above, and going along with the OP's symbols, which express $\sigma_X^2$ as $S^2$:

$$ \small \begin{align} \color{red}{\sigma_X^2} (\text{or }S^2) &= \frac{1}{n} \sum (X_i - \bar{X})^2 \tag{0}\\[4ex] E[\color{red}{\sigma_X^2}] &= E\Big[\frac{1}{n} \sum (X_i - \bar{X})^2 \Big]\tag{1}\\[2ex] &= E\Bigg[\frac{1}{n} \sum \limits_{i=1}^{n}\big[(X_i - \mu)-(\bar{X}-\mu)\big]^2~\Bigg]\tag{2}\\[2ex] &= E\Bigg[ \frac{1}{n} \sum \limits_{i=1}^{n} \Big[~(X_i-\mu)^2-2(X_i-\mu)(\bar{X}-\mu)+(\bar{X}-\mu)^2~\Big] ~\Bigg]\tag{3}\\[2ex] &= E\Bigg[~\frac{1}{n} \Big[~\sum \limits_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 \Big]~\Bigg]\tag{4}\\[2ex] &= \frac{1}{n} \sum \limits_{i=1}^{n} E\big[(X_i-\mu)^2\big] - E\big[(\bar{X}-\mu)^2\big]\tag{5}\\[2ex] &= \sigma^2 - \sigma_{\color{red}{\bar X}}^2\tag{6}\\[2ex] &= \sigma^2 - \frac{1}{n}\sigma^2\tag{7}\\[2ex] &= \frac{n-1}{n}\sigma^2\tag{8} \end{align} $$