Questions concerning the power of the standard deviation


The formula for standard deviation is

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

I learned that $68$% of the values fall within $S_x$ of the mean, $95$% within $2S_x$, and $99.7$% within $3S_x$.

My question is: why the second power? Could it also be $(x_i-\bar{x})^4$, or any other even power?

What is the reason behind the second power? Is it just easy to use, or is there some other meaning to it?


There are 3 best solutions below

0
On BEST ANSWER

Some reasons to define the variance and standard deviation the way they're defined:

With this definition, the mean is the value that minimizes the mean square deviation. That is, if we compute the mean square deviation from some value $\mu$, it is minimal exactly when $\mu$ is the mean:

\begin{eqnarray*} f(\mu)&=&\sum_i(x_i-\mu)^2\;,\\ f'(\mu)&=&-2\sum_i(x_i-\mu)\;,\\ f'(\mu)=0&\Leftrightarrow&\mu=\frac1n\sum_ix_i\;. \end{eqnarray*}

This doesn't work the same way with higher even powers, e.g.:

\begin{eqnarray*} f(\mu)&=&\sum_i(x_i-\mu)^4\;,\\ f'(\mu)&=&-4\sum_i(x_i-\mu)^3\;,\\ f'(\mu)=0&\Leftrightarrow&\sum_i(x_i-\mu)^3=0\;, \end{eqnarray*}

a cubic equation for $\mu$ without a natural interpretation. Thus, the median minimizes the mean absolute deviation, and the mean minimizes the mean square deviation, whereas the number minimizing the mean quartic deviation isn't known to have any nice properties.
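As a quick numerical check of the three minimizers mentioned above, here is a minimal Python sketch. The sample and the helper names `total_deviation` and `argmin_on_grid` are made up for illustration; the search is a plain brute-force scan over a grid, not an efficient method.

```python
import statistics

# Hypothetical sample, chosen to be skewed so that mean and median differ.
data = [1.0, 2.0, 3.0, 4.0, 10.0]

def total_deviation(center, power):
    """Sum of |x - center|**power over the sample."""
    return sum(abs(x - center) ** power for x in data)

def argmin_on_grid(power, step=0.001):
    """Brute-force minimizer of total_deviation over a grid spanning the data."""
    lo, hi = min(data), max(data)
    n_steps = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(n_steps + 1)]
    return min(grid, key=lambda c: total_deviation(c, power))

best_abs = argmin_on_grid(1)    # expected to land on the median
best_sq = argmin_on_grid(2)     # expected to land on the mean
best_quart = argmin_on_grid(4)  # root of sum((x - c)**3) = 0

print("median:", statistics.median(data), "absolute-deviation minimizer:", best_abs)
print("mean:", statistics.mean(data), "square-deviation minimizer:", best_sq)
print("quartic-deviation minimizer:", best_quart)
```

For this sample the quartic minimizer lands somewhere above the mean, pulled toward the outlier, and coincides with neither the mean nor the median.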

The variance of independent random variables is additive:

\begin{eqnarray*} \mathsf{Var}(X+Y)&=&\mathsf E\left[(x+y-\bar x-\bar y)^2\right]\\ &=& \mathsf E\left[(x-\bar x)^2\right]+\mathsf E\left[(y-\bar y)^2\right]+2\mathsf E\left[xy-\bar xy-x\bar y+\bar x\bar y\right] \\ &=& \mathsf E\left[(x-\bar x)^2\right]+\mathsf E\left[(y-\bar y)^2\right]+2(\bar x\bar y-\bar x\bar y-\bar x\bar y+\bar x\bar y) \\ &=& \mathsf E\left[(x-\bar x)^2\right]+\mathsf E\left[(y-\bar y)^2\right] \\ &=& \mathsf{Var}(X)+\mathsf{Var}(Y)\;. \end{eqnarray*}

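The cross term vanishes because independence gives $\mathsf E[xy]=\bar x\bar y$. This, too, wouldn't work with higher even powers: for the fourth power, for instance, a cross term $6\,\mathsf{Var}(X)\,\mathsf{Var}(Y)$ survives. This sort of additivity is at the heart of important theorems like the central limit theorem.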
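The additivity can be checked exactly on a small example. The sketch below uses two made-up independent discrete distributions (`X` and `Y` are illustrative, as are the helper names) and exact rational arithmetic, and compares second and fourth central moments of the sum against the sums of the individual moments.

```python
from fractions import Fraction
from itertools import product

# Two hypothetical independent discrete variables, stored as {value: probability}.
X = {0: Fraction(1, 2), 1: Fraction(1, 2)}                      # fair coin
Y = {1: Fraction(1, 6), 2: Fraction(1, 6), 3: Fraction(2, 3)}   # loaded three-sided die

def central_moment(dist, power):
    """E[(V - E[V])**power] for a finite distribution."""
    mean = sum(v * p for v, p in dist.items())
    return sum((v - mean) ** power * p for v, p in dist.items())

def sum_distribution(a, b):
    """Distribution of A + B, assuming A and B are independent."""
    out = {}
    for (va, pa), (vb, pb) in product(a.items(), b.items()):
        out[va + vb] = out.get(va + vb, 0) + pa * pb
    return out

S = sum_distribution(X, Y)

var_additive = central_moment(S, 2) == central_moment(X, 2) + central_moment(Y, 2)
fourth_additive = central_moment(S, 4) == central_moment(X, 4) + central_moment(Y, 4)

print("variance additive:", var_additive)
print("fourth central moment additive:", fourth_additive)
```

The fourth central moment of the sum exceeds the sum of the fourth central moments by $6\,\mathsf{Var}(X)\,\mathsf{Var}(Y)$, which is nonzero whenever both variables actually vary.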

0
On

Standard deviation is one way to measure the spread of some data. You could certainly introduce another measure of spread that used 4th powers and took the fourth root. It would have different properties, and might not be useful.

For example, with data that is normally distributed, the property you cite about 68% and 95% would not hold with such a different measure of spread.
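To make this concrete, here is a small Python sketch under the assumption that the data are exactly normal. It relies on the standard fact that a normal distribution has $E[(X-\mu)^4]=3\sigma^4$, so the fourth-root-of-mean-fourth-power spread equals $3^{1/4}\sigma$. The function name `normal_prob_within` is made up for illustration.

```python
import math

def normal_prob_within(k):
    """P(|Z| < k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

# For a normal distribution E[(X - mu)**4] = 3 * sigma**4, so the
# quartic measure of spread is 3**0.25 standard deviations wide.
quartic_spread_in_sigmas = 3 ** 0.25

p_sd = normal_prob_within(1.0)
p_quartic = normal_prob_within(quartic_spread_in_sigmas)

print(f"within 1 standard deviation: {p_sd:.4f}")
print(f"within 1 quartic spread:     {p_quartic:.4f}")
```

The second probability comes out at roughly 81% rather than 68%, so the familiar percentages would have to be replaced by different ones.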

There are genuine reasons to work with a measure of spread that involves squaring the residuals like standard deviation/error does. I don't know that I could be successful at explaining them in a short SE post. Maybe someone else will though.

0
On

Those statements about $68\%$, $95\%$, and $99.7\%$ apply to the normal distribution, but certainly do not apply to all distributions.
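As an illustration, the sketch below compares the normal distribution with a uniform distribution on $[0,1]$, picked here only as a convenient counterexample. The function names are made up; the uniform case uses its standard deviation $1/\sqrt{12}$.

```python
import math

def normal_within(k):
    """P(|Z| < k) for a standard normal Z."""
    return math.erf(k / math.sqrt(2))

def uniform_within(k):
    """P(|U - 1/2| < k * sd) for U uniform on [0, 1], where sd = 1/sqrt(12)."""
    sd = 1 / math.sqrt(12)
    # The interval has half-width k * sd, capped by the support of U.
    return min(1.0, 2 * k * sd)

for k in (1, 2, 3):
    print(f"k={k}: normal {normal_within(k):.4f}, uniform {uniform_within(k):.4f}")
```

For the uniform distribution only about 58% of the mass lies within one standard deviation of the mean, and all of it lies within two.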

Defining the variance by using $n-1$ in the denominator, where $n$ is the sample size, is done only when using the sample variance to estimate the population variance or otherwise drawing inferences about the population by using a random sample. The population variance is $\operatorname E((X-\mu)^2)$ where $\mu=\operatorname E(X),$ and if the population consists of $n$ equally probable outcomes, then the standard deviation is given by a formula that looks like what you wrote except that it has $n$ where you have $n-1.$
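The two conventions can be compared directly. In this sketch the data values are hypothetical; `statistics.pvariance` and `statistics.variance` from the Python standard library use the $n$ and $n-1$ denominators respectively.

```python
import statistics

# Hypothetical values, treated first as a whole population, then as a sample.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)

population_var = sum_sq / n        # denominator n
sample_var = sum_sq / (n - 1)      # denominator n - 1

print("population variance:", population_var, statistics.pvariance(data))
print("sample variance:    ", sample_var, statistics.variance(data))
```

The ratio between the two is $n/(n-1)$, so the distinction matters most for small samples.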

The reason the second power is used in measuring dispersion is that if $X_1,\ldots,X_n$ are independent, then $$ \operatorname{var}(X_1+\cdots+X_n) = \operatorname{var}(X_1)+\cdots + \operatorname{var}(X_n). $$ You need that whenever you apply the central limit theorem.