Understanding the value of standard deviation


I have two datasets, $\{10,10,2,2\}$ and $\{13,7,0,4\}$. When I compute the standard deviation of each set, I get $4$ and $4.74$ respectively. My question: what is the significance of $4.74$ or $4$? I understand the basic definition of standard deviation, i.e. deviation from the mean, but how should I interpret a value such as $4.74$?


There is 1 best solution below


If you assume that the data are drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$, then the probability that a sample falls in the interval $\left[\mu-\sigma, \mu+\sigma\right]$ is roughly 68.27%.
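As a quick check of that figure, here is a minimal sketch that evaluates the probability of landing within $k$ standard deviations of the mean for a normal distribution. The helper name `prob_within` is only illustrative; it relies on the standard identity $P(|X-\mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$.

```python
import math
from statistics import NormalDist


def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard
    deviations of its mean (independent of mu and sigma)."""
    return math.erf(k / math.sqrt(2))


# Cross-check against the standard normal CDF from the standard library.
std_normal = NormalDist(mu=0.0, sigma=1.0)
via_cdf = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(f"within 1 sigma: {prob_within(1):.4%}")
print(f"via CDF:        {via_cdf:.4%}")
```

Both lines should agree to the printed precision, and calling `prob_within(2)` gives the corresponding two-sigma probability.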

So when you have measured a normally distributed quantity over a finite time and calculated its mean and standard deviation, you can use them to predict where future measurements are likely to fall.
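Applied to the two datasets from the question, a rough sketch might look as follows. It assumes the quoted values $4$ and $4.74$ are population standard deviations (dividing by $n$), which is what reproduces them; the function name `one_sigma_interval` is made up for illustration. With only four points, normality cannot really be judged, so the interval is illustrative only.

```python
from statistics import mean, pstdev, stdev


def one_sigma_interval(xs):
    """Return (mu - sigma, mu + sigma) using the population
    standard deviation (denominator n)."""
    mu = mean(xs)
    sigma = pstdev(xs)
    return mu - sigma, mu + sigma


for data in ([10, 10, 2, 2], [13, 7, 0, 4]):
    low, high = one_sigma_interval(data)
    print(
        f"{data}: mean={mean(data)}, "
        f"population sd={pstdev(data):.2f}, "
        f"sample sd={stdev(data):.2f}, "
        f"one-sigma interval=[{low:.2f}, {high:.2f}]"
    )
```

Both sets have mean $6$, so the larger value $4.74$ simply says the second set is spread somewhat more widely around the same centre. The sample standard deviation (dividing by $n-1$) is printed alongside for comparison, since many tools report that by default.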

To test whether a sequence has a normal distribution, you can look at its skewness $\nu$ and kurtosis $\kappa$. The skewness tells you how asymmetric the distribution is, while the kurtosis tells you how likely it is to have outliers that lie far from the mean relative to the standard deviation. In the literature there are multiple expressions for these, but I will give you the one used in the statistical software Statgraphics,

$$ \nu = \frac{1}{(n-1)\sigma^3} \sum_{i=1}^n (x_i - \mu)^3, $$

$$ \kappa = \frac{1}{(n-1)\sigma^4} \sum_{i=1}^n (x_i - \mu)^4. $$

When the absolute value of $\nu$ is larger than 2, that is an indication that the distribution deviates significantly from Gaussian. For a normal distribution the value of $\kappa$ is 3; for larger values the distribution is more outlier-prone, and for smaller values it is less outlier-prone.
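The two formulas above can be sketched directly in Python. This is a literal transcription rather than a reproduction of any particular software's output, and it assumes that $\mu$ is the sample mean and $\sigma$ the sample standard deviation (denominator $n-1$), which the answer does not state explicitly.

```python
from statistics import mean, stdev


def skewness(xs):
    """nu = sum((x - mu)^3) / ((n - 1) * sigma^3)."""
    n = len(xs)
    mu = mean(xs)
    sigma = stdev(xs)  # assumption: sample SD (n - 1 denominator)
    # Note: constant data gives sigma == 0 and raises ZeroDivisionError.
    return sum((x - mu) ** 3 for x in xs) / ((n - 1) * sigma ** 3)


def kurtosis(xs):
    """kappa = sum((x - mu)^4) / ((n - 1) * sigma^4)."""
    n = len(xs)
    mu = mean(xs)
    sigma = stdev(xs)  # same assumption as in skewness()
    return sum((x - mu) ** 4 for x in xs) / ((n - 1) * sigma ** 4)


for data in ([10, 10, 2, 2], [13, 7, 0, 4]):
    print(f"{data}: skewness={skewness(data):.3f}, kurtosis={kurtosis(data):.3f}")
```

For the symmetric set $\{10,10,2,2\}$ the skewness comes out as zero, as expected. With only four points per set, neither statistic says much about normality; the sketch is meant to show the mechanics of the formulas.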