Establishing the upper and lower bounds of normal using standard deviation

I understand the concept of standard deviations and z-values, but I'm trying to figure out if standard deviations alone are good for establishing the upper and lower bounds for normal. For example, if I have the following dataset:


$x = 1,4,1,10,112,6,22,7,18,113,1,4,1,10,112,6,22,7,18,113,1,4,1,10,112,6,22,7$

$\mu = 26.82$

$std(x) = 41.16$


I have been establishing the range for normal values like this:


$lowerNormal = 26.82 - 41.16 = -14.34$

$upperNormal = 26.82 + 41.16 = 67.98$


So, when a new value comes along:

$a = 72$

I consider that value to be abnormal, since it is above the $upperNormal$ value.
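The calculation above can be reproduced with a short Python sketch. It assumes the sample standard deviation (the $n - 1$ denominator), since that is what matches the quoted $41.16$. The helper names `normal_bounds` and `is_abnormal` and the multiplier `k` are illustrative and not part of the original post.

```python
import statistics

# Dataset from the question (28 observations).
x = [1, 4, 1, 10, 112, 6, 22, 7, 18, 113,
     1, 4, 1, 10, 112, 6, 22, 7, 18, 113,
     1, 4, 1, 10, 112, 6, 22, 7]

def normal_bounds(data, k=1.0):
    """Return (lower, upper) = mean -/+ k sample standard deviations."""
    mu = statistics.mean(data)
    sd = statistics.stdev(data)  # sample std (n - 1 denominator); an assumption
    return mu - k * sd, mu + k * sd

def is_abnormal(value, data, k=1.0):
    """Flag a value that falls outside the band returned by normal_bounds."""
    lower, upper = normal_bounds(data, k)
    return value < lower or value > upper

lower, upper = normal_bounds(x)
print(f"normal range: [{lower:.2f}, {upper:.2f}]")
print("a = 72 abnormal?", is_abnormal(72, x))
```

Keeping `k` as a parameter makes it easy to try wider or narrower bands without changing the rest of the code.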

My question is whether or not this is a statistically sound way of determining whether a value is normal. The real issue I'm seeing is that none of the values in $x$ will ever be negative, so it seems that at least the lower bound is somewhat arbitrary.

Anyway, I apologize if I am missing something simple. I'm learning statistics on my own and it's a little puzzling sometimes. Thanks for your help.

There are 4 best solutions below

1
On

I believe the above is a set of random numbers. Standard deviation is appropriate for numbers that are dependent, such as daily road traffic. Please also consider correlation for less dependent numbers.

0
On

Your problem is that you are adding and subtracting the full standard deviation. When you do that, the normal range ends up two standard deviations wide, since it extends $1 \cdot std(x)$ below the mean and $1 \cdot std(x)$ above it.

Instead, add and subtract $\frac{1}{2} \cdot std(x)$.

Thus, your new normals are:

$$lowerNormal = 26.82 - \frac{41.16}{2} = 6.24$$

$$upperNormal = 26.82 + \frac{41.16}{2} = 47.40$$
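As a rough check of the half-width band, the sketch below recomputes it from the question's data and counts how many of the existing observations fall outside it. It assumes the sample standard deviation (the $n - 1$ denominator), which matches the quoted $41.16$. The variable names are illustrative.

```python
import statistics

# Dataset from the question (28 observations).
x = [1, 4, 1, 10, 112, 6, 22, 7, 18, 113,
     1, 4, 1, 10, 112, 6, 22, 7, 18, 113,
     1, 4, 1, 10, 112, 6, 22, 7]

mu = statistics.mean(x)
sd = statistics.stdev(x)  # sample std (n - 1 denominator); an assumption

k = 0.5  # half a standard deviation on each side of the mean
lower, upper = mu - k * sd, mu + k * sd

# Collect the existing observations that land outside the band.
outside = [v for v in x if v < lower or v > upper]

print(f"band: [{lower:.2f}, {upper:.2f}]")
print(f"{len(outside)} of {len(x)} existing values fall outside the band")
```

On this dataset the half-width band leaves more than half of the existing observations outside it, so it is worth checking the multiplier against the data before adopting it.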

0
On

If a negative number is not a possible/feasible data point, then your lower limit is zero in this particular example. It will stay zero until you get more data that shifts your mean high enough for the mean minus the standard deviation to become positive.
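One way to express this in code is to clamp the lower bound at zero. This is a minimal sketch, assuming the sample standard deviation and a band of one standard deviation. The function name `nonnegative_bounds` is hypothetical.

```python
import statistics

# Dataset from the question (28 observations).
x = [1, 4, 1, 10, 112, 6, 22, 7, 18, 113,
     1, 4, 1, 10, 112, 6, 22, 7, 18, 113,
     1, 4, 1, 10, 112, 6, 22, 7]

def nonnegative_bounds(data, k=1.0):
    """Mean -/+ k sample standard deviations, with the lower bound floored at 0."""
    mu = statistics.mean(data)
    sd = statistics.stdev(data)  # sample std (n - 1 denominator); an assumption
    lower = max(0.0, mu - k * sd)  # negative values are infeasible, so clamp at zero
    upper = mu + k * sd
    return lower, upper

lower, upper = nonnegative_bounds(x)
print(f"normal range: [{lower:.2f}, {upper:.2f}]")
```

The clamp only has an effect while the mean is smaller than `k` standard deviations. Once the data shifts enough for `mu - k * sd` to be positive, the function returns that value unchanged.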

0
On

Ideally, you should compute the bounds statistically using a z-score, as shown below:

$$\mu \pm z \cdot \frac{std(x)}{\sqrt{n}}$$

where $z$ is the z critical value and $n$ is the number of observations. This gives you the probability range. As long as a value falls within this range, the observation is normal; otherwise it is not. Does that make sense?
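The formula above can be sketched in Python as follows. The 95% confidence level (a critical value of about 1.96) is an assumption, as is the use of the sample standard deviation. The function name `mean_interval` is illustrative. Dividing by $\sqrt{n}$ gives the standard error, so the resulting interval describes the uncertainty in the mean and gets narrower as more observations are added.

```python
import math
import statistics

# Dataset from the question (28 observations).
x = [1, 4, 1, 10, 112, 6, 22, 7, 18, 113,
     1, 4, 1, 10, 112, 6, 22, 7, 18, 113,
     1, 4, 1, 10, 112, 6, 22, 7]

def mean_interval(data, confidence=0.95):
    """Return mean -/+ z * (sample std / sqrt(n)) for a two-sided confidence level."""
    n = len(data)
    mu = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    # Two-sided critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mu - z * se, mu + z * se

lower, upper = mean_interval(x)
print(f"interval: [{lower:.2f}, {upper:.2f}]")
```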