When plotting a bell curve from an array of values, is it possible that +/- 2 standard deviations from the mean can fall outside the range of values?


First, this is not homework; it is actually for work. It's been a couple of years since I've done stats and I need some help! I've googled this problem but was unable to find any resources that could help answer my question.

I have 25 values:

11.5
11.6
11.9
12.2
12.4
12.4
12.5
12.5
12.5
12.8
12.8
12.9
13.1
13.3
13.5
13.5
13.7
13.7
13.8
13.9
13.9
14
14.3
14.5
15

From here, I calculate the mean and from that, the variance and then the standard deviation:

The variance formula and my variance calculations:

$$ \sigma^{2} =\frac{\sum_{i=1}^{n}(x_{i}-\mu )^{2}}{n}=\frac{\sum_{i=1}^{25}(x_{i}-13.128)^{2}}{25}=0.7996159999999999 $$

Of course, standard deviation is simply the square root of variance:

$$ \sigma =\sqrt{0.7996159999999999}=0.8942125027083886 $$

Here's where I feel like I'm messing up:

One standard deviation less than the mean:

$$ -\sigma + \mu = -0.8942125027083886 + 13.128 = 12.2337874972916114 $$

Two standard deviations less than the mean:

$$ -2\sigma + \mu = -2*0.8942125027083886 + 13.128 = 11.3395749945832228 $$

This value, 11.3395749945832228, falls below the smallest value in the array, 11.5.
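As a cross-check of the arithmetic above, here is a minimal Python sketch. It assumes the population formulas (dividing by $n$, as in the variance equation) and uses only the standard-library `statistics` module; the variable names are illustrative.

```python
import statistics

# The 25 values listed in the question.
values = [11.5, 11.6, 11.9, 12.2, 12.4, 12.4, 12.5, 12.5, 12.5, 12.8,
          12.8, 12.9, 13.1, 13.3, 13.5, 13.5, 13.7, 13.7, 13.8, 13.9,
          13.9, 14, 14.3, 14.5, 15]

mean = statistics.fmean(values)           # arithmetic mean
variance = statistics.pvariance(values)   # population variance (divide by n)
sigma = statistics.pstdev(values)         # population standard deviation

lower_1 = mean - sigma        # one standard deviation below the mean
lower_2 = mean - 2 * sigma    # two standard deviations below the mean

print(f"mean           = {mean:.6f}")
print(f"variance       = {variance:.6f}")
print(f"sigma          = {sigma:.6f}")
print(f"mean - sigma   = {lower_1:.6f}")
print(f"mean - 2 sigma = {lower_2:.6f}")
print(f"smallest value = {min(values)}")
print(f"mean - 2 sigma is below the smallest value: {lower_2 < min(values)}")
```

If this agrees with the hand calculation, the arithmetic itself is not where the surprise comes from.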


How is this possible? Where am I messing up my calculations? Thank you for any and all help! I really appreciate it.

Best answer

EDIT: if there is some expensive decision riding on this, consider trying to get funding for an hour of consulting from a graduate student or professor in statistics there. These issues are always about interpretation, and need the hand of a master. The last time I saw something like this, it was a medical doctor doing a study on lung cancer or the like, but she was in an academic department, and her university statistics department had a very clear setup and fees for consulting to other university departments, all on their website.

ORIGINAL: Let there be four data points, $$ -1,-1,1,1. $$ The mean is $0.$ The sum of squares (after subtracting the mean, $0$) is $4,$ and the number of data points is $4,$ so the variance and the standard deviation are both $1.$ Two standard deviations from the mean, $\pm 2,$ lands beyond every data point.
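Written out with the same divide-by-$n$ formula used in the question, the arithmetic for these four points is:

$$ \mu = \frac{-1-1+1+1}{4} = 0, \qquad \sigma^{2} = \frac{(-1)^{2}+(-1)^{2}+1^{2}+1^{2}}{4} = 1, \qquad \mu \pm 2\sigma = \pm 2 $$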

The only guaranteed thing is Chebyshev's inequality, which can be used either by assuming a reasonable governing probability distribution or by taking the set of data points as defining the distribution, which is what you are doing.
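For reference, Chebyshev's inequality says that for any $k > 0$ at most a fraction $1/k^2$ of a distribution lies $k$ or more standard deviations from its mean, so at least $1 - 1/k^2$ lies within. A minimal Python sketch of that check, treating the 25 values from the question as the whole distribution (population standard deviation); the helper name `fraction_within` is made up for illustration:

```python
import statistics

def fraction_within(data, k):
    """Fraction of data within k population standard deviations of the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    inside = sum(1 for x in data if abs(x - mu) <= k * sigma)
    return inside / len(data)

# The 25 values from the question.
values = [11.5, 11.6, 11.9, 12.2, 12.4, 12.4, 12.5, 12.5, 12.5, 12.8,
          12.8, 12.9, 13.1, 13.3, 13.5, 13.5, 13.7, 13.7, 13.8, 13.9,
          13.9, 14, 14.3, 14.5, 15]

for k in (1.5, 2, 3):
    chebyshev_bound = 1 - 1 / k**2   # guaranteed minimum fraction within k sigma
    observed = fraction_within(values, k)
    print(f"k = {k}: observed {observed:.3f}, Chebyshev guarantees at least {chebyshev_bound:.3f}")
```

The bound says nothing about where $\mu \pm k\sigma$ sits relative to the smallest or largest data point; it only limits how much of the data can lie beyond those marks.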

Very similar to my first example: place one data point at $0,$ then place $100$ data points at $1$ and $100$ data points at $-1.$ The mean is $0$ and the standard deviation comes out just under $1,$ so all but one data point lie outside a single standard deviation, while everything lies inside $1.0025$ standard deviations. To check: $\sigma^2 = 200/201 \approx 0.995, \; \; \sigma \approx 0.9975,$ and the reciprocal $1/\sigma \approx 1.002496883$ is the multiple of $\sigma$ that reaches exactly $1,$ so everything is inside $1.0025$ standard deviations.
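The 201-point example can be checked numerically. This is a small sketch using Python's standard-library `statistics` module with the population standard deviation; the variable names are illustrative.

```python
import statistics

# One point at 0, one hundred at +1, one hundred at -1 (201 points in total).
data = [0.0] + [1.0] * 100 + [-1.0] * 100

mu = statistics.fmean(data)
sigma = statistics.pstdev(data)   # population standard deviation, sqrt(200/201)

# Points strictly farther than one standard deviation from the mean.
outside_one_sigma = sum(1 for x in data if abs(x - mu) > sigma)

# Whether every point is within 1.0025 standard deviations of the mean.
all_inside_scaled = all(abs(x - mu) <= 1.0025 * sigma for x in data)

print(f"mean = {mu}, sigma = {sigma:.6f}, 1/sigma = {1 / sigma:.9f}")
print(f"points outside one sigma: {outside_one_sigma} of {len(data)}")
print(f"all points inside 1.0025 sigma: {all_inside_scaled}")
```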