Standard Deviation after removing outlier.


I am totally new to statistics. I'm learning the basics.

I came upon this question while solving an exercise on statistics from Erwin Kreyszig's book. The problem is simple: it asks for the standard deviation after removing outliers from the dataset.

The dataset is as follows: 1, 2, 3, 4, 10. First I found the median $q_m = 3$. Then the lower quartile is $q_l = \frac{1+2}{2} = 1.5$ and the upper quartile is $q_u = \frac{4+10}{2} = 7$.

Now, $IQR = 7 - 1.5 = 5.5$ and $1.5 \cdot IQR = 8.25$.

So numbers below $1.5 - 8.25 = -6.75$ or above $7 + 8.25 = 15.25$ would be outliers.
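Here is a minimal Python sketch of the procedure in the question. It assumes the quartiles are the medians of the lower and upper halves, with the median left out of both halves when n is odd, and it uses the usual 1.5·IQR fences. The names `quartiles_exclusive` and `tukey_fences` are hypothetical helpers made up for illustration.

```python
from statistics import median

def quartiles_exclusive(data):
    """Return (q_l, q_m, q_u): medians of the lower half, whole set, upper half.

    For odd n the median itself is excluded from both halves (assumed convention).
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # [1, 2] for the data below
    upper = xs[(n + 1) // 2 :]    # [4, 10] for the data below
    return median(lower), median(xs), median(upper)

def tukey_fences(q_l, q_u, k=1.5):
    """Return (low, high); values outside this range are flagged as outliers."""
    iqr = q_u - q_l
    return q_l - k * iqr, q_u + k * iqr

data = [1, 2, 3, 4, 10]
q_l, q_m, q_u = quartiles_exclusive(data)
low, high = tukey_fences(q_l, q_u)
outliers = [x for x in data if x < low or x > high]
print(q_l, q_m, q_u)   # 1.5 3 7.0
print(low, high)       # -6.75 15.25
print(outliers)        # []
```

With this quartile convention, 10 lies inside the fences and nothing is removed.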

Since there is no outlier, I computed the standard deviation of the whole set, which is about 3.54.
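As a quick check, Python's `statistics.stdev` computes the sample standard deviation, which divides by n − 1. Applied to the full data set it gives the value for the case where nothing is removed:

```python
from statistics import stdev, variance

data = [1, 2, 3, 4, 10]
# Mean is 4; squared deviations 9 + 4 + 1 + 0 + 36 = 50, divided by n - 1 = 4
print(variance(data))          # 12.5
print(round(stdev(data), 2))   # 3.54
```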

But the answer provided is 1.29, which is different from the standard deviation of the set.

Can anyone help me see what I missed?

I also have another question: we can see with the naked eye that 10 is an outlier, but it is not detected here. Why?

Best answer

Well, deciding what counts as an outlier is something of an art, so the line is fuzzy. Still, a good procedure should probably flag 10 here. Judging from your book's answer, the procedure they intended you to use does delete 10.

So let's think about what could have gone wrong. My guess is that you should have picked 2 and 4 as your first and third quartiles instead of 1.5 and 7. The procedure for computing quartiles is not set in stone either, and results can vary wildly for small sets of discrete data. One established convention takes the lower and upper quartiles to be the medians of (1, 2, 3) and (3, 4, 10), which includes the median in both halves. You instead used (1, 2) and (4, 10). With $q_l = 2$ and $q_u = 4$ we get $IQR = 2$ and $1.5 \cdot IQR = 3$, so anything above $4 + 3 = 7$ is flagged and 10 is removed. Including the median is a little more robust, though it tends to pull the quartiles inward, as this example demonstrates.
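Python's standard library happens to expose both conventions. The sketch below assumes Python 3.8+ and the 1.5·IQR fences from the question, and `tukey_outliers` is a hypothetical helper. For this particular data set, `statistics.quantiles` with `method="exclusive"` (the default) gives the quartiles 1.5 and 7, and `method="inclusive"` gives 2 and 4. Both methods are defined by interpolation rather than by splitting the data into halves, so they will not reproduce the two halving rules for every n.

```python
from statistics import quantiles

def tukey_outliers(data, q_l, q_u, k=1.5):
    """Return the values lying more than k * IQR below q_l or above q_u."""
    iqr = q_u - q_l
    low, high = q_l - k * iqr, q_u + k * iqr
    return [x for x in data if x < low or x > high]

data = [1, 2, 3, 4, 10]
for method in ("exclusive", "inclusive"):
    # quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3]
    q_l, _, q_u = quantiles(data, n=4, method=method)
    print(method, q_l, q_u, tukey_outliers(data, q_l, q_u))
# exclusive 1.5 7.0 []
# inclusive 2.0 4.0 [10]
```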

Another answer

Removing 10 gives the set $\{1, 2, 3, 4\}$ with $\overline{x} = \frac{1 + 2 + 3 + 4}{4} = 2.5$. So the standard deviation is $s = \sqrt{\frac{(1 - 2.5)^2 + (2 - 2.5)^2 + (3 - 2.5)^2 + (4 - 2.5)^2}{4 - 1}} = \sqrt{\frac{2 \cdot 1.5^2 + 2 \cdot 0.5^2}{3}} = \sqrt{5/3} \approx 1.29$.
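The same arithmetic can be checked in Python. This is a small sketch that removes 10 by hand and then compares the manual formula with `statistics.stdev`:

```python
from math import sqrt
from statistics import mean, stdev

data = [1, 2, 3, 4, 10]
kept = [x for x in data if x != 10]          # drop the flagged outlier
xbar = mean(kept)                            # 2.5
ss = sum((x - xbar) ** 2 for x in kept)      # 2*1.5**2 + 2*0.5**2 = 5.0
s = sqrt(ss / (len(kept) - 1))               # sqrt(5/3), divisor n - 1
print(xbar, ss, round(s, 2))                 # 2.5 5.0 1.29
print(round(stdev(kept), 2))                 # 1.29 (library sample std dev)
```

The book's 1.29 corresponds to the n − 1 divisor. Dividing by n instead would give $\sqrt{5/4} \approx 1.12$.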