Finding the standard deviation from data grouped by intervals

| Starting Monthly Salary | Number of Graduates |
|---|---|
| 1,001 - 1,400 | 1 |
| 1,401 - 1,800 | 11 |
| 1,801 - 2,200 | 14 |
| 2,201 - 2,600 | 38 |
| 2,601 - 3,000 | 36 |
| Total | 100 |

What I have done so far is take the midpoint of each interval in the Monthly Salary column, once per graduate, planning to calculate the deviations from those values at the end.

Calculating all 100 of these values is tedious:

(1001 + 1400) / 2 = 1200.50

(1401 + 1800) / 2 = 1600.50

(1401 + 1800) / 2 = 1600.50

(1401 + 1800) / 2 = 1600.50

This continues for all 100 values, in order to calculate the deviation of each...

Then I would calculate the standard deviation with σ = √((Σ(x-µ)^2)/N). Once everything has been entered, this formula will look really nasty. Is there an easier (or cleaner) way to find the (approximate) standard deviation of the monthly starting salaries above?


There are 2 best solutions below

0
On BEST ANSWER

Your grouped data have midpoints $m_i$ with respective frequencies $f_i.$

m = c(12,16,20,24,28)*100
f = c(1,11,14,38,36)

The sample mean is approximately $A =\bar X = \frac 1n\sum_{i=1}^5 f_im_i =2388,$ where $n = \sum_{i=1}^5 f_i = 100.$ Using R as a calculator:

n = sum(f); n
[1] 100
a = sum(f*m)/n; a
[1] 2388

The sample variance $S^2 \approx \sum_{i=1}^5 \frac{1}{n-1}f_i(m_i-\bar X)^2 =166\,319.2$ and the sample standard deviation is $S =\sqrt{S^2} = 407.82.$

v = sum(f*(m-a)^2)/(n-1);  v
[1] 166319.2
s = sqrt(v);  s
[1] 407.8225
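If you would rather start from the class limits than type the midpoints by hand, the same calculation can be wrapped in a small function. This is only a sketch: `grouped_stats` is a name made up for illustration (it is not a base R function), and it assumes every observation in a class sits at the class midpoint. Because it uses the exact midpoints 1200.5, 1600.5, ..., the mean shifts by 0.5 while the standard deviation is unchanged.

```r
# Hypothetical helper (not part of base R): approximate mean and SD from
# class limits and frequencies, using each class midpoint as representative.
grouped_stats <- function(lower, upper, f) {
  m    <- (lower + upper) / 2               # class midpoints
  n    <- sum(f)                            # total number of observations
  xbar <- sum(f * m) / n                    # grouped (approximate) mean
  s2   <- sum(f * (m - xbar)^2) / (n - 1)   # sample variance, divisor n - 1
  c(n = n, mean = xbar, sd = sqrt(s2))
}

lower <- seq(1001, 2601, by = 400)   # lower class limits from the table
upper <- seq(1400, 3000, by = 400)   # upper class limits from the table
freq  <- c(1, 11, 14, 38, 36)        # number of graduates per class

res <- grouped_stats(lower, upper, freq)
round(res, 2)   # mean 2388.50, sd 407.82
```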

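If you are using some kind of spreadsheet, there might be a built-in function for finding the mean and variance of a column of $n = 100$ numbers. If so, and if you had the original 100 salaries, you could find exact values of the sample mean and standard deviation. (Some information is lost when data are put into groups and summarized.) Applying such functions to 'data' reconstructed from the midpoints and frequencies reproduces the grouped approximations: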

x = rep(m, times=f)  # 'data' reconstructed from m & f
mean(x);  sd(x)
[1] 2388
[1] 407.8225

cutp = seq(1000, 3000, by=400)      # class boundaries
hist(x, breaks=cutp, ylim=c(0,45),
     col="skyblue2", labels=TRUE)   # counts printed above the bars

[Figure: histogram of x with class boundaries cutp and counts labeled above the bars]

2
On

$$\sigma = \sqrt{\sum_i x_i^2\, p(x_i) - \Bigl(\sum_i x_i\, p(x_i)\Bigr)^2}$$

| $x_i$ | $p(x_i)$ | $x_i\,p(x_i)$ | $x_i^2\,p(x_i)$ |
|---|---|---|---|
| 1,200.5 | 0.01 | 12.01 | 14,412.00 |
| 1,600.5 | 0.11 | 176.06 | 281,776.03 |
| 2,000.5 | 0.14 | 280.07 | 560,280.04 |
| 2,400.5 | 0.38 | 912.19 | 2,189,712.10 |
| 2,800.5 | 0.36 | 1,008.18 | 2,823,408.09 |
| Total | 1.00 | 2,388.50 | 5,869,588.25 |

$$\sigma=\sqrt{5{,}869{,}588.25-2388.5^2}=405.78$$
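The arithmetic here can be checked with a few lines of R. This is a sketch under the same assumption as the calculation itself (each class is represented by its exact midpoint); the variable names are chosen only for illustration. Note that this formula divides by $n$, which is why it gives 405.78 rather than the 407.82 obtained with the $n-1$ divisor; rescaling by $\sqrt{n/(n-1)}$ reconciles the two.

```r
x <- c(1200.5, 1600.5, 2000.5, 2400.5, 2800.5)  # exact class midpoints
p <- c(1, 11, 14, 38, 36) / 100                 # relative frequencies

ex  <- sum(x * p)      # sum of x_i * p(x_i), the grouped mean
ex2 <- sum(x^2 * p)    # sum of x_i^2 * p(x_i)

sigma <- sqrt(ex2 - ex^2)        # SD with divisor n
s     <- sigma * sqrt(100 / 99)  # same SD rescaled to divisor n - 1

round(c(mean = ex, sigma = sigma, s = s), 2)   # 2388.50, 405.78, 407.82
```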