Problem with Sample Standard Deviation

72 Views Asked by Bumbble Comm At 01 Apr 2026 - 11:02

I have a minor problem with a Statistics exercise. I just don't get the same number as in the master solution, so I wondered if there was anything I did not consider:

The German Federal Office for Motor Vehicles (Kraftfahrt-Bundesamt) 
publishes information about newly registered cars. The following 
table shows the number of newly registered cars in the period Q1-Q3 2016:

Brand         Registrations
Alfa Romeo    3,039
Aston Martin  204
Audi          227,684
Bentley       638
BMW           196,584
Cadillac      236
Chevrolet     595
Citroen       38,186
Dacia         37,752
DS            3,263
Ferrari       589
Fiat          61,788
Ford          183,456
Honda         19,967
Hyundai       81,346
Infiniti      1,864
Iveco         556
Jaguar        6,710
Jeep          11,411
Kia           46,458
Lada          1,229
Lamborghini   206
Land Rover    17,253
Lexus         1,676
Lotus         123
Maserati      1,184
Mazda         49,904
Mercedes      235,828
Mini          33,755
Mitsubishi    28,805
Morgan        62
Nissan        56,412
Opel          185,738
Peugeot       42,629
Porsche       23,053
Renault       88,072
Rolls-Royce   122
Seat          71,290
Skoda         140,772
Smart         26,442
Ssangyong     2,692
Subaru        5,408
Suzuki        23,791
Tesla         1,415
Toyota        51,926
Volvo         28,050
VW            510,003
Others        5,617

a) Determine the empirical density and empirical distribution 
   function using classes of size 50,000.
b) Sketch the empirical density function.
c) Using class means, calculate the Sample Mean and Sample 
   Standard Deviation.
d) What is the median number of registrations?
e) Calculate the z-score for VW and give an interpretation.
f) Determine and sketch the Lorenz curve, and calculate 
   the Gini coefficient (using the classes)

My problem concerns c). The class means are clear, and so is the Sample Mean (µ = 53245.5). Strangely, in the master solution the Sample Mean is not calculated using the class means (although the exercise requires it). But here is the problem: the master solution gives the Sample Standard Deviation as 90400.1, whereas I always get 92496.97826.

Is it my error or an error in the solution? What do I need the Class Means for, then?

Thanks a lot!

**Bumbble Comm** · Answer 1 · 2017-12-29 07:10:18

Here are some computations that may be helpful.

Below are some results I get when I put your numbers $x_1, \dots, x_{48}$ of registrations into R statistical software. (You should proofread to make sure I transferred them accurately.)

x = c(3039, 204, 227684, 638, 196584, 236, 595, 38186, 37752, 
     3263, 589, 61788, 183456, 19967, 81346, 1864, 556, 6710, 
     11411, 46458, 1229, 206, 17253, 1676, 123, 1184, 49904, 
     235828, 33755, 28805, 62, 56412, 185738, 42629, 23053, 
     88072, 122, 71290, 140772, 26442, 2692, 5408, 23791, 1415, 
     51926, 28050, 510003, 5617)

 length(x); mean(x); median(x); sd(x)
 # 48            # sample size
 # 53245.48      # sample mean
 # 21510         # sample median
 # 92496.98      # sample SD

A histogram of the data with cutpoints $b_j = 0, 50000, 100000, \dots, 550000$ (spaced 50,000 apart as required) and $k = 11$ midpoints $m_j = 25000, 75000, \dots, 525000$ is as follows:

[Figure: histogram of the 48 registration counts]

A frequently-used method of approximating the sample mean $\bar x = \frac 1{48} \sum_{i=1}^{48} x_i$ based on a histogram is to use bin frequencies $f_j$ and midpoints $m_j$ read from the histogram to get $\bar x \approx \frac 1n \sum_{j=1}^k f_jm_j,$ where $n = \sum_j f_j.$
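The midpoint approximation can be sketched in R as follows. This is a minimal illustration, assuming the class width of 50,000 from part a) and classes starting at 0; the names `breaks`, `f`, `m`, and `mean_binned` are chosen here for illustration and do not come from the exercise.

```r
# Registration counts for the 48 brands, copied from the table in the question.
x <- c(3039, 204, 227684, 638, 196584, 236, 595, 38186, 37752,
       3263, 589, 61788, 183456, 19967, 81346, 1864, 556, 6710,
       11411, 46458, 1229, 206, 17253, 1676, 123, 1184, 49904,
       235828, 33755, 28805, 62, 56412, 185738, 42629, 23053,
       88072, 122, 71290, 140772, 26442, 2692, 5408, 23791, 1415,
       51926, 28050, 510003, 5617)

# Class boundaries of width 50,000 (assumed to start at 0).
breaks <- seq(0, 550000, by = 50000)

# hist(..., plot = FALSE) returns the bin counts and midpoints without drawing.
h <- hist(x, breaks = breaks, plot = FALSE)
f <- h$counts   # bin frequencies f_j
m <- h$mids     # bin midpoints m_j

n <- sum(f)                     # total number of observations
mean_binned <- sum(f * m) / n   # midpoint-based approximation of the mean

mean_binned   # approximation from the binned data
mean(x)       # exact sample mean, for comparison
```

Since most brands fall in the lowest class and sit well below its midpoint, the midpoint approximation can be expected to differ noticeably from the exact mean for these data.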

In your case you have the original data and you could use the actual means $m_j^\prime$ of the bins instead of the bin midpoints $m_j$ to get a more accurate answer.
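A short R sketch of the class-mean variant, again assuming classes of width 50,000 starting at 0 (the names `cls`, `m_prime`, and `mean_from_class_means` are illustrative). Because each class mean times its frequency is just the class total, the frequency-weighted average of the class means reproduces the exact sample mean.

```r
# Registration counts for the 48 brands, copied from the table in the question.
x <- c(3039, 204, 227684, 638, 196584, 236, 595, 38186, 37752,
       3263, 589, 61788, 183456, 19967, 81346, 1864, 556, 6710,
       11411, 46458, 1229, 206, 17253, 1676, 123, 1184, 49904,
       235828, 33755, 28805, 62, 56412, 185738, 42629, 23053,
       88072, 122, 71290, 140772, 26442, 2692, 5408, 23791, 1415,
       51926, 28050, 510003, 5617)

# Assign each observation to a class of width 50,000 and drop empty classes.
breaks <- seq(0, 550000, by = 50000)
cls <- droplevels(cut(x, breaks = breaks))

f <- as.vector(table(cls))                   # class frequencies f_j
m_prime <- as.vector(tapply(x, cls, mean))   # actual class means m'_j

# Frequency-weighted average of the class means.
mean_from_class_means <- sum(f * m_prime) / sum(f)

# Agrees with the exact sample mean up to floating-point error.
all.equal(mean_from_class_means, mean(x))
```

This may explain why a solution that "uses class means" and one that uses the raw data report the same sample mean.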

You could estimate the sample variance as $s^2 = \frac{1}{n-1}\sum_{j=1}^k f_j(m_j - \bar x)^2$ and take its square root to get the sample standard deviation (SD). It would be easier to find the exact SD from the original data than from the binned data. Perhaps your text has a formula you are supposed to use to get the sample SD.
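The grouped-data formula above can be sketched in R as follows. Two assumptions are made here that the text does not fix: the classes have width 50,000 and start at 0, and $\bar x$ is taken to be the midpoint-based mean rather than the exact one. The names `xbar_binned`, `var_binned`, and `sd_binned` are illustrative.

```r
# Registration counts for the 48 brands, copied from the table in the question.
x <- c(3039, 204, 227684, 638, 196584, 236, 595, 38186, 37752,
       3263, 589, 61788, 183456, 19967, 81346, 1864, 556, 6710,
       11411, 46458, 1229, 206, 17253, 1676, 123, 1184, 49904,
       235828, 33755, 28805, 62, 56412, 185738, 42629, 23053,
       88072, 122, 71290, 140772, 26442, 2692, 5408, 23791, 1415,
       51926, 28050, 510003, 5617)

# Bin the data into classes of width 50,000 (assumed to start at 0).
h <- hist(x, breaks = seq(0, 550000, by = 50000), plot = FALSE)
f <- h$counts   # bin frequencies f_j
m <- h$mids     # bin midpoints m_j
n <- sum(f)

# Assumption: x-bar in the formula is the midpoint-based mean.
xbar_binned <- sum(f * m) / n

# Grouped-data variance with the n - 1 divisor, then its square root.
var_binned <- sum(f * (m - xbar_binned)^2) / (n - 1)
sd_binned <- sqrt(var_binned)

sd_binned   # SD approximated from the binned data
sd(x)       # exact sample SD from the raw data, for comparison
```

Binning discards the spread within each class, so the grouped value generally differs from `sd(x)`. Swapping the midpoints for class means, or $n - 1$ for $n$, gives other variants a textbook might intend.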

The z-score for VW is $Z = \frac{510{,}003 - \bar x}{s} = 4.94.$ This means VW is almost five standard deviations above the mean. In many datasets there are few if any observations more than three standard deviations away from the mean (in either direction), so the number of registrations for VW is unusually large and might be called an 'outlier'.

 > (510003-mean(x))/sd(x)
 [1] 4.93808
