I have a minor problem with a Statistics exercise. I just don't get the same number as in the master solution, so I wondered if there was anything I did not consider:
The German Federal Office for Motor Vehicles (Kraftfahrt-Bundesamt)
publishes information about newly registered cars. The following
table shows the number of newly registered cars in the period Q1-Q3 2016:
Brand Registrations
Alfa Romeo 3,039
Aston Martin 204
Audi 227,684
Bentley 638
BMW 196,584
Cadillac 236
Chevrolet 595
Citroen 38,186
Dacia 37,752
DS 3,263
Ferrari 589
Fiat 61,788
Ford 183,456
Honda 19,967
Hyundai 81,346
Infiniti 1,864
Iveco 556
Jaguar 6,710
Jeep 11,411
Kia 46,458
Lada 1,229
Lamborghini 206
Land Rover 17,253
Lexus 1,676
Lotus 123
Maserati 1,184
Mazda 49,904
Mercedes 235,828
Mini 33,755
Mitsubishi 28,805
Morgan 62
Nissan 56,412
Opel 185,738
Peugeot 42,629
Porsche 23,053
Renault 88,072
Rolls-Royce 122
Seat 71,290
Skoda 140,772
Smart 26,442
Ssangyong 2,692
Subaru 5,408
Suzuki 23,791
Tesla 1,415
Toyota 51,926
Volvo 28,050
VW 510,003
Others 5,617
a) Determine the empirical density and empirical distribution
function using classes of size 50,000.
b) Sketch the empirical density function.
c) Using class means, calculate the Sample Mean and Sample
Standard Deviation.
d) What is the median number of registrations?
e) Calculate the z-score for VW and give an interpretation.
f) Determine and sketch the Lorenz curve, and calculate
the Gini coefficient (using the classes)
My problem concerns c). The class means are all clear, and so is the sample mean (µ=53245.5). Strangely, the master solution does not calculate the sample mean from the class means, although the exercise asks for it. But here is the problem: the master solution gives the sample standard deviation as 90400.1, whereas I always get 92496.97826.
Is it my error or an error in the solution? What do I need the Class Means for, then?
Thanks a lot!
Here are some computations that may be helpful.
Below are some results I get when I put your numbers $x_1, \dots, x_{48}$ of registrations into R statistical software. (You should proofread to make sure I transferred them accurately.)
A histogram of the data with cutpoints $b_j = 0, 5000, 10000, \dots, 515000$ (spaced 5000 apart; note that part (a) of the exercise asks for classes of width 50,000) and $k$ midpoints $m_j = 2500, \dots, 512500$ is as follows:
[Figure: histogram of the registration data]
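As a non-graphical counterpart to the histogram, here is a minimal Python sketch of the frequency table for part (a). It assumes the class width of 50,000 stated in the exercise and half-open classes $[0, 50000), [50000, 100000), \dots$; the names `WIDTH`, `counts` and `table` are mine, and the data are transcribed from the table in the question, so please proofread them.

```python
from collections import Counter

# Registrations Q1-Q3 2016, transcribed from the question's table (48 values).
x = [
    3039, 204, 227684, 638, 196584, 236, 595, 38186, 37752, 3263, 589, 61788,
    183456, 19967, 81346, 1864, 556, 6710, 11411, 46458, 1229, 206, 17253, 1676,
    123, 1184, 49904, 235828, 33755, 28805, 62, 56412, 185738, 42629, 23053,
    88072, 122, 71290, 140772, 26442, 2692, 5408, 23791, 1415, 51926, 28050,
    510003, 5617,
]
WIDTH = 50_000  # assumption: class width from part (a) of the exercise
n = len(x)

# Class index k means the class [k * WIDTH, (k + 1) * WIDTH).
counts = Counter(v // WIDTH for v in x)

table = []
cumulative = 0.0
for k in range(max(counts) + 1):      # include the empty classes as well
    f = counts.get(k, 0)              # absolute frequency f_j
    density = f / (n * WIDTH)         # empirical density on this class
    cumulative += f / n               # empirical distribution function at the upper cutpoint
    table.append((k * WIDTH, (k + 1) * WIDTH, f, density, cumulative))

for lo, hi, f, density, F in table:
    print(f"[{lo:>6}, {hi:>6})  f={f:>2}  density={density:.3e}  F={F:.4f}")
```

The density is the relative frequency divided by the class width, so the bars of the corresponding histogram have total area 1.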
A frequently-used method of approximating the sample mean $\bar x = \frac 1{48} \sum_{i=1}^{48} x_i$ based on a histogram is to use bin frequencies $f_j$ and midpoints $m_j$ read from the histogram to get $\bar x \approx \frac 1n \sum_{j=1}^k f_jm_j,$ where $n = \sum_j f_j.$
In your case you have the original data, so you could use the actual class means $m_j^\prime$ instead of the bin midpoints $m_j$. Since $f_j m_j^\prime$ is just the sum of the observations in bin $j$, this reproduces $\bar x$ exactly rather than approximately.
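To make the difference between midpoints and class means concrete, here is a small Python sketch. It assumes classes of width 50,000 as in part (a) of the exercise; the variable names are my own and the data are transcribed from the question.

```python
from collections import defaultdict

# Registrations Q1-Q3 2016, transcribed from the question's table (48 values).
x = [
    3039, 204, 227684, 638, 196584, 236, 595, 38186, 37752, 3263, 589, 61788,
    183456, 19967, 81346, 1864, 556, 6710, 11411, 46458, 1229, 206, 17253, 1676,
    123, 1184, 49904, 235828, 33755, 28805, 62, 56412, 185738, 42629, 23053,
    88072, 122, 71290, 140772, 26442, 2692, 5408, 23791, 1415, 51926, 28050,
    510003, 5617,
]
WIDTH = 50_000  # assumption: class width from part (a) of the exercise
n = len(x)

# Collect the observations of each class [k * WIDTH, (k + 1) * WIDTH).
classes = defaultdict(list)
for v in x:
    classes[v // WIDTH].append(v)

# Approximation from frequencies f_j and midpoints m_j.
mean_mid = sum(len(vals) * (k + 0.5) * WIDTH for k, vals in classes.items()) / n
# Same formula with the actual class means m'_j in place of the midpoints.
mean_cls = sum(len(vals) * (sum(vals) / len(vals)) for vals in classes.values()) / n
# Exact mean from the raw data.
mean_exact = sum(x) / n

print(round(mean_mid, 1))    # about 61458.3
print(round(mean_cls, 1))    # about 53245.5
print(round(mean_exact, 1))  # about 53245.5
```

The midpoint version overshoots here because most brands in the lowest class sit far below its midpoint of 25,000.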
You could estimate the sample variance as $s^2 = \frac{1}{n-1}\sum_{j=1}^k f_j(m_j - \bar x)^2$ and take its square root to get the sample standard deviation (SD). It would be easier to find the exact SD from the original data than from the binned data. Perhaps your text has a formula you are supposed to use to get the sample SD.
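The following Python sketch compares several ways of getting an SD from these data. It is only an exploration under assumptions: classes of width 50,000, and a helper `grouped_sd` (my own name) that applies the formula above to a list of class centres with a chosen divisor. On these numbers, the raw-data SD with divisor $n-1$ comes out near 92496.98, and the class-mean variant with divisor $n$ comes out near 90400.1. That suggests, but does not prove, how the master solution's figure might have been obtained.

```python
import math
import statistics
from collections import defaultdict

# Registrations Q1-Q3 2016, transcribed from the question's table (48 values).
x = [
    3039, 204, 227684, 638, 196584, 236, 595, 38186, 37752, 3263, 589, 61788,
    183456, 19967, 81346, 1864, 556, 6710, 11411, 46458, 1229, 206, 17253, 1676,
    123, 1184, 49904, 235828, 33755, 28805, 62, 56412, 185738, 42629, 23053,
    88072, 122, 71290, 140772, 26442, 2692, 5408, 23791, 1415, 51926, 28050,
    510003, 5617,
]
WIDTH = 50_000  # assumption: class width from part (a) of the exercise
n = len(x)

classes = defaultdict(list)
for v in x:
    classes[v // WIDTH].append(v)

f = [len(vals) for vals in classes.values()]               # frequencies f_j
midpoints = [(k + 0.5) * WIDTH for k in classes]           # midpoints m_j
class_means = [sum(vals) / len(vals) for vals in classes.values()]  # m'_j


def grouped_sd(centers, divisor):
    """Hypothetical helper: sqrt(sum_j f_j (c_j - mean)^2 / divisor)."""
    mean = sum(fj * cj for fj, cj in zip(f, centers)) / n
    ss = sum(fj * (cj - mean) ** 2 for fj, cj in zip(f, centers))
    return math.sqrt(ss / divisor)


sd_exact = statistics.stdev(x)               # raw data, divisor n - 1
sd_mid = grouped_sd(midpoints, n - 1)        # midpoints, divisor n - 1
sd_cls_n1 = grouped_sd(class_means, n - 1)   # class means, divisor n - 1
sd_cls_n = grouped_sd(class_means, n)        # class means, divisor n

print(round(sd_exact, 2))   # about 92496.98
print(round(sd_mid, 2))
print(round(sd_cls_n1, 2))  # about 91356.76
print(round(sd_cls_n, 2))   # about 90400.12
```

Any SD computed from class centres ignores the spread inside each class, which is why the class-mean versions are smaller than the raw-data SD.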
The z-score for VW is $Z = \frac{510{,}003 - \bar x}{s} = 4.94.$ This means it is almost five standard deviations above the mean. In many datasets there are few if any observations more than three standard deviations away from the mean (in either direction), so the number of registrations for VW is unusually large and might be called an 'outlier'.
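The z-score can be checked with a few lines of Python. This sketch assumes $\bar x$ and $s$ are the exact sample mean and sample SD (divisor $n-1$) of the raw data; the three-SD cutoff is the rule of thumb mentioned above, not a formal test.

```python
import statistics

# Registrations Q1-Q3 2016, transcribed from the question's table (48 values).
x = [
    3039, 204, 227684, 638, 196584, 236, 595, 38186, 37752, 3263, 589, 61788,
    183456, 19967, 81346, 1864, 556, 6710, 11411, 46458, 1229, 206, 17253, 1676,
    123, 1184, 49904, 235828, 33755, 28805, 62, 56412, 185738, 42629, 23053,
    88072, 122, 71290, 140772, 26442, 2692, 5408, 23791, 1415, 51926, 28050,
    510003, 5617,
]
vw = 510_003

xbar = statistics.mean(x)   # exact sample mean
s = statistics.stdev(x)     # sample SD with divisor n - 1

z_vw = (vw - xbar) / s
print(round(z_vw, 2))       # 4.94

# Observations more than three SDs from the mean, in either direction.
far_out = [v for v in x if abs(v - xbar) / s > 3]
print(far_out)              # [510003]
```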