Fitting a Normal Distribution to a set of Data


Problem: Fit a normal distribution to the following data:
x f
462 98
480 75
498 56
516 42
534 30
552 21
570 15
588 11
606 6
624 2
Answer:
\begin{eqnarray*}
n &=& 98 + 75 + 56 + 42 + 30 + 21 + 15 + 11 + 6 + 2 \\
n &=& 356 \\
u &=& \frac{98(462) + 75(480) + 56(498) + 42(516) + 30(534) + 21(552) + 15(570) + 11(588)}{356} \\
&&+ \frac{6(606) + 2(624)}{356} \\
u &=& \frac{109164 + 42(516) + 30(534) + 21(552) + 15(570) + 11(588)}{356} \\
&&+ \frac{6(606) + 2(624)}{356} \\
u &=& \frac{178350}{356} \\
u &=& 500.98315
\end{eqnarray*}

Now we need to find the variance.

\begin{eqnarray*}
\sigma^2 &=& E(x^2) - u^2 \\
E(x^2) &=& \frac{98(462)^2 + 75(480)^2 + 56(498)^2 + 42(516)^2 + 30(534)^2 + 21(552)^2}{356} \\
&&+ \frac{15(570)^2 + 11(588)^2 + 6(606)^2 + 2(624)^2}{356} \\
E(x^2) &=& \frac{98(462)^2 + 75(480)^2 + 56(498)^2 + 42(516)^2 + 30(534)^2 + 21(552)^2}{356} \\
&&+ \frac{11658852}{356} \\
E(x^2) &=& \frac{98(462)^2 + 75(480)^2 + 56(498)^2 + 42(516)^2 + 30(534)^2 + 21(552)^2 + 11658852}{356} \\
E(x^2) &=& \frac{52085736 + 42(516)^2 + 30(534)^2 + 21(552)^2 + 11658852}{356} \\
E(x^2) &=& \frac{71823168 + 21(552)^2 + 11658852}{356} \\
E(x^2) &=& 252474.17 \\
\sigma^2 &=& 252474.17 - (500.98315)^2 = 1490.0534 \\
\sigma &=& 38.60121
\end{eqnarray*}

So I conclude we have a normal distribution with mean $500.98315$ and standard deviation $38.60121$.

However, the book's answer is:

Expected frequencies are $1.7$, $5.5$, $12.0$, $15.9$, $13.7$, $7.6$, $2.7$ and $0.6$ respectively.

I do not understand the book's answer.

Bob


It is unclear whether your observations are intended to be a random sample or a population. If a sample, then one ordinarily uses $n-1$ in the denominator of the sample variance. If a population, then it is discrete (taking only ten distinct values), so clearly not normal.

It is important to use the technical terminology precisely.

I have not checked your arithmetic; you should do so. I will assume for the moment that it is correct. So let's say this is a random sample from a distribution assumed to be normal.

In that case, 'fit' means to estimate the population mean $\mu$ by the sample mean (which I take to be $\bar X = 471.8$) and to estimate the population standard deviation $\sigma$ by the sample standard deviation (which I take to be $S = 155.6$).

Then, the best fitting normal density curve is that of $\mathsf{Norm}(\mu = 471.8,\, \sigma =155.6).$

[Figure: density curve of the fitted normal distribution]
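
For data given as distinct values with frequencies, the two estimates named above can be written out explicitly. The symbols $v_i$ (distinct values), $f_i$ (their frequencies) and $k$ (the number of distinct values) are notation introduced here for illustration, not taken from the original post.

\begin{eqnarray*}
n &=& \sum_{i=1}^{k} f_i \\
\bar X &=& \frac{1}{n} \sum_{i=1}^{k} f_i v_i \\
S^2 &=& \frac{1}{n-1} \sum_{i=1}^{k} f_i (v_i - \bar X)^2
\end{eqnarray*}

Replacing $n-1$ by $n$ in the last line gives the population-style variance $E(x^2) - u^2$ used in the question.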


Note: When I tried to check your arithmetic for the sample mean (using R statistical software), I did not get the same result you did:

v = c(462, 480, 498, 516, 534, 552, 570, 588, 606, 624)
f = c( 98,  75,  56,  42,  30,  21,  15,  11,   6,   2)
sum(f)
## 356       # NOT 378
sum(f*v)/sum(f)
## 500.9831  # NOT 471.8
x = rep(v, times=f)
mean(x)
## 500.9831  # again
sd(x)
## 38.65557

Please check my transcription of your data and your computations to find the discrepancy.
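
The `sd(x)` value above (38.65557) is slightly larger than the 38.60121 in the question. A minimal sketch of the likely reason, assuming the question's figure comes from the divisor-$n$ formula $E(x^2) - u^2$ while R's `sd` uses the divisor $n-1$. The names `var_pop`, `sd_pop` and `sd_samp` are made up for this illustration.

```r
# Values and frequencies as transcribed from the question
v <- c(462, 480, 498, 516, 534, 552, 570, 588, 606, 624)
f <- c( 98,  75,  56,  42,  30,  21,  15,  11,   6,   2)

n <- sum(f)               # total frequency
m <- sum(f * v) / n       # weighted mean

# Divisor n: E(x^2) - mean^2, the formula used in the question
var_pop <- sum(f * v^2) / n - m^2
sd_pop  <- sqrt(var_pop)

# Divisor n - 1: rescale the same sum of squares; this is what sd() reports
sd_samp <- sqrt(var_pop * n / (n - 1))

print(c(sd_pop = sd_pop, sd_samp = sd_samp))
```

With 356 observations the two versions differ only in the second decimal place, so the choice of divisor matters little for the fit here.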

Addendum to Note per Comments: A histogram using the default binning of R is shown below. From this histogram, I doubt that the data are from a normal population. Maybe the assignment was to 'test whether the data fit a normal distribution' rather than to 'find the best fitting normal'. I can't really guess from what you have posted.

[Figure: histogram of the data using R's default binning]
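
One way to make the comparison with a normal curve concrete is to compute the expected class frequencies under the fitted normal and set them beside the observed ones. The sketch below assumes the x values are midpoints of classes of width 18 (boundaries 453, 471, ..., 633); the post does not say this, so treat the boundaries as a guess. The names `mu_hat`, `sd_hat`, `breaks` and `expected` are chosen for this illustration.

```r
# Values and frequencies as transcribed from the question
v <- c(462, 480, 498, 516, 534, 552, 570, 588, 606, 624)
f <- c( 98,  75,  56,  42,  30,  21,  15,  11,   6,   2)

n      <- sum(f)
mu_hat <- sum(f * v) / n                            # estimate of the mean
sd_hat <- sqrt(sum(f * (v - mu_hat)^2) / (n - 1))   # sample SD (divisor n - 1)

# ASSUMPTION: each x is the midpoint of a class of width 18
breaks <- c(v - 9, max(v) + 9)

# Probability of each class under Norm(mu_hat, sd_hat), scaled to n observations
p        <- diff(pnorm(breaks, mean = mu_hat, sd = sd_hat))
expected <- n * p

print(data.frame(midpoint = v, observed = f, expected = round(expected, 1)))
```

Under these assumptions the expected frequencies peak in the class around 498, which contains the estimated mean, whereas the observed frequencies are largest in the first class and fall steadily. That mismatch is the same lack of fit the histogram suggests.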