What is the distribution of this data set?

We have a test with possible scores from 0 to 100 and a sample of 20 subjects with the following scores: 87, 53, 35, 90, 78, 45, 65, 87, 76, 57, 86, 99, 67, 98, 86, 79, 90, 88, 86, 95. The mean is 77.35 and the standard deviation is 17.55.

Is this distribution normal or not? Can we use normalized classes to differentiate between subjects?

Thank you!

There are 1 best solutions below

It is not possible to say for sure whether these test scores are from a normal population. There is a variety of ways to see if data are a 'reasonable fit' to normal.

Graphical methods. Here are three common ones:

(1) Make a histogram of the data and judge whether the sample seems consistent with the well-known normal 'bell curve' shape. To help with the comparison, one can overlay the density curve of the 'best-fitting' normal distribution (in which we set $\mu = \bar X$ and $\sigma = S$).

(2) Because some information is lost when observations are sorted into 'bins' for a histogram, a more accurate method is to plot the empirical cumulative distribution function (ECDF) of the sample (in which, roughly speaking, each of the $n$ observations is assigned probability $1/n$) together with the best-fitting normal CDF. Then judge whether the points of the ECDF are a good fit to the CDF.

(3) The vertical scale of the ECDF plot can be transformed so that the normal CDF becomes a straight line. In such a plot one need only judge whether the data fit a straight line 'reasonably well'. The result is called a normal probability plot or a quantile-quantile plot of the data.
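The numbers behind plots (2) and (3) can be computed without any plotting library. The following is a minimal sketch in Python (standard library only) rather than the R used elsewhere in this answer. The plotting positions $(i + 0.5)/n$ are an assumption; they are one common convention among several, and statistical software may use a slightly different one.

```python
from statistics import NormalDist, mean, stdev

scores = [87, 53, 35, 90, 78, 45, 65, 87, 76, 57,
          86, 99, 67, 98, 86, 79, 90, 88, 86, 95]

n = len(scores)
xs = sorted(scores)

# Best-fitting normal: mu = sample mean, sigma = sample SD S (divisor n - 1).
fitted = NormalDist(mu=mean(xs), sigma=stdev(xs))

# Method (2): height of the ECDF after the (i + 1)-th sorted observation,
# next to the fitted normal CDF at the same score. For a repeated score,
# the last of its rows carries the ECDF value at that score.
ecdf_vs_cdf = [(x, (i + 1) / n, fitted.cdf(x)) for i, x in enumerate(xs)]

# Method (3): pair each sorted score with a standard normal quantile.
# Plotting positions (i + 0.5) / n are an assumed convention.
std_normal = NormalDist()
qq_points = [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

for x, e, c in ecdf_vs_cdf:
    print(f"{x:3d}  ECDF={e:.2f}  fitted CDF={c:.2f}")
```

Plotting `qq_points` and looking for departures from a straight line gives the normal probability plot. Note that `stdev` (divisor $n-1$) gives about 18.01 for these scores; the value 17.55 quoted in the question corresponds to `statistics.pstdev` (divisor $n$).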

Each kind of plot is shown below. Perhaps owing mainly to the relatively large proportion of scores in the 80s, none of the three shows a good fit of your exam data to normal.

[Figure: histogram of the scores with the fitted normal density; ECDF with the fitted normal CDF; normal probability plot.]
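The concentration of scores in the 80s can also be seen from a quick text tally. This is a rough sketch in Python; the decade-wide bins are an assumption chosen for illustration and need not match the bins of any particular histogram.

```python
from collections import Counter

scores = [87, 53, 35, 90, 78, 45, 65, 87, 76, 57,
          86, 99, 67, 98, 86, 79, 90, 88, 86, 95]

# Bin each score by decade: 35 -> 30, 87 -> 80, and so on.
counts = Counter(10 * (s // 10) for s in scores)

# One row of stars per decade from the 30s to the 90s.
for lo in range(30, 100, 10):
    print(f"{lo}-{lo + 9}: {'*' * counts[lo]}")
```

Eleven of the 20 scores are 80 or above, with six of them in the 80s alone, which gives the tally a long left tail rather than a symmetric bell shape.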

Numerical tests of normality. Furthermore, there are many formal numerical tests of normality. The null hypothesis is that the data are consistent with a random sample from a normal population and the alternative hypothesis is that they are not. The test statistics attempt to quantify the 'goodness of fit' of the ECDF to the CDF or of the normal probability plot to a straight line.
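To make the idea of 'quantifying the fit' concrete, here is a hedged Python sketch (standard library only) of two simple summary statistics: the largest vertical gap between the ECDF and the fitted normal CDF, and the correlation between the sorted scores and standard normal quantiles. These are illustrations only, not the Shapiro-Wilk or Anderson-Darling statistics, and the sketch computes no P-values. Because $\mu$ and $\sigma$ are estimated from the data, the usual Kolmogorov-Smirnov tables do not apply to the gap (a Lilliefors-type correction would be needed). The plotting positions $(i + 0.5)/n$ and the helper `pearson` are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

scores = [87, 53, 35, 90, 78, 45, 65, 87, 76, 57,
          86, 99, 67, 98, 86, 79, 90, 88, 86, 95]

n = len(scores)
xs = sorted(scores)
fitted = NormalDist(mu=mean(xs), sigma=stdev(xs))

# Largest vertical gap between the ECDF and the fitted normal CDF.
# At each sorted point the ECDF steps from i/n to (i + 1)/n, so both
# sides of the step are checked.
d_stat = max(
    max((i + 1) / n - fitted.cdf(x), fitted.cdf(x) - i / n)
    for i, x in enumerate(xs)
)


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)


# Straightness of the normal probability plot: correlation between
# standard normal quantiles and the sorted scores.
quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
r = pearson(quantiles, xs)

print(f"max ECDF-CDF gap = {d_stat:.3f}")
print(f"probability-plot correlation = {r:.3f}")
```

A gap near 0 and a correlation near 1 would indicate a close fit; formal tests turn statistics of this kind into P-values using their null distributions.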

One formal test that is generally recognized as having good properties is the Shapiro-Wilk test. A printout of the result of this test for your test data, as implemented in R statistical software, is shown below. The low P-value indicates that the data do not appear to be consistent with a random sample from a normal population. (The Anderson-Darling test of normality from Minitab also has a P-value around 2%.)

 > x = c(87,53,35,90,78,45,65,87,76,57,86,99,67,98,86,79,90,88,86,95)
 > shapiro.test(x)

         Shapiro-Wilk normality test

 data:  x 
 W = 0.8894, p-value = 0.02626

Makers of standardized tests (SAT, ACT, GRE and so on) expend considerable effort to ask questions that yield normally distributed scores across the population of people who take such tests. It is not surprising that the scores from a single exam in one university or high school class are not consistent with a normal distribution.