Class width vs Class size vs Class interval


Please enlighten me. We are now on statistics (grouped data), and I'm confused about class width, class size, and class interval. Can you explain the difference between the three in simple words? By the way, which of the three does $i$ stand for? Thanks and good day.


There are 2 best solutions below


Here are 50 test scores, sorted from smallest to largest. The numbers in brackets show the index of the first observation in each row (ten observations per row).

 [1]  69  71  76  79  79  80  81  82  82  83
[11]  85  86  88  88  89  89  90  90  92  92
[21]  93  93  98  99  99 100 100 100 101 102
[31] 103 104 105 105 105 105 106 106 107 107
[41] 107 108 109 115 116 118 119 119 123 124

Here is a histogram of these data, with a label atop each of the seven bars showing the size (number of observations) of each class interval. The modal interval (the one with the largest count) is $(100, 110]$; its size, or frequency, is $15$. In this histogram the vertical scale shows the sizes of the intervals (the heights of the bars), so it is called a "frequency histogram." (The frequencies must add up to the sample size, here $n = 50$.)

In this histogram, small tick marks at the bottom show the exact positions of these 15 scores: $101, 102, \dots, 108, 109.$ (There are fewer than 15 distinct tick marks because some marks represent several tied observations.)
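As a rough check on the counts, here is a minimal Python sketch that tallies the size (frequency) of each class interval directly from the 50 scores. The language and the names `intervals`, `frequencies`, and `modal_interval` are my own choices for illustration; this is not the software that drew the histogram.

```python
# The 50 test scores listed above, already sorted.
scores = [
     69,  71,  76,  79,  79,  80,  81,  82,  82,  83,
     85,  86,  88,  88,  89,  89,  90,  90,  92,  92,
     93,  93,  98,  99,  99, 100, 100, 100, 101, 102,
    103, 104, 105, 105, 105, 105, 106, 106, 107, 107,
    107, 108, 109, 115, 116, 118, 119, 119, 123, 124,
]

# Seven class intervals (lo, hi], each of width 10: (60, 70], ..., (120, 130].
intervals = [(lo, lo + 10) for lo in range(60, 130, 10)]

# Size (frequency) of an interval = number of scores with lo < score <= hi.
frequencies = {
    (lo, hi): sum(1 for s in scores if lo < s <= hi)
    for lo, hi in intervals
}

for (lo, hi), count in frequencies.items():
    print(f"({lo}, {hi}]  {count}")

# The modal interval is the one with the largest frequency.
modal_interval = max(frequencies, key=frequencies.get)
print("modal interval:", modal_interval, "frequency:", frequencies[modal_interval])

# The frequencies must add up to the sample size.
print("total:", sum(frequencies.values()))
```

The seven counts should match the labels atop the bars, and the total should equal $n = 50$.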

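[Figure: frequency histogram of the 50 test scores, with seven class intervals of width 10 and the frequency labeled atop each bar]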

The width of each interval is 10 points ('61 up through 70', '71 up through 80', and so on). The usual practice is to use intervals of equal width unless there is a very good reason not to. Notice that we have to decide whether the 'round' numbers 70, 80, 90, and so on are the smallest or the largest values in an interval.

  • The software that made this histogram puts the round numbers at the high end of each interval. It would mess up the counting if we didn't make a clear distinction between an endpoint that is included (denoted here by a bracket, "$]$") and one that is excluded (denoted here by a parenthesis, "$($").

  • Another method would be to label the endpoints '60.5 to 70.5', '70.5 to 80.5', and so on. Because the data are integers, no score could ever fall on an interval endpoint. But this gives slightly messier labels along the score (bottom) axis.
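To see why the endpoint convention matters, here is a small Python sketch (an illustration only; the helper `count` and the list names are hypothetical) that counts the same 50 scores three ways: with the round numbers at the high end, with half-integer endpoints, and with both endpoints carelessly included.

```python
# The 50 test scores listed above.
scores = [
     69,  71,  76,  79,  79,  80,  81,  82,  82,  83,
     85,  86,  88,  88,  89,  89,  90,  90,  92,  92,
     93,  93,  98,  99,  99, 100, 100, 100, 101, 102,
    103, 104, 105, 105, 105, 105, 106, 106, 107, 107,
    107, 108, 109, 115, 116, 118, 119, 119, 123, 124,
]

def count(lo, hi, include_lo, include_hi):
    """Count scores between lo and hi, with each endpoint included or excluded."""
    return sum(
        1 for s in scores
        if (lo <= s if include_lo else lo < s)
        and (s <= hi if include_hi else s < hi)
    )

lows = range(60, 130, 10)  # left ends 60, 70, ..., 120

# Convention 1: round numbers at the high end, (lo, hi].
right_closed = [count(lo, lo + 10, False, True) for lo in lows]

# Convention 2: half-integer endpoints, 60.5 to 70.5, and so on.
# No integer score can equal an endpoint, so inclusion does not matter.
half_shifted = [count(lo + 0.5, lo + 10.5, False, False) for lo in lows]

# Ambiguous convention: both endpoints included, [lo, hi].
# Scores equal to 80, 90, or 100 are then counted in two intervals.
both_closed = [count(lo, lo + 10, True, True) for lo in lows]

print("(lo, hi]     :", right_closed, "total", sum(right_closed))
print("x.5 endpoints:", half_shifted, "total", sum(half_shifted))
print("[lo, hi]     :", both_closed, "total", sum(both_closed))
```

The first two conventions give identical counts that total 50, while the ambiguous one double-counts the six scores equal to 80, 90, or 100 and totals 56.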

I don't have your book at hand, so I can't be sure what $i$ stands for. My guess is that this may be a way of numbering the class intervals: $i = 1$ for $(60,70]$; $i = 2$ for $(70, 80]$, and so on to $i = 7$ for $(120,130].$ But sometimes $i$ is used to number the observations. In that case you would have $i = 1, 2, \dots, 50$ for the observations.
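If it helps, the two possible meanings of $i$ can be put side by side in a short Python sketch. This only illustrates my guess above; the function `class_index` and the constants `LOWER` and `WIDTH` are hypothetical names, and the sketch assumes the intervals $(60,70], (70,80], \dots$ used in the histogram.

```python
LOWER, WIDTH = 60, 10  # left end of the first interval and the class width

def class_index(score):
    """Return i such that score lies in (LOWER + (i-1)*WIDTH, LOWER + i*WIDTH]."""
    # Integer ceiling of (score - LOWER) / WIDTH, so 70 maps to 1 and 71 to 2.
    return -((LOWER - score) // WIDTH)

# First row of the sorted scores above.
first_row = [69, 71, 76, 79, 79, 80, 81, 82, 82, 83]

# i as an observation number (1, 2, ...) versus i as a class-interval number.
for obs_index, score in enumerate(first_row, start=1):
    print(f"observation i = {obs_index}: score {score} is in class i = {class_index(score)}")
```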


The class width is the difference between the upper and lower class boundaries of a class (the boundaries may or may not be the same as the class limits). For example,
$$\begin{array}{cc} \text{Class} & \text{Frequency} \\ \hline 10-19 & 3 \\ 20-29 & 7 \\ 30-39 & 2 \end{array}$$
The limits of the second class are $20$ and $29$, while its boundaries are $19.5$ and $29.5$. Hence the width of the second class is $29.5-19.5=10$.
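The same arithmetic can be sketched in Python for all three classes of the example. I am assuming the common convention that each boundary lies halfway between one class's upper limit and the next class's lower limit; the variable names are made up for illustration.

```python
# Class limits from the example above: (lower limit, upper limit).
limits = [(10, 19), (20, 29), (30, 39)]

# Gap between one class's upper limit and the next class's lower limit (here 1).
gap = limits[1][0] - limits[0][1]

# Assumed convention: extend each limit by half the gap so adjacent classes meet.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]

# Class width = upper boundary - lower boundary.
widths = [upper - lower for lower, upper in boundaries]

for (lo, hi), (lower, upper), w in zip(limits, boundaries, widths):
    print(f"limits {lo}-{hi}: boundaries {lower} to {upper}, width {w}")
```

Note that subtracting the limits instead ($29 - 20 = 9$) would understate the width by the size of the gap.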