Most accurate mean for grouped/ranged data


I've got data in a range-and-frequency format, e.g. 0-5 : 3, 6-10 : 70, etc., up to 31-35. How do I get the most accurate mean for such data?




There is no way to recover the exact mean, since some information has been lost in the grouping.

E.g. your grouped data could represent 3 data items with value 0, and 70 items with value 6, and no other items. Then the mean would be $$\frac{3 \times 0 + 70 \times 6}{3 + 70} \approx 5.75.$$

But your grouped data could just as well represent 3 data items with value 5, and 70 items with value 10, and no other items. Then the mean would be $$\frac{3 \times 5 + 70 \times 10}{3 + 70} \approx 9.79.$$
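These two extremes bracket every mean that is consistent with the grouped data. As a rough sketch, assuming each group is stored as a hypothetical `(lower, upper, frequency)` tuple and using only the two groups quoted in the question, the bounds can be computed like this:

```python
# Hypothetical representation: (lower bound, upper bound, frequency) per group.
# Only the two groups quoted in the question are included here.
groups = [(0, 5, 3), (6, 10, 70)]

def mean_bounds(groups):
    """Return the smallest and largest mean consistent with the grouped data."""
    total = sum(freq for _, _, freq in groups)
    # Every item sits at the bottom of its group: smallest possible mean.
    lowest = sum(lo * freq for lo, _, freq in groups) / total
    # Every item sits at the top of its group: largest possible mean.
    highest = sum(hi * freq for _, hi, freq in groups) / total
    return lowest, highest

low, high = mean_bounds(groups)
print(f"{low:.2f} <= mean <= {high:.2f}")  # 5.75 <= mean <= 9.79
```

Whatever estimate you settle on, the true mean must lie somewhere in this interval.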

So you have to take some view on how the groupings were made. It looks like your data can take the whole-number values 0, 1, 2, ... If you assume you know nothing about how the groups were made, so that an item in the 0-5 group is equally likely to be any of the six values 0, 1, 2, 3, 4, 5, then its (average) expected value is $$\frac{1}{6} \times \left( 0 + 1 + 2 + 3 + 4 + 5 \right) = 2.5.$$

Similarly the expected value of the (5 possible values in the) 6-10 group is $$\frac{1}{5} \times \left( 6 + 7 + 8 + 9 + 10 \right) = 8.$$

So with this (uniform) assumption about how the groups were made, and assuming there are no values above 10, the mean would be $$\frac{3 \times 2.5 + 70 \times 8}{3 + 70} \approx 7.77.$$
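Under the uniform assumption, the expected value of a group of consecutive whole numbers is simply the midpoint of its endpoints, so the calculation above generalises to any number of groups. A minimal sketch, again assuming a hypothetical list of `(lower, upper, frequency)` tuples and only the two groups given in the question:

```python
# Hypothetical representation: (lower bound, upper bound, frequency) per group.
# Append the remaining groups (up to 31-35) with their real frequencies.
groups = [(0, 5, 3), (6, 10, 70)]

def grouped_mean(groups):
    """Estimate the mean, assuming values are uniform within each group."""
    total = sum(freq for _, _, freq in groups)
    # For consecutive whole numbers lo..hi, the average is (lo + hi) / 2.
    weighted = sum((lo + hi) / 2 * freq for lo, hi, freq in groups)
    return weighted / total

estimate = grouped_mean(groups)
print(f"{estimate:.2f}")  # 7.77
```

This is an estimate under the stated assumption, not the true mean; a different view of how items are spread within each group would give a different figure.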