Comparing two box plots


Say these are two boxplots of ratings (1 to 10) for a movie, say Star Wars: The Phantom Menace. The first boxplot shows the ratings given by various experts. The second boxplot shows the ratings given by the audience for the same movie.

First and Second Boxplot

I see that the first range is 7 - 1 = 6 and the second range is 10 - 0 = 10.

The first maximum is 7 and the second maximum is 10.

Comparing the two box plots above, is it fair to say that there's more variation in the second boxplot's data?

Also, is it fair to say that on average the rating given by experts is higher than that given by the audience?


There are 2 best solutions below


I think you can say that the variation is higher in the second boxplot's data. However, I don't think you can say the same about the average rating, since the second boxplot's data has so much more variation and the first boxplot's entire range falls within the second boxplot's range.

Is this data that you have on hand? If so, you may want to adjust the whiskers of the box plots so that they extend no further than the lower and upper fences (Q1 - 1.5*IQR and Q3 + 1.5*IQR), and mark any data points beyond the fences as outliers. Then we will have a better grasp of the distribution of the data. Otherwise, it is hard to say.
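As a rough sketch of that adjustment, the fences and outliers could be computed as below. The ratings list is made up purely for illustration (it is not the data behind the plots), and the "inclusive" quartile method is an assumption; other quartile conventions give slightly different fences.

```python
import statistics

def tukey_fences(values):
    """Return (lower, upper) fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # Quartile convention is an assumption; plotting tools differ slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def split_outliers(values):
    """Split values into those inside the fences and those outside."""
    lower, upper = tukey_fences(values)
    inside = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return inside, outliers

# Hypothetical audience ratings, invented for this example only.
ratings = [0, 5, 5, 5, 6, 6, 6, 7, 7, 10]

lower, upper = tukey_fences(ratings)
inside, outliers = split_outliers(ratings)

print("fences:", lower, upper)
print("whiskers end at:", min(inside), max(inside))
print("outliers:", outliers)
```

With whiskers drawn this way, a single extreme rating such as 0 or 10 shows up as an isolated point instead of stretching the whisker across the whole scale.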


Your charts suggest that the variation may be greater in the second. But this need not be the case. Suppose the marks were distributed as follows, broadly consistent with the minimum, 1st quartile, median, 3rd quartile and maximum values in the box plots:

First:            Second:
Mark  Number      Mark  Number
   1      12         0       1
   4       1         5    2499
   5      13         6    1249
   6       1         7    1250
   7      25        10       1

Then the first set of marks has a (population) variance of about $5.65$ while the second has a variance of about $0.698$, i.e. the variation is much smaller in the second, contrary to the impression given by the boxplots. Other distributions could produce different variances.
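These figures can be checked with a short sketch that treats each table as a mapping from mark to number of raters and computes the population variance (dividing by the count, not the count minus one). The function and variable names are only illustrative.

```python
def mean_and_variance(freq):
    """Mean and population variance of a {mark: count} frequency table."""
    n = sum(freq.values())
    mean = sum(mark * count for mark, count in freq.items()) / n
    var = sum(count * (mark - mean) ** 2 for mark, count in freq.items()) / n
    return mean, var

# Frequency tables copied from the example distributions.
first = {1: 12, 4: 1, 5: 13, 6: 1, 7: 25}
second = {0: 1, 5: 2499, 6: 1249, 7: 1250, 10: 1}

mean_first, var_first = mean_and_variance(first)
mean_second, var_second = mean_and_variance(second)

print(f"first:  mean {mean_first:.3f}, variance {var_first:.3f}")
print(f"second: mean {mean_second:.3f}, variance {var_second:.3f}")
```

The first table is dominated by marks at the two ends of its range (1 and 7), while almost all of the second table sits on 5, 6 and 7, which is why the wider-looking plot ends up with the smaller variance.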

That particular example has the mean of the second distribution (about $5.75$) higher than the mean of the first (about $5.04$), but again it would be easy enough to reverse this by changing a few of the numbers: e.g. in the second distribution, if mark $0$ was allocated by $1249$ and mark $5$ by $1251$, its mean would drop to about $4.50$. (Changing the first distribution so that mark $1$ was allocated by $1$ and mark $4$ by $12$ raises its mean to about $5.67$, which narrows the gap but does not quite reverse it on its own.) The boxplots do not suggest to me which should have the higher mean.
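A quick check of the means for the example tables and the two modified allocations, again treating each table as a mark-to-count mapping (names are illustrative). By this arithmetic the change to the second distribution reverses the ordering of the means, while the change to the first only narrows the gap.

```python
def mean(freq):
    """Mean of a {mark: count} frequency table."""
    n = sum(freq.values())
    return sum(mark * count for mark, count in freq.items()) / n

# Example distributions.
first = {1: 12, 4: 1, 5: 13, 6: 1, 7: 25}
second = {0: 1, 5: 2499, 6: 1249, 7: 1250, 10: 1}

# Modified allocations mentioned in the text.
first_alt = {1: 1, 4: 12, 5: 13, 6: 1, 7: 25}
second_alt = {0: 1249, 5: 1251, 6: 1249, 7: 1250, 10: 1}

for name, table in [("first", first), ("second", second),
                    ("first_alt", first_alt), ("second_alt", second_alt)]:
    print(f"{name}: mean {mean(table):.3f}")
```

Each modified table keeps the same minimum, maximum and total count as the table it replaces, so the changes are concentrated in the lowest quarter of the marks.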