Does a Bar Chart Show Statistical Significance?

Suppose you are given only a picture of a bar chart, with no other data such as means or standard deviations. If one bar is higher than the rest, can you show that the difference is statistically significant?

What kind of test would you perform?

Best answer:

Bar charts that give only proportions or percentages for various categories are of almost no value in deciding whether categories are equally likely. One must know the counts in each category (or be able to reconstruct counts from the total sample size) in order to test whether categories are equally likely.
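When a chart reports percentages and the total sample size is known, the counts can usually be recovered by simple arithmetic. Here is a minimal sketch in R; the percentages and `n` are illustrative values assumed for this example, not read from any particular chart.

```r
# Illustrative values (assumed): percentages read off a chart, and a known sample size
pct <- c(12, 14, 16, 12, 26, 20)
n   <- 50

# Reconstruct counts; round() absorbs rounding error in the reported percentages
counts <- round(pct / 100 * n)
counts

# Sanity check: the reconstructed counts should add up to n
stopifnot(sum(counts) == n)
```

If the reported percentages were rounded coarsely, the reconstructed counts may not add up to `n`, in which case the reconstruction should not be trusted.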

'Error bars' on the bars might sometimes be useful, but there seems to be no standard way to make them, much less consistently useful guidance on how to interpret them. It is best for a bar plot to show counts along with the total sample size, and then to rely on a formal statistical test to assess equality of categories.
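To illustrate why error bars depend on the sample size, the sketch below uses one common convention, an approximate 95% interval of half-width $1.96\sqrt{\hat p(1-\hat p)/n}$ around each proportion. This convention, the helper name `wald_ci`, and the proportion 0.26 are assumptions made for the example. It is an informal illustration, not a substitute for a formal test.

```r
# Approximate (Wald) 95% interval for a single proportion; one convention among several
wald_ci <- function(p_hat, n, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)
  se <- sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

ci_50  <- wald_ci(0.26, 50)    # bar height 0.26, small sample
ci_600 <- wald_ci(0.26, 600)   # same bar height, large sample

ci_50
ci_600

# Does each interval contain 1/6, the proportion expected for a fair die?
contains_sixth <- c(
  n50  = ci_50[["lower"]]  < 1/6 && ci_50[["upper"]]  > 1/6,
  n600 = ci_600[["lower"]] < 1/6 && ci_600[["upper"]] > 1/6
)
contains_sixth
```

The same bar height gives a wide interval at n = 50 and a narrow one at n = 600, so a bar chart without the sample size cannot say which situation applies.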

Examples: Suppose I am rolling a die to see if it is fair. After some exploratory rolls I get percentages for each face as follows, and make a bar chart of the results.

Faces:  1  2  3  4  5  6 
Pct:   12 14 16 12 26 20 

The bar for 5's stands out as being higher than the rest.

[Figure: bar chart of the results for faces 1 through 6]

However, this is not necessarily 'proof' of an unfair die. Suppose the percentages are based on the counts below, from only $n=50$ rolls of the die.

 Faces:   1  2  3  4  5  6 
 Counts:  6  7  8  6 13 10 

A test for equal proportions in R gives a P-value of $0.37 > 0.05 = 5\%,$ which is far from statistically significant. It would be wrong to declare the die unfair.

prop.test(c(6,7,8,6,13,10), rep(50, 6))

    6-sample test for equality of proportions 
    without continuity correction

data:  c(6, 7, 8, 6, 13, 10) out of rep(50, 6)
X-squared = 5.376, df = 5, p-value = 0.3717
alternative hypothesis: two.sided
sample estimates:
prop 1 prop 2 prop 3 prop 4 prop 5 prop 6 
  0.12   0.14   0.16   0.12   0.26   0.20 
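Because the six counts come from a single set of 50 rolls, a chi-squared goodness-of-fit test against equal probabilities is another customary choice. The following is a sketch of that alternative, not the test used above. Given a single vector of counts, `chisq.test` tests equal probabilities by default. Its statistic is somewhat smaller than the one from `prop.test`, but the conclusion is the same.

```r
counts_50 <- c(6, 7, 8, 6, 13, 10)

# Goodness-of-fit test of H0: all six faces have probability 1/6
gof_50 <- chisq.test(counts_50)

gof_50$expected    # 50/6 for each face
gof_50$statistic   # chi-squared statistic on 5 degrees of freedom
gof_50$p.value     # above 0.05, so no evidence against a fair die
```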

By contrast, suppose I roll the die $n=600$ times and get the following counts.

Faces:    1   2   3   4   5   6
Counts:  66  84  91 112 103 144

Percentages rounded to one decimal place are as shown below:

Faces:     1    2    3    4    5    6
Pct:    11.0 14.0 15.2 18.7 17.2 24.0
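
The percentages follow directly from the counts. A short check in R, with the counts typed in by hand (the variable names are chosen for this example):

```r
counts_600 <- c(66, 84, 91, 112, 103, 144)

# Percentage for each face, rounded to one decimal place
pct_600 <- round(100 * counts_600 / sum(counts_600), 1)
names(pct_600) <- 1:6
pct_600
```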

A bar chart of frequencies may look somewhat similar to the one above in its 'irregularities'.

[Figure: bar chart of the frequencies for the 600 rolls]

However, with 600 rolls we have a lot more information. For these counts, prop.test gives a P-value near $0,$ indicating that it would be almost impossible for a truly fair die to give such counts.

prop.test(tabulate(y),rep(600,6))

    6-sample test for equality of proportions 
    without continuity correction

data:  tabulate(y) out of rep(600, 6)
X-squared = 42.984, df = 5, p-value = 3.723e-08
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3    prop 4    prop 5    prop 6 
0.1100000 0.1400000 0.1516667 0.1866667 0.1716667 0.2400000 
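
The vector `y` is only defined in the note at the end, so the call above cannot be run on its own. Assuming `tabulate(y)` returns the counts listed in the table above, the same test can be run by entering the counts directly. A goodness-of-fit test via `chisq.test` is included as an alternative sketch; it is not the method used in this answer.

```r
counts_600 <- c(66, 84, 91, 112, 103, 144)

# Same call as above, with the counts typed in instead of tabulate(y)
pt_600 <- prop.test(counts_600, rep(600, 6))
pt_600$statistic   # matches the X-squared value printed above
pt_600$p.value

# Alternative sketch: goodness-of-fit test against equal probabilities
gof_600 <- chisq.test(counts_600)
gof_600$statistic
gof_600$p.value
```

Both P-values are far below 0.05, so either test leads to the same conclusion for these counts.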

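Note: Here is the R code used to simulate the 50 rolls of a fair die and the 600 rolls of an unfair die in the examples above:

set.seed(507)
x = sample(1:6, 50, replace = TRUE)   # 50 rolls of a fair die
tabulate(x, nbins = 6)                # counts for faces 1 to 6

set.seed(2021)
# 600 rolls of an unfair die; sample() rescales the prob weights to sum to 1
y = sample(1:6, 600, replace = TRUE, prob = c(2,3,3,3,3,4))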