Goodness-of-fit test for nominal data?


I have asked 40 people to choose their favourite colour from a possible 4 (Red, Blue, Green & White), and their favourite football team from a possible 4 (Liverpool, Chelsea, Plymouth Argyle and Leeds), to see if there is a correlation between the two.

How can I test for correlation given my data is nominal?

Best answer:

I think you are using 'correlation' in a colloquial rather than a technical sense. Here is a hypothesis test based on your Comment, with a bit of rationale for it.

If color choices were made at random, the probability of an 'agreement' would be 1/3. So the null distribution is $X \sim \mathsf{Binom}(40, 1/3),$ where $X$ counts agreements. If you have $X = 30$ agreements, that is far more than the 'expected' $E(X) = np = 40(1/3) = 13.33.$

The P-value of the one-sided test is $P(X \ge 30) \approx 0$ (about $8.5 \times 10^{-8}$). If color choices were made at random, without regard to team favoritism, it would be almost impossible to get 30 'agreements'. The probability computations below are from R statistical software:

```r
1 - pbinom(29, 40, 1/3)               # 'pbinom' is the binomial CDF
## 8.474878e-08
x = 30:40;  sum(dbinom(x, 40, 1/3))   # 'dbinom' is the binomial PMF
## 8.474878e-08
```

And from Minitab 17:

```
Test and CI for One Proportion

Test of p = 0.3333 vs p > 0.3333

                                              Exact
Sample   X   N  Sample p  95% Lower Bound  P-Value
1       30  40  0.750000         0.612940    0.000
```

The interpretation of the one-sided 95% CI is that you have 95% 'confidence' that the true proportion $p$ of 'agreements' exceeds $0.61.$
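If you would rather stay in R, base R's `binom.test` should reproduce the Minitab results above. This is a minimal sketch assuming only the base `stats` package: with `alternative = "greater"`, `binom.test` gives the exact one-sided P-value and a one-sided Clopper-Pearson lower bound, which I would expect to agree with Minitab's 'Exact' 0.612940, though I have not checked Minitab's exact method.

```r
# Exact one-sided binomial test of p = 1/3 vs p > 1/3 (30 agreements out of 40)
bt = binom.test(x = 30, n = 40, p = 1/3, alternative = "greater", conf.level = 0.95)
bt$p.value    # same value as 1 - pbinom(29, 40, 1/3)
bt$conf.int   # one-sided 95% interval: (lower bound, 1)
bt$estimate   # sample proportion 30/40 = 0.75
```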

Note: One could use a normal approximation to get the P-value. Because the P-value turns out to be so small, an approximation would lead to the same conclusion here. But in general, for $n$ as small as 40, I think exact computation is better.
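For comparison, here is a sketch of that normal approximation in R (no continuity correction, an assumption made just for illustration). It standardizes $X$ with mean $np$ and SD $\sqrt{np(1-p)}$. The squared $z$-statistic equals the chi-squared statistic $Q = 31.25$ of the two-cell goodness-of-fit test, and the one-sided normal P-value is half of that test's P-value, because the chi-squared test is in effect two-sided.

```r
# Normal approximation to X ~ Binom(40, 1/3), one-sided test of p = 1/3 vs p > 1/3
n = 40;  p0 = 1/3;  x = 30
mu = n*p0                      # E(X) = 13.33
sigma = sqrt(n*p0*(1 - p0))    # SD(X) = 2.98
z = (x - mu)/sigma;  z         # about 5.59
pnorm(z, lower.tail = FALSE)   # approximate one-sided P-value, about 1.13e-08
z^2                            # 31.25, the chi-squared statistic Q
```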

Addendum (response to Comment): Here are computations in R for your proposed chi-squared test. You have 2 cells ('agree' and 'disagree'), hence df = 2 - 1 = 1. The chi-squared goodness-of-fit statistic is $Q = \sum_{i=1}^2 (X_i - E_i)^2/E_i,$ where the $X_i$ are observed counts and the $E_i$ are expected counts. Approximately, $Q \sim \mathsf{Chisq}(1),$ provided all $E_i > 5$ (here $E_1 = 13.33$ and $E_2 = 26.67$). The computed value is $Q = 31.25;$ the 5% critical value is $q^* = 3.8415$. Reject 'fit' because $Q > q^*;$ the P-value of this approximate test is essentially $0.$
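As a hand check of $Q = 31.25$, with expected counts $E_1 = 40(1/3) = 40/3$ and $E_2 = 40(2/3) = 80/3$; the last line shows that $Q$ equals the square of the normal-approximation statistic $z = (X - np)/\sqrt{np(1-p)}$:

$$
\begin{aligned}
Q &= \frac{(30 - 40/3)^2}{40/3} + \frac{(10 - 80/3)^2}{80/3}
   = \left(\frac{50}{3}\right)^2 \left(\frac{3}{40} + \frac{3}{80}\right)
   = \frac{2500}{9} \cdot \frac{9}{80} = 31.25, \\
z^2 &= \frac{(30 - 40/3)^2}{40 \cdot \frac{1}{3} \cdot \frac{2}{3}}
     = \frac{2500/9}{80/9} = 31.25 = Q.
\end{aligned}
$$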

```r
Obs = c(30, 10); prob = c(1/3, 2/3)   # observed counts; null cell probabilities
n = 40;  Exp = n*prob                  # expected counts 13.33, 26.67
Q = sum((Obs-Exp)^2/Exp);  Q
## 31.25
qchisq(.95, 1)     # critical value
## 3.841459
1 - pchisq(Q, 1)   # approximate P-value
## 2.268475e-08
```
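The same test is also available in one call through base R's `chisq.test`, which runs a goodness-of-fit test when given a vector of counts and the null probabilities `p`. This is a sketch assuming only the base `stats` package; no continuity correction is applied in this goodness-of-fit case, so it should report the same $Q$, df, and P-value as the manual computation.

```r
# Chi-squared goodness-of-fit test for the 2 cells: agreements vs non-agreements
gof = chisq.test(x = c(30, 10), p = c(1/3, 2/3))
gof              # reports X-squared, df and the P-value
gof$expected     # expected counts 13.33 and 26.67, both above 5
```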