If 140 girls (say 40 of them blond) applied for 10 secretary jobs, and 7 of the 10 hired were blond, can we claim that the recruiter was biased towards blonds? Could we use a chi-square test of independence to test this? I ran the following in R:
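# 2x2 contingency table: rows = hair color, columns = hiring outcome
tblBlond = data.frame(row.names=c('Blond','notBlond'), Job=c(7,3), noJob=c(33,97))
# setDT() is not needed here: it requires library(data.table) and drops the row names
chisq.test(tblBlond)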
and I got as output:
Pearson's Chi-squared test with Yates' continuity correction
data:  tblBlond
X-squared = 7.0027, df = 1, p-value = 0.008139
Warning message:
In chisq.test(tblBlond) : Chi-squared approximation may be incorrect
What would be the most appropriate interpretation of test results?
Are the results of the test reliable, given that we do not have more than 5 observations in every cell? (The chi-square test rests on a normal approximation to the binomial.)
Is this test in fact testing whether the two groups (blond and not blond) have the same binomial success probability p of getting the job?
What else could be a problem when using the chi-square test of independence?
I am generally against saying that we could "prove" that the recruiter was biased, as "prove" seems too strong a word. However, a $\chi^2$ test of independence can indeed provide statistically significant evidence of it.
Your null hypothesis would be the following: Hair color and job acceptance are independent. As for how to perform the $\chi^2$ test, here's a very good resource.
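To make the mechanics concrete, here is a sketch of the calculation for the table in the question. The symbols $O_{ij}$ (observed count), $E_{ij}$ (expected count under independence), $R_i$ and $C_j$ (row and column totals), and $N$ (grand total) are my own notation, not taken from the linked resource.

$$E_{ij} = \frac{R_i\,C_j}{N}, \qquad \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad \text{df} = (2-1)(2-1) = 1$$

With $R = (40, 100)$, $C = (10, 130)$ and $N = 140$, the expected counts are

$$E_{\text{Blond, Job}} = \frac{40 \cdot 10}{140} \approx 2.86, \quad E_{\text{Blond, noJob}} \approx 37.14, \quad E_{\text{notBlond, Job}} \approx 7.14, \quad E_{\text{notBlond, noJob}} \approx 92.86$$

Every cell deviates from its expected count by the same amount, $|O_{ij} - E_{ij}| \approx 4.14$. Yates' continuity correction subtracts $0.5$ from that deviation before squaring, which gives

$$\chi^2_{\text{Yates}} \approx 3.643^2 \left(\frac{1}{2.857} + \frac{1}{37.143} + \frac{1}{7.143} + \frac{1}{92.857}\right) \approx 7.00,$$

in agreement with the `X-squared = 7.0027` reported in the question. Note that one expected count (about $2.86$) is below 5, which is what triggers the warning about the approximation.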
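Once you perform the test, you obtain a $p$-value, which is the probability, assuming the null hypothesis is true, of observing an outcome at least as extreme as the one you got. If the $p$-value is sufficiently low (for example, below a significance level of 0.05 chosen in advance), you reject the null hypothesis of independence. That is evidence of an association between hair color and hiring, which is consistent with a biased recruiter, though the test alone cannot tell you why the association exists.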
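As a rough illustration of this workflow, the sketch below rebuilds the question's table as a matrix, inspects the expected counts that the rule of thumb about "5 per cell" actually refers to, and runs the test. Because one expected count is small, it also runs Fisher's exact test, which is a different technique that I am swapping in as a cross-check and not something the chi-square test itself requires. The variable names (`tbl`, `res`, `exact`) are my own.

```r
# 2x2 table from the question; matrix() fills column by column,
# so the first column is Job = (7, 3) and the second is noJob = (33, 97)
tbl <- matrix(c(7, 3, 33, 97), nrow = 2,
              dimnames = list(Hair = c("Blond", "notBlond"),
                              Outcome = c("Job", "noJob")))

# Chi-square test of independence (Yates' correction is the default for 2x2).
# suppressWarnings() hides the "approximation may be incorrect" warning here;
# we look at the expected counts ourselves right below.
res <- suppressWarnings(chisq.test(tbl))

# Expected counts under independence: (row total * column total) / N
expected <- res$expected
print(round(expected, 2))

# Test statistic, degrees of freedom and p-value
print(res)

# Fisher's exact test does not rely on the large-sample approximation
exact <- fisher.test(tbl)
print(exact$p.value)
```

If the exact test and the chi-square test point the same way, the small expected count is less of a worry for the conclusion, although neither test says anything about the cause of the association.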