I am trying to determine whether categories (high, medium and low) in two different groups overlap more or less often than expected by chance. I show one example below, along with my attempt at an estimate, and I hope to find the right way to apply this to multiple examples.
Group A: n = 235,998, of which 98,530 (41.8%) are in the 'high' category.
Group B: n = 141,969, of which 71,305 (50.2%) are in the 'high' category.
There are 120,639 samples that belong to both Group A and Group B.
My calculation of the predicted overlap is simply 41.8% × 50.2% × 120,639 = 25,297 samples (21.0% of the shared samples; computed from the unrounded proportions).
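To make the arithmetic explicit, here is a minimal Python sketch of the calculation I am doing. The variable names are my own, and the multiplication step assumes that being 'high' in Group A is independent of being 'high' in Group B, which is exactly the assumption I am unsure about.

```python
# Counts copied from the example; variable names are illustrative only.
n_a, high_a = 235_998, 98_530   # Group A: total samples, 'high' samples
n_b, high_b = 141_969, 71_305   # Group B: total samples, 'high' samples
n_shared = 120_639              # samples that belong to both groups

p_a = high_a / n_a              # proportion 'high' in Group A
p_b = high_b / n_b              # proportion 'high' in Group B

# Assumption: 'high' in A and 'high' in B are independent,
# so the joint proportion is the product of the two proportions.
p_both = p_a * p_b
expected_overlap = p_both * n_shared

print(f"p_a = {p_a:.1%}, p_b = {p_b:.1%}, p_both = {p_both:.1%}")
print(f"expected overlap = {expected_overlap:,.0f}")
```

Note that the proportions here come from the full groups, not from the 120,639 shared samples alone, which may or may not be appropriate.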
If the actual number of samples that are 'high' in both groups is 5,000 (4.1%), then my conclusion is that the actual overlap is about one-fifth of what chance predicts. If the overlap were 50,000 (41%), then my conclusion is that the actual overlap is about twice what chance predicts.
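The comparison I am making is just the ratio of the observed count to the expected count. As a rough sketch (the function name `observed_to_expected` is made up for illustration, and the two observed counts are the hypothetical values from my example, not real data):

```python
def observed_to_expected(observed: float, expected: float) -> float:
    """Ratio of observed to expected overlap; 1.0 means 'exactly as expected'."""
    return observed / expected

expected = 25_297  # expected 'high'-in-both count from my calculation

# Hypothetical observed overlaps from the example.
ratios = {obs: observed_to_expected(obs, expected) for obs in (5_000, 50_000)}

for obs, ratio in ratios.items():
    print(f"observed = {obs:,}: {ratio:.2f} times the expected count")
```

This only gives the size of the difference; it says nothing about whether the difference is larger than sampling variation alone would produce.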
Am I doing this wrong, and is there a better way to do it?
Thank you so much!!