I have 2 high schools, School A and School B. For the first school, I have 5 classes of students; for the second, I have 3 classes (so 8 classes in total). Within each class, I have different categorical information about each student, for example whether they're male, whether they study French, etc.
The number of students in each class is different.
So the data might look like this (for example):
SCHOOL A
- Class 1: 50 students, 20 males, and 5 students who study French
- Class 2: 300 students, 50 males, and 8 students who study French
- ...
- Class 5: 25 students, 17 males, and 3 students who study French
SCHOOL B
- Class 1: 140 students, 80 males, 10 students who study French
- Class 2: 2500 students, 600 males, 110 students who study French
- Class 3: 200 students, 110 males, 9 students who study French
What test do I use to assess whether there is a significant difference in the number of males or students who study French between School A and School B?
I'm confused because the different sample sizes presumably mean we should be looking at proportions, but if I look ONLY at proportions, am I still factoring in the magnitudes of the original values? (E.g. far more students are male than study French, so 6/100 students studying French vs. 3/100 will look small in terms of proportion changes.) Would it be a t-test on the proportions?
Since every variable is categorical (school, gender, topic studied), you can run a chi-squared test for independence. You want to compare proportions while factoring in the sample sizes, and this is exactly what chi-squared tests do.
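Simply compute the expected counts under mutual independence of all the variables, and compute the statistic $$\chi^2=\sum \frac{(O-E)^2}{E}$$ where $O$ stands for the Observed count and $E$ for the Expected count in each cell, with the sum running over all cells of the table. For a two-way table such as school by gender, the degrees of freedom are "number of schools $-1$" times "number of genders $-1$". For a test of mutual independence across all three variables, with $I$ schools, $J$ topics and $K$ genders, the degrees of freedom are $IJK - I - J - K + 2$; the product "number of schools $-1$" times "number of topics $-1$" times "number of genders $-1$" is the degrees of freedom of the three-way interaction term only. Given the way the question is asked, the classes are irrelevant.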
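As a rough illustration, here is a minimal sketch of the calculation for a single attribute (school by gender), using only the Python standard library. The School B row is the sum of the three classes listed in the question (790 males out of 2840 students). The School A row is made up for the example, because classes 3 and 4 are not given. The function names `chi2_independence` and `p_value_dof1` are hypothetical helpers, and no continuity correction is applied.

```python
import math

def chi2_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a
    two-way table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

def p_value_dof1(chi2):
    """Upper-tail p-value of a chi-squared variable with 1 degree of
    freedom (valid only for dof == 1, i.e. a 2x2 table)."""
    return math.erfc(math.sqrt(chi2 / 2))

# Rows: schools; columns: [males, non-males].
table = [
    [150, 350],    # School A: ASSUMED totals, not from the question
    [790, 2050],   # School B: 80+600+110 males out of 140+2500+200 students
]

chi2, dof = chi2_independence(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value_dof1(chi2):.4f}")
```

The same table can be built for "studies French" vs. "does not study French" and tested the same way. Because the expected counts are computed from the raw totals, a school with more students automatically carries more weight, which is how the test accounts for sample size rather than looking at proportions alone.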