So I was thinking about how to model exams. Assume the teacher gives out $N$ problems to study. Due to the nature of exams, not all $N$ problems can appear on the actual exam, so let's say the teacher chooses $m$ of them ($m \leq N$). Let's also assume that the student can't study all $N$ problems and only gets through $r$ of them. What are the possible overlaps between the studied problems and the exam problems, and how are they distributed? Do you know of any similar problem that I can read up on? (I couldn't come up with a good phrasing to google this.)
I went through a simpler version of this with $N=3, m=2, r=2$.
Suppose the lecturer chooses problems $1$ and $2$ for the exam. A student might study
- $1$ and $2$ ($100\%$ overlap/score),
- $2$ and $3$ ($50\%$ overlap/score),
- $3$ and $1$ ($50\%$ overlap/score).
In most cases the student scores $50\%$, even though they studied $2/3 \approx 67\%$ of the material. I am trying to write out the combinatorial equations to understand this, but I wonder whether this problem has already been worked out somewhere.
The number of overlaps follows the hypergeometric distribution, with probability mass function $$ P(\text{number of overlaps} = k)=\frac{\binom{r}{k}\binom{N-r}{m-k}}{\binom{N}{m}}, \qquad \max(0,\, m+r-N) \le k \le \min(m, r). $$ You mention fairness, and in your small example, you mention that even though the student studied $66.\overline{6}\%$ of the problems, most of the time they will know only $50\%$ of the questions on the test. Still, the average percentage of questions they will know is $$ \frac13\cdot 100\% + \frac13 \cdot 50\%+\frac13 \cdot 50\%=66.\overline{6}\%, $$ so the average proportion of questions on the test they will have studied equals the proportion of all the questions that they studied. This is true in general: the expected number of overlaps is $mr/N$, so the expected proportion of exam questions the student will know is $r/N$.
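If it helps to check this numerically, here is a minimal Python sketch (the function names `overlap_pmf`, `overlap_pmf_bruteforce`, and `expected_fraction` are my own, chosen for illustration). It evaluates the hypergeometric formula with exact fractions, compares it against a brute-force enumeration of every possible exam, and computes the expected proportion of known questions. It assumes the exam is drawn uniformly at random from all $m$-subsets and that $m \geq 1$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def overlap_pmf(N, m, r):
    """P(overlap = k) from the hypergeometric formula, for every feasible k."""
    lo, hi = max(0, m + r - N), min(m, r)
    return {
        k: Fraction(comb(r, k) * comb(N - r, m - k), comb(N, m))
        for k in range(lo, hi + 1)
    }


def overlap_pmf_bruteforce(N, m, r):
    """Same distribution, obtained by listing every possible exam."""
    # By symmetry it does not matter which r problems were studied,
    # so assume the student studied problems 0, ..., r-1.
    studied = set(range(r))
    counts = {}
    for exam in combinations(range(N), m):
        k = len(studied.intersection(exam))
        counts[k] = counts.get(k, 0) + 1
    total = comb(N, m)
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


def expected_fraction(N, m, r):
    """Expected proportion of exam questions the student has studied."""
    return sum(Fraction(k, m) * p for k, p in overlap_pmf(N, m, r).items())


if __name__ == "__main__":
    print(overlap_pmf(3, 2, 2))
    print(expected_fraction(3, 2, 2))
```

For the small example in the question ($N=3$, $m=2$, $r=2$) this gives probability $2/3$ for one overlap and $1/3$ for two, with expected proportion $2/3 = r/N$.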