For an algorithm, I want to calculate the probability and percentage of repeated questions in an exam.
For example: suppose there are 1000 questions in a question bank, 100 candidates will sit the exam, and according to the paper configuration each candidate gets 2 random questions from that bank.
I want to calculate approximately how many questions will be repeated, given the number of candidates and the number of questions drawn for each candidate.
I looked at the birthday problem, which accounts for the total number of questions (1000) and the number of randomly picked questions (2), but what about the total number of candidates? How can I calculate this?
Let there be $m$ questions and $n$ candidates, and let every candidate receive $k$ distinct questions, chosen uniformly at random and independently of the other candidates. Define $p:=\frac{k}{m}$ and $q:=1-p$.
For every $i\in\{1,\dots,m\}$ let $Q_i$ be a random variable taking value $1$ if question $i$ is repeated (i.e. given to at least two candidates) and taking value $0$ otherwise.
Then $Q:=\sum_{i=1}^mQ_i$ is the number of questions that will be repeated, and by linearity of expectation and symmetry we find:$$\mathbb EQ=\sum_{i=1}^m\mathbb EQ_i=m\mathbb EQ_1=mP(Q_1=1)$$
Now it remains to find $P(Q_1=1)$.
For that let $X$ denote the number of candidates that will get question $1$. Each candidate gets question $1$ with probability $\frac{k}{m}=p$, independently of the others, so $X$ has binomial distribution with parameters $n$ and $p$.
Then: $$P(Q_1=1)=1-P(X=0)-P(X=1)=1-q^n-npq^{n-1}$$
This proves: $$\mathbb EQ=m(1-q^n-npq^{n-1})$$ so the expected fraction of the bank that is repeated is $1-q^n-npq^{n-1}$.
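To check the formula numerically, here is a minimal Python sketch. It assumes each candidate gets $k$ distinct questions drawn uniformly at random, independently of the other candidates. The function names `expected_repeated` and `simulated_repeated`, and the parameters `trials` and `seed`, are made up for this illustration.

```python
import random
from collections import Counter


def expected_repeated(m: int, n: int, k: int) -> float:
    """Closed form m * (1 - q^n - n*p*q^(n-1)) with p = k/m, q = 1 - p."""
    if m <= 0 or n < 0 or not (0 <= k <= m):
        raise ValueError("need m > 0, n >= 0 and 0 <= k <= m")
    if n == 0:
        return 0.0  # no candidates, so nothing can be repeated
    p = k / m
    q = 1.0 - p
    return m * (1.0 - q**n - n * p * q ** (n - 1))


def simulated_repeated(m: int, n: int, k: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expectation (illustrative only)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counts = Counter()
        for _ in range(n):
            # Each candidate receives k distinct questions from the bank.
            counts.update(rng.sample(range(m), k))
        # A question counts as repeated if at least two candidates got it.
        total += sum(1 for c in counts.values() if c >= 2)
    return total / trials


if __name__ == "__main__":
    m, n, k = 1000, 100, 2
    print("formula:   ", expected_repeated(m, n, k))
    print("simulation:", simulated_repeated(m, n, k))
```

For the numbers in the question ($m=1000$, $n=100$, $k=2$) the formula gives about $17.4$ repeated questions, i.e. roughly $1.7\%$ of the bank, and the simulation should land close to that.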