$Q$ is a set of yes or no questions. I know the answer to $q_{know}$ of these questions ($0\leq q_{know}\leq|Q|)$, but I have to guess for the remaining ones, with a $0.5$ probability of guessing correctly. A test $T$ is generated by randomly selecting $n$ of these questions
($|T|=n$). To pass the test, I can make $k$ mistakes at the most, with $k>0$.
- What is the probability of passing the test, as a function of these parameters?
- Assuming $n/k%$ is constant (e.g., for every ten questions in the test, 1 mistake is allowed) is it better to take the test with more or less questions?
This question came up during a discussion and I thought that the number of questions doesn't matter because the probability is always the same, but I'm starting to think that it might be more complicated than this. Should it be a product of binomial distributions?
As we randomly select $n$ question from $|Q|$ questions without replacement and $q_{know}$ of the questions have the "feature" that you know the answer to them, the number of questions that you know the answer to comes from the hypergeometric distribution. If $X$ denotes the random variable corresponding to the number of questions you know the answer to, the probability that you will know the answer to exactly $m$ questions is equal to: $$P(X = m) = \frac{\binom{q_{know}}{m}\binom{|Q|-q_{know}}{n-m}}{\binom{|Q|}{n}}$$ Assuming you have correctly answered to $m$ question, the number of correct guesses in the remaining $n-m$ questions follow the binomial distribution with the parameter $p=\frac{1}{2}$. Let $Y$ denote the number of correctly guessed questions. Then $$Y \sim \mathcal{B}(n-X, \frac{1}{2}) \rightarrow P(Y=t)=\binom{n-X}{t}2^{X-n}$$ As $Y$ is dependent on $X$, we need to sum conditional probabilities for each of the possible values of $X$, and we know that $0 \leq X \leq q_{know}$. So, the probability that you will correctly answer exactly $n-k$ questions is equal: $$\mathcal{P}(n-k) = \sum_{i=0}^{q_{know}}P(X=i) P(Y = n - k - i | X = i)= \sum_{i=0}^{q_{know}}\frac{\binom{q_{know}}{i}\binom{|Q|-q_{know}}{n-i}}{\binom{|Q|}{n}}\binom{n-i}{n-k-i}2^{i-n}$$
I calculated the above using WolframAlpha and it gave the result in terms of a hypergeometric function. $$\mathcal{P(n-k)}=\frac{\binom{n}{n-k}\binom{|Q|-q_{know}}{n}}{2^{n}\binom{|Q|}{n}}{}_2F_1(k-n, -q_{know}; -n-q_{know}+|Q|+1;2)$$
This of course corresponds to the scenario in which you get exactly $k$ mistakes. To get the final answer, we need to calculate over all viable numbers of mistakes $0 \leq j \leq k$. So, the probability that you pass the test is equal to $P_{passed}$: $$P_{passed} = \frac{\binom{|Q|-q_{know}}{n}}{2^{n}\binom{|Q|}{n}}\sum_{j=0}^k \binom{n}{n-j}{}_2F_1(j-n, -q_{know}; -n-q_{know}+|Q|+1;2)$$
However, I don't find myself capable of calculating what happens, when $\frac{k}{n}$ is constant.