Is it easier to pass a test with more questions and more mistakes allowed, or less questions but less mistakes allowed?

1k Views Asked by At

$Q$ is a set of yes or no questions. I know the answer to $q_{know}$ of these questions ($0\leq q_{know}\leq|Q|)$, but I have to guess for the remaining ones, with a $0.5$ probability of guessing correctly. A test $T$ is generated by randomly selecting $n$ of these questions
($|T|=n$). To pass the test, I can make $k$ mistakes at the most, with $k>0$.

  1. What is the probability of passing the test, as a function of these parameters?
  2. Assuming $n/k%$ is constant (e.g., for every ten questions in the test, 1 mistake is allowed) is it better to take the test with more or less questions?

This question came up during a discussion and I thought that the number of questions doesn't matter because the probability is always the same, but I'm starting to think that it might be more complicated than this. Should it be a product of binomial distributions?

2

There are 2 best solutions below

0
On BEST ANSWER

As we randomly select $n$ question from $|Q|$ questions without replacement and $q_{know}$ of the questions have the "feature" that you know the answer to them, the number of questions that you know the answer to comes from the hypergeometric distribution. If $X$ denotes the random variable corresponding to the number of questions you know the answer to, the probability that you will know the answer to exactly $m$ questions is equal to: $$P(X = m) = \frac{\binom{q_{know}}{m}\binom{|Q|-q_{know}}{n-m}}{\binom{|Q|}{n}}$$ Assuming you have correctly answered to $m$ question, the number of correct guesses in the remaining $n-m$ questions follow the binomial distribution with the parameter $p=\frac{1}{2}$. Let $Y$ denote the number of correctly guessed questions. Then $$Y \sim \mathcal{B}(n-X, \frac{1}{2}) \rightarrow P(Y=t)=\binom{n-X}{t}2^{X-n}$$ As $Y$ is dependent on $X$, we need to sum conditional probabilities for each of the possible values of $X$, and we know that $0 \leq X \leq q_{know}$. So, the probability that you will correctly answer exactly $n-k$ questions is equal: $$\mathcal{P}(n-k) = \sum_{i=0}^{q_{know}}P(X=i) P(Y = n - k - i | X = i)= \sum_{i=0}^{q_{know}}\frac{\binom{q_{know}}{i}\binom{|Q|-q_{know}}{n-i}}{\binom{|Q|}{n}}\binom{n-i}{n-k-i}2^{i-n}$$

I calculated the above using WolframAlpha and it gave the result in terms of a hypergeometric function. $$\mathcal{P(n-k)}=\frac{\binom{n}{n-k}\binom{|Q|-q_{know}}{n}}{2^{n}\binom{|Q|}{n}}{}_2F_1(k-n, -q_{know}; -n-q_{know}+|Q|+1;2)$$

This of course corresponds to the scenario in which you get exactly $k$ mistakes. To get the final answer, we need to calculate over all viable numbers of mistakes $0 \leq j \leq k$. So, the probability that you pass the test is equal to $P_{passed}$: $$P_{passed} = \frac{\binom{|Q|-q_{know}}{n}}{2^{n}\binom{|Q|}{n}}\sum_{j=0}^k \binom{n}{n-j}{}_2F_1(j-n, -q_{know}; -n-q_{know}+|Q|+1;2)$$

However, I don't find myself capable of calculating what happens, when $\frac{k}{n}$ is constant.

0
On

Consider flipping a fair coin 10 times vs 1000 times. Is it more likely to get 7 or more heads in the first case, or 700 or more heads in the second case? I think you can intuit that the former is more likely than the latter, and you'd be exactly right. This is the Law of Big Numbers: the bigger your sample size, the closer you can expect the observed frequency to get to the probability. As you yourself started to suspect, this is indeed all about the distribution: the distribution curve gets more narrow the greater the sample size is. So, with a probability of 0.5, you can expect to get the number of heads with 1000 flips to be closer to 0.5 than if you just flip 10 times.

So: if the probability of you guessing the answer an individual question correctly is greater than the percentage of question you need to get correctly, then you should go for the test with more questions. If the probability of guessing correctly is smaller than the needed percentage, then you should go for the smaller test.