Maximize grade in a multiple choice test using probability?

189 Views Asked by At

This is a probability problem.

Suppose you take a multiple-choice test consisting of $20$ questions whose answers are equally distributed (so, each letter corresponds to $4$ correct answers).

Each question has $5$ alternatives, only one of them is correct.

Further, in each question you're in equal doubt between the right and a wrong answer (so, the probability of correctly guessing it is $50\%),$ as follows:

enter image description here

Is there an optimal choice of answers such that the grade is maximized?

I don't really know the answer, but I believe it is affirmative, maybe using conditional probability.

Remark. I thought to approach the problem in several steps. In each step, we shall choose the best answer.

Step 1: In the first step, we shall compare the probability of getting each alternative right.

As to the letter A in this step, the probability of getting it right should be $\frac{4}{6}$ (?), since we know there are 4 right answers between the 6 we choose (we must improve this idea).

Here we should take into account the 50% hypothesis (?). Maybe the probability of getting A right should be smaller, since I can expect to get half of my choices right (?). Maybe the numbers are not correct, I just want to illustrate the idea.

Next, we compute the probability of getting B correct, then C, then D and then E. We shall choose the letter more likely to happen.

Step 2: In the next step we shall proceed as in the first one, but disregarding the letter we have chosen and disregarding the questions we have answered in the first step. This will change the probabilities. Again we choose the more likely answer.

We repeat the process till the last choice is made. In each step, a letter will be chosen and disregarded in the next step, so the procedure will end.

1

There are 1 best solutions below

5
On

This isn't a complete answer, but I believe what I found is still somewhat useful.

Edit: I realised that the definition of the expected value here may be ambiguous too.

Definition. A valid key answer is a key answer that satisfy the following requirements:

  1. Equal distribution of choices. If there are $n$ questions, then each of A, B, C, D, and E must be the correct answers for $n/5$ questions each.
  2. Follow the test taker's right-wrong distribution. For example, if the test taker thinks either A or C is the correct answer for question 1, then the valid key answer must have either A or C for question 1 too.

Definition. The expected score of a test taker's answer is the average score across all possible valid key answers.


To maximize the expected score, it is not sufficient to only consider the frequency of the individual choices.

Consider the following two distributions for five questions:

  1. AB, AC, AD, DE, CE
  2. AE, AE, AD, BC, CD

Both distributions have exactly $3$ A's, $1$ B's, $2$ C's, $2$ D's, and $2$ E's. However, for the first distribution, the maximum expected score is only $3$, while for the second distribution, the maximum expected score is $4$.

The following computer-generated tables show the expected scores for each of the $2^5$ answer choices for both distributions. The left table is for the first distribution, while the right table is for the second one. Notice that for both distributions, there are only two valid key answers (because we assume equal distribution). $$ \begin{array}{|c|c|c|c|} \hline &\text{BCADE}&\text{BADEC}&\text{Avg.}\\ \hline \text{BCDEE}&3&3&3.0\\ \text{ACDEE}&2&2&2.0\\ \text{BADEE}&2&4&3.0\\ \text{AADEE}&1&3&2.0\\ \text{BCAEE}&4&2&3.0\\ \text{ACAEE}&3&1&2.0\\ \text{BAAEE}&3&3&3.0\\ \text{AAAEE}&2&2&2.0\\ \text{BCDDE}&4&2&3.0\\ \text{ACDDE}&3&1&2.0\\ \text{BADDE}&3&3&3.0\\ \text{AADDE}&2&2&2.0\\ \text{BCADE}&5&1&3.0\\ \text{ACADE}&4&0&2.0\\ \text{BAADE}&4&2&3.0\\ \text{AAADE}&3&1&2.0\\ \text{BCDEC}&2&4&3.0\\ \text{ACDEC}&1&3&2.0\\ \text{BADEC}&1&5&3.0\\ \text{AADEC}&0&4&2.0\\ \text{BCAEC}&3&3&3.0\\ \text{ACAEC}&2&2&2.0\\ \text{BAAEC}&2&4&3.0\\ \text{AAAEC}&1&3&2.0\\ \text{BCDDC}&3&3&3.0\\ \text{ACDDC}&2&2&2.0\\ \text{BADDC}&2&4&3.0\\ \text{AADDC}&1&3&2.0\\ \text{BCADC}&4&2&3.0\\ \text{ACADC}&3&1&2.0\\ \text{BAADC}&3&3&3.0\\ \text{AAADC}&2&2&2.0\\ \hline \end{array} \quad \begin{array}{|c|c|c|c|} \hline &\text{AEDBC}&\text{EADBC}&\text{Avg.}\\ \hline \text{EEDCD}&2&2&2.0\\ \text{AEDCD}&3&1&2.0\\ \text{EADCD}&1&3&2.0\\ \text{AADCD}&2&2&2.0\\ \text{EEACD}&1&1&1.0\\ \text{AEACD}&2&0&1.0\\ \text{EAACD}&0&2&1.0\\ \text{AAACD}&1&1&1.0\\ \text{EEDBD}&3&3&3.0\\ \text{AEDBD}&4&2&3.0\\ \text{EADBD}&2&4&3.0\\ \text{AADBD}&3&3&3.0\\ \text{EEABD}&2&2&2.0\\ \text{AEABD}&3&1&2.0\\ \text{EAABD}&1&3&2.0\\ \text{AAABD}&2&2&2.0\\ \text{EEDCC}&3&3&3.0\\ \text{AEDCC}&4&2&3.0\\ \text{EADCC}&2&4&3.0\\ \text{AADCC}&3&3&3.0\\ \text{EEACC}&2&2&2.0\\ \text{AEACC}&3&1&2.0\\ \text{EAACC}&1&3&2.0\\ \text{AAACC}&2&2&2.0\\ \text{EEDBC}&4&4&4.0\\ \text{AEDBC}&5&3&4.0\\ \text{EADBC}&3&5&4.0\\ \text{AADBC}&4&4&4.0\\ \text{EEABC}&3&3&3.0\\ \text{AEABC}&4&2&3.0\\ \text{EAABC}&2&4&3.0\\ \text{AAABC}&3&3&3.0\\ \hline \end{array} $$


I have also made a Python 3 script, but this script can perhaps be improved to be much more efficient. Running the script for the original distribution is time consuming; there are $5922$ possible key answers, and $2^{20} = 1\:048\:576$ possible choice of answers. Note that question 14 only has C as the single possible answer in the table, so I added B as the second option for this question.

My computer would take around 1.5 hours to fully run the script, but so far, it manages to find an answer with expected score around $11.5022$: ADBADDBEAEADDBADEABE. Interestingly, it does not have any C's.

Edit: The script finds no better solution.

def compute_score(answer, key):
    score = 0
    number_of_questions = len(answer)
    for i in range(number_of_questions):
        if answer[i] == key[i]:
            score += 1
    return score

def generate_possible_keys_and_answers(distribution):
    number_of_questions = len(distribution)
    
    possible_keys = []
    possible_answers = []
    
    # uses bit masking
    for mask in range(2**number_of_questions):
        choices = []
        for question in range(number_of_questions):
            if mask & (1 << question):
                choices.append(distribution[question][0])
            else:
                choices.append(distribution[question][1])
        
        possible_answers.append(choices)
        if all(choices.count(choice) == number_of_questions // 5 for choice in ("A", "B", "C", "D", "E")): 
            possible_keys.append(choices)
            
    return possible_keys, possible_answers

def find_best_answer(possible_keys, possible_answers):
    best_total_score = 0
    best_answer = None
    for answer in possible_answers:
        total_score = 0
        for key in possible_keys:
            total_score += compute_score(answer, key)
        if best_total_score < total_score:
            best_total_score = total_score
            best_answer = answer
            print("Current best expected score:", best_total_score / len(possible_keys))
            print("Current best answer:", best_answer)
    return best_total_score / len(possible_keys), best_answer

def main():
    input_string = input("Write the answer distribution. \n"
                         "Provide two possible answers for each question. \n"
                         "Separate questions with a comma. \n"
                         "Ensure that the answers are either A, B, C, D, or E, \n"
                         "and the number of questions is a multiple of 5.\n")
    distribution = tuple((question[0], question[1]) for question in input_string.split(','))
    assert len(distribution) % 5 == 0
    possible_keys, possible_answers = generate_possible_keys_and_answers(distribution)
    print("Number of possible keys:", len(possible_keys))
    print("Number of possible answers:", len(possible_answers))
    best_expected_score, best_answer = find_best_answer(possible_keys, possible_answers)
    print("Best expected score:", best_expected_score)
    print("Best answer:", best_answer)

main()