How to prove existence using expected value

252 Views Asked by At

In Joe Blitzstein's Stat 110 course at Harvard, the 8th strategic practice problem set has a section on existence. The second question in this section and the provided solution are as follows:

  1. A hundred students have taken an exam consisting of 8 problems, and for each problem at least 65 of the students got the right answer. Show that there exist two students who collectively got everything right, in the sense that for each problem, at least one of the two got it right.

Solution: Say that the “score” of a pair of students is how many problems at least one of them got right. The expected score of a random pair of students (with all pairs equally likely) is at least 8(1 - 0.35^2)=7.02, as seen by creating an indicator r.v. for each problem for the event that at least one student in the pair got it right. (We can also improve the 0.35^2 to 35/100 · 34/99 since the students are sampled without replacement.) So some pair of students must have gotten a score of at least 7.02, which means that they got a score of 8! ( not a factorial.)

The structure of the argument seems to be that since the expected value of the score of a pair of students is ~7.02, at least one pair of students must have gotten at least this score. And since there are an integer number of questions, at least one pair of students must have gotten the score 8/8.

But how can we be sure that the expected value of the score isn't higher than the score of all pairs of students in a particular group of 100 students? I understand that 65% of students in this particular group got each question correct, so we could use similar reasoning to the above to prove e.g. that there exists a student who got at least half of the questions correct. But how can we be sure that the operation carried out in computing the score of a pair of students doesn't distort matters?

What if we defined the score of a pair of students to be:

  • 1! (factorial) if they both got the same result in one question
  • 2!! if they both got the same result in two questions
  • .
  • .
  • .
  • 8!!!!!!!! if they both got the same result in all 8 questions

so e.g. if one student has the following results in the 8 questions: 10010111 and a second student has the following results: 01110110 then that pair would get the score 4!!!! since they got the same results in 4 questions.

Clearly if we computed the expected value of this score based on, say, 5% of students getting correct answers to each question, we would get a score larger than any pair is likely to have, since the expected value would be stretched by the huge 8!!!!!!!!, even though the probability of two students getting the exact same results is fairly low.

So it seems that we cannot necessarily use expected value to prove existence. Can someone explain why it makes sense in the question in the problem set but not in the case of the similarity score? Is there a general way to clearly see when an expected value can be used to prove existence and when it can't?

1

There are 1 best solutions below

0
On

In the original problem, for a given pair of students, the expected score on the exam is the sum of the expected scores on the problems. That's why you can take (1-.35^2) (which is approximately the expected score on one problem) and then just multiply by 8 to get the expected score on the exam.

In your alternative problem it's no longer true that the expected score on the exam (for a pair of students) is the sum of the expected scores on the problems. So you can't just mimic that step in the reasoning.

If you wanted to construct a different example where the original reasoning works, you could imagine an exam in which different problems have different weights, but the score for the exam is still the sum of the scores for the individual problems. Then (if you do everything carefully) you'll get a new problem that you can solve by the old method.

Once again, the key assumption is "score on exam equals sum of scores on problems".