Hua Luogeng (in Chinese, 华罗庚) took a hat-guessing puzzle as an illustration in a booklet focusing on mathematical induction. The following description is a literal translation from Chinese.
Hat-Guessing Puzzle: A teacher wants to identify the most smartest one from his three students by the following methods: 5 hats are shown, in which 3 of them are white and the other two are black, to the three students. With eyes closed, each student is put on a white hat, while the other two black ones are hidden. Once the permissions of opening eyes are granted, the three students open their eyes simultaneously and are allowed to look at the hats of others without any communication. After a hesitation, they say that "the hat on my own hat is white" with one voice. The question is how do they make it?
The puzzle can be easily generalized to the version with $n$ persons, $n-1$ black hats, and $\ge n$ white hats and be solved by induction on $n$:
Prove by induction on $n$:
- When $n$ is 2, the situation is trivial because there is only one black hat. If I am wearing the black one, then the other person can tell that the hat on his head is white, without any hesitation. However, he does hesitate. Contradiction.
- Suppose that we have solved the puzzle with $n = k$.
- When $n = k + 1$, the reason goes in the following way: If someone is wearing the black hat, all the other people will know that and the problem is reduced to a version with $k$ person, $k-1$ black hats and $\ge k+1$ white hats. According to the inductive hypothesis, the $k$ persons should tell that the hats on their heads are all white, without any hesitation. Contradiction.
While both the puzzle and proof sound very simple, I am very confused with the informal keyword "hesitation" in them. I even cannot tell the vagueness of hesitation clearly. For example, is the hesitation itself a suitable object which can be used in mathematical induction? Informally, my problem can be stated as follows.
My Problem: How to formally model the hesitation in the hat-guessing puzzle?
Thank you for any suggestions.
The puzzle is older than its mathematical formalizations and a version of it dates back to the fifties. The most common way to model such a situation is by using the partitional model introduced by Robert Aumann. There is a finite set (can be somewhat generalized) $\Omega$ of states. A state describes everything relevant that could be the case. In the puzzle, a state could describe who wears what hat, so there are seven states $$\Omega=\{www,wwb,wbw,bww,bbw,bwb,wbb\},$$ where for example $wbw$ stands for kid $1$ wearing a white hat, kid $2$ wearing a black hat, and kid $3$ wearing a white hat. Now we have to model what the kids know. We do this the following way: A person is able to differentiate between certain states, but not others. We model this by that person having a partition of $\Omega$ and the person gets informed in which element in the partition the true state lies but not the precise state. In the puzzle, a kid is unable to differentiate states in which everyone else wears the same hat. For example, kid $2$ has the partition $$P_2=\big\{\{www,wbw\}, \{bww,bbw\},\{wwb,wbb\},\{bwb\}\big\}.$$
Note that kid $2$ knows the color of her hat only if the true state is $bwb$. Kid $1$ knows it if the true state is $wbb$ and kid $3$ know it if it is $bbw$. Generally, let $P_i$ be the information partition of a person $i$. If the true state is $\omega$, we let $P_i(\omega)$ be the element of the partition that contains $\omega$ and we interpret it as the set of states $i$ deems possible. An event is simply a set of states. We say that $i$ knows the event $E$ at state $\omega$, if $P_i(\omega)\subseteq E$. We let $K_i(E)$ be the set of states at which $i$ knows $E$. Note that $K_i(E)$ is an event itself. So, one can write things like $K_1(K_2(K_1(E)))$, which can be interpreted as $1$ knowing that $2$ knows that $1$ knows $E$.
Now in the puzzle, the true state is $www$. But a kid seeing two white hats does not tell her anything about the color of her own hat. Nobody knows the color of her hat, for otherwise she would say it. The event that nobody knows the color of her hat is $$E=\{www,wwb,wbw,bww\},$$ which is exactly the set of states at which no two kids wear a black hat. Everyone knows this and gets therefore a new partition in which this knowledge is incorporated. For example, $$P_2'=\big\{\{www,wbw\}, \{bww\},\{bbw\},\{wwb\},\{wbb\},\{bwb\}\big\}.$$ Formally, this is the coarsest partition that is at least as fine as the two partitions $P_2$ and $\{E,E^C\}$ and this is how the model is learning. Even with this partition, no kid knows the color of her hat. Everyone knows this, and this allows her to deduce that the state is not on in which the other two kids have white heads and she has black hats, for the other kids could then deduce that they have white hats and they did not. From this, every kid can deduce that the true state is $www$, or more formally $P''_i(www)=\{www\}$ for $i=1,2,3$, so this becomes a formal statement.
A survey of this kind of modeling has bee written by John Geanakoplos for the Handbook of Game Theory and Economic Applcations. The survey can be found here. It also discusses essentially the same puzzle. The article is somewhat technical, and a slightly simplified version can be found here.