I can't understand the solution I was given to this problem:
12 coffee cans are supposed to be filled with 12 ounces of coffee each. 1 of the cans has been filled in error and contains either more or less than 12 ounces. We have a balance on which up to 12 cans can fit on either side, and we want to use the balance to identify the can with the error. What is the minimum number of measurements required to find the can in error?
The solution I was given states: since each of the $12$ cans is equally likely to be the erroneous one, and the erroneous can is equally likely to be heavier or lighter, there are $24$ equally likely cases, each with probability $\frac{1}{24}$, and the expected information of the problem is $-\sum_{i=1}^{24} \frac{1}{24} \log_2\left(\frac{1}{24}\right) = \log_2(24) = 3 + \log_2(3)$ bits.
Each measurement has $3$ possible outcomes: both sides of the balance are equal, the left side is heavier, or the right side is heavier. The information gained per measurement is therefore at most $\log_2(3)$ bits. From $n \log_2(3) > 3 + \log_2(3) \Rightarrow n > \frac{3}{\log_2(3)} + 1 \approx 2.89$, so $n = 3$ is the minimum number of measurements.
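The arithmetic in this argument can be checked numerically with a short Python snippet (my own check, not from the book):

```python
import math

# Total uncertainty of the problem: 24 equally likely outcomes
# (12 cans, each of which could be heavier or lighter)
H = math.log2(24)  # = 3 + log2(3), about 4.585 bits

# Each weighing has 3 outcomes (left heavy, right heavy, balanced),
# so it supplies at most log2(3) bits, about 1.585
per_weighing = math.log2(3)

# Minimum number of weighings: smallest n with n * log2(3) covering H
n = math.ceil(H / per_weighing)
print(n)  # 3
```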
I don't understand this solution at all. I understand $P(\text{can is the error}) = \frac{1}{12}$, and $P(\text{can is lighter and is the error}) = \frac{1}{12} \cdot \frac{1}{2} = \frac{1}{24}$. But I don't follow why there are "24 cases." I see that each can has $3$ states: light, heavy, or normal. So shouldn't the entropy be $-\left(\frac{1}{24} \log_2\left(\frac{1}{24}\right) + \frac{1}{24} \log_2\left(\frac{1}{24}\right) + \frac{11}{12} \log_2\left(\frac{11}{12}\right)\right) \approx 0.4971502$?
This problem comes from the book: Information-Statistical Data Mining (Sy & Gupta).
Edit:
Formula for entropy: $H(X) = -\sum_{x} p(x) \log_2(p(x))$.
If $x$ is the state of the coffee can, then: $$ p(x) = \begin{cases} 1/24 & \text{can is light} \\ 1/24 & \text{can is heavy} \\ 11/12 & \text{can is normal} \end{cases} $$
So $$ H(X) = -\sum_{x} p(x) \log_2(p(x)) = -\left(\frac{1}{24} \log_2\left(\frac{1}{24}\right) + \frac{1}{24} \log_2\left(\frac{1}{24}\right) + \frac{11}{12} \log_2\left(\frac{11}{12}\right)\right) \approx 0.4971502 $$
$12$ coffee cans should then mean the total entropy is $12 \times 0.4971 \approx 5.966$.
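For concreteness, here is a small Python snippet (mine, not from the book) reproducing the per-can numbers above:

```python
import math

# Per-can distribution used above: light 1/24, heavy 1/24, normal 11/12
p = [1 / 24, 1 / 24, 11 / 12]
H_one_can = -sum(q * math.log2(q) for q in p)
print(round(H_one_can, 7))      # 0.4971502

# Multiplying by 12 treats the cans as independent, which they are not:
# exactly one can is off, so the cans' states are strongly correlated
print(round(12 * H_one_can, 3))  # 5.966
```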
As you correctly recalled, the definition of entropy for a discrete random variable ${\displaystyle X}$, whose possible outcomes ${\displaystyle x_{1},...,x_{n}}$ occur with probabilities ${\displaystyle P(x_{1}),..., P(x_{n})}$, is
$${\displaystyle \mathrm {H} (X)=-\sum _{i=1}^{n}{\mathrm {P} (x_{i})\log {P} (x_{i})}}$$
Different definitions of entropy can be obtained by varying the base of the logarithm. One of the most commonly used is that in base $2$, which gives the resulting entropy in bits or "shannons". Note that the term $-\log P(x_i)$ represents the so-called Shannon self-information - a measure that attempts to quantify the level of "surprise" of a particular outcome, ranging from $0$ for an outcome that occurs with probability $1$ to $\infty$ for an outcome that occurs with probability $0$. In this view, entropy can be interpreted as the expectation of the self-information of a variable.
An important point is that the summation used to determine the entropy has to run over all possible outcomes. For scenarios involving more than one element (e.g., coins, bins, dice, and so on), it is necessary to consider all elements simultaneously - and not only one or some of them - to identify all possible outcomes.
In the question described by the OP, we have $12$ coffee cans: all are filled with the same quantity of coffee except one, which can contain either more or less coffee than the others. Looking simultaneously at all elements of the problem - i.e. at all $12$ cans - we have $12$ possible cases for the "different" can, and for each of these cases the difference can be a larger or a smaller coffee quantity. This directly leads to $24$ cases. Each of these cases is equally likely, so that the probability is $1/24$ for all. Applying the formula we get
$$\displaystyle \mathrm {H}=-\sum _{i=1}^{24}\, \frac{1}{24}\, \log_2 \left(\frac{1}{24}\right)= -\log_2 \left(\frac{1}{24}\right)= \log_2 (24)=\log_2 (8)+\log_2 (3)\\ =3+\log_2 (3)$$
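A quick numerical check of this count in Python (my own illustration, not part of the original solution):

```python
import math

# 24 equally likely outcomes: which of the 12 cans is off,
# and in which direction (heavier or lighter)
outcomes = [(can, direction)
            for can in range(12)
            for direction in ("heavy", "light")]
p = 1 / len(outcomes)  # = 1/24

H = -sum(p * math.log2(p) for _ in outcomes)
print(H, 3 + math.log2(3))  # both are about 4.585 bits
```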
The problem in your calculation giving $H=0.4971502$ is that you focused on the three possibilities of a single can (normal, light, heavy) instead of considering all possibilities for the whole set of cans. As stated above, in this type of problem it is fundamental to look at all elements simultaneously.