Let $\Sigma$ be a source alphabet and $P$ a probability distribution over its symbols. The Shannon entropy of $P$ is $$H(P) = -\sum_j p_j \log_2(p_j)$$ where $p_j$ is the probability of the $j$-th symbol. We know, thanks to Shannon's source coding theorem, that this entropy is a (often very close) lower bound on the minimum weighted average codeword length, i.e. the weighted average length of the codewords produced by the Huffman algorithm.
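As a quick sanity check, the entropy formula above can be computed directly (a minimal Python sketch; the function name is mine):

```python
import math

def shannon_entropy(probs):
    """H(P) = -sum_j p_j * log2(p_j), in bits per symbol."""
    # Terms with p = 0 contribute 0 by convention (lim p*log p = 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform distribution over 4 symbols has entropy exactly 2 bits.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
```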
The question: let $P_1$ and $P_2$ be two different probability distributions over the symbols of a given alphabet such that the Shannon entropy of $P_1$ is smaller than the Shannon entropy of $P_2$. Does this imply that the minimum weighted average codeword length achievable for $P_1$ is at most as large as the minimum weighted average codeword length achievable for $P_2$?
I am looking for either a counter-example to this or a proof that it always holds.
Note: this is indeed true when $H(P_1) + 1 \leq H(P_2)$, since Shannon's source coding theorem also tells us that the entropy plus one is an upper bound on the minimum weighted average codeword length $L$: in that case $L(P_1) < H(P_1) + 1 \leq H(P_2) \leq L(P_2)$. But when the difference $H(P_2) - H(P_1)$ is less than 1, it is not clear.
Here is a counter-example: $P_1 = [0.5, 0.49, 0.01]$, $P_2 = [0.6, 0.35, 0.05]$.
$H(P_1) \approx 1.0707202712709103$
$H(P_2) \approx 1.1883763717345073$
But as you can see, the expected Huffman codeword length for $P_1$ is $1.5$ while for $P_2$ it is $1.4$, since both distributions have the same optimal code $\{0, 10, 11\}$ (codeword lengths $1, 2, 2$).
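The expected lengths above can be verified with a few lines of Python (a sketch of mine, using the standard identity that the expected Huffman codeword length equals the sum of the weights of all pairwise merges performed by the algorithm):

```python
import heapq

def huffman_expected_length(probs):
    """Expected codeword length of an optimal (Huffman) code.

    Repeatedly merge the two smallest probabilities; the sum of all
    merged weights (the internal node weights of the Huffman tree)
    equals the expected codeword length.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

print(huffman_expected_length([0.5, 0.49, 0.01]))  # ≈ 1.5
print(huffman_expected_length([0.6, 0.35, 0.05]))  # ≈ 1.4
```

So the distribution with smaller entropy ($P_1$) has the *larger* optimal expected length, confirming the counter-example.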
My opinion is that the optimality of the Huffman code is more about how well the probabilities (or their merged combinations along the tree) can be approximated by powers $2^{-k}$, while the entropy measures the average unexpectedness of the symbols. In my example, the $0.01$ probability messes things up for $P_1$, while $P_2$ is closer to a dyadic distribution overall.