The formula for Shannon entropy is as follows:
$$\text{Entropy}(S) = - \sum_i p_i \log_2 p_i $$
Thus, a fair six-sided die should have the entropy
$$- \sum_{i=1}^6 \dfrac{1}{6} \log_2 \dfrac{1}{6} = \log_2 (6) = 2.5849...$$
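As a quick numerical check, here is a minimal Python sketch (the helper name `shannon_entropy` is just illustrative) that evaluates the sum above for a fair die:

```python
from math import log2

def shannon_entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)) over the nonzero p."""
    return -sum(p * log2(p) for p in probs if p > 0)

fair_die = [1 / 6] * 6
print(shannon_entropy(fair_die))  # ~2.585
print(log2(6))                    # same value, since every p_i = 1/6
```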
However, the entropy should also correspond to the average number of yes/no questions you have to ask in order to determine the outcome (as illustrated in this guide under the heading Information Theory).
Now, I have tried to construct a decision tree describing the average number of questions we have to ask in order to determine the outcome of a roll, and this seems to be the optimal one:

Looking at the number of questions in the image, 3 questions are needed in 4/6 of the cases and 2 questions in 2/6 of the cases. Thus the entropy should be:
$$\dfrac{4}{6} \times 3 + \dfrac{2}{6} \times 2 = 2.6666...$$
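The same average can be verified with a short sketch, using the leaf depths read off the tree (two outcomes reached after 2 questions, four after 3):

```python
# Number of questions needed for each of the six equally likely outcomes.
depths = [2, 2, 3, 3, 3, 3]
expected_questions = sum(d * (1 / 6) for d in depths)
print(expected_questions)  # ~2.6667 > log2(6) ~ 2.585
```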
So the two calculations obviously don't give the same result for the entropy. How come?
In your setting, the Shannon entropy is "just" a lower bound on the expected number of questions asked by any decision tree (including optimal ones). The two quantities don't have to coincide. To get closer to the Shannon entropy, imagine an optimal decision tree identifying the outcome of throwing the die $N$ times for some large $N$ (assuming the throws are independent). The larger $N$ is, the smaller (yet still nonnegative) the difference becomes between the "averaged" (i.e. divided by $N$) expected number of questions of this "compound" decision tree and the Shannon entropy of the die. (This is the idea behind arithmetic coding.)
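To see this convergence numerically, one can build an optimal prefix code (equivalently, an optimal yes/no decision tree) over all $6^N$ equally likely sequences of throws using Huffman's algorithm, and divide its expected depth by $N$. A minimal sketch, where the helper name `huffman_expected_length` is illustrative; it relies on the standard fact that the sum of all merged weights equals $\sum_i p_i \cdot \text{depth}_i$:

```python
import heapq

def huffman_expected_length(probs):
    """Expected codeword length (= expected number of questions) of a
    Huffman code: each merge of the two smallest weights contributes
    their sum to the total, which equals sum(p_i * depth_i)."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

for N in range(1, 6):
    outcomes = 6 ** N                  # all sequences of N independent throws
    probs = [1 / outcomes] * outcomes  # fair die => uniform distribution
    print(f"N={N}: {huffman_expected_length(probs) / N:.4f} questions per throw")

# N=1: 2.6667, N=2: 2.6111, N=3: 2.6049, N=4: 2.6049, N=5: 2.5893
# approaching log2(6) ~ 2.5850
```

Note that the convergence is not monotone (it depends on how close $6^N$ is to a power of two), but the per-throw excess over $\log_2 6$ is always nonnegative and at most $1/N$.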