I have that entropy is
$$ H(X) = - \sum_i^m p_i \log_2 (p_i) $$
And my understanding is that for a fair coin toss, entropy is maximised, since there is the greatest uncertainty at a probability of $0.5$.
I have three variables and would like to choose one to use as a node in a decision tree; I should choose the one that reduces entropy the most.
Each of the variables $x_1, x_2, x_3$ is a binary variable, and the proportions of successes for each are
$$ \bar{x}_1 = 0.12, \qquad \bar{x}_2 = 0.04, \qquad \bar{x}_3 = 0.78 $$
With this in mind the entropy is initially
$$ -\big( 0.12 \log_2 ( 0.12 ) + 0.04 \log_2 ( 0.04 ) + 0.78 \log_2 ( 0.78 ) \big) \approx 0.367 + 0.186 + 0.280 \approx 0.83 $$
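For reference, the sum above can be reproduced with a few lines of Python; this just evaluates the same three $-p \log_2 p$ terms:

```python
import math

# Proportions of successes for the three binary variables
p = {"x1": 0.12, "x2": 0.04, "x3": 0.78}

# Each term -p * log2(p) from the sum above
terms = {name: -q * math.log2(q) for name, q in p.items()}
for name, t in terms.items():
    print(f"{name}: {t:.3f}")
print(f"total: {sum(terms.values()):.3f}")
```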
The largest term in the above sum is the one corresponding to $x_1$, meaning I should choose this as the node.
What I'm unsure about is this: I expected the best choice to correspond to the proportion closest to $0.5$, but that doesn't seem to be the case.
Can someone explain why this is, or where I've misunderstood in the above?
Indeed, the graph of binary entropy is symmetric around $0.5$ with its maximum at $0.5$: the entropy of a $\text{Bernoulli}(p)$ variable is $H = -p\log p - (1-p)\log(1-p)$. However, your sum keeps only the first term, $-p \log p$, which is a different function: it is not symmetric, and it attains its maximum at $p = 1/e \approx 0.368$, not at $0.5$. That is why $x_1$ (with $\bar{x}_1 = 0.12$) produces the largest term even though $\bar{x}_3 = 0.78$ is closest to $0.5$. If you instead compute the full binary entropy for each variable, $x_3$ has the highest entropy, matching your intuition.
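To see the difference numerically, here is a short sketch that locates the maximum of each function on a fine grid (no plotting library assumed):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), the entropy of Bernoulli(p)."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def single_term(p):
    """-p*log2(p), the asymmetric function used in the question's sum."""
    return -p * math.log2(p)

# Scan (0, 1) to locate each function's maximiser numerically
grid = [i / 10000 for i in range(1, 10000)]
p_H = max(grid, key=binary_entropy)  # maximised near 0.5
p_f = max(grid, key=single_term)     # maximised near 1/e ≈ 0.368
print(p_H, p_f)
```

The two maximisers differ, which is exactly the discrepancy observed in the question.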