Entropy and its relation to proportion

I know that entropy is defined as

$$ H(X) = - \sum_{i=1}^m p_i \log_2 (p_i) $$

My understanding is that for a fair coin toss the entropy is maximised, since a probability of $0.5$ carries the greatest amount of uncertainty.
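As a quick numerical check of this claim (a minimal Python sketch, not part of the original question), the binary entropy does peak at $p = 0.5$:

```python
import math

def binary_entropy(p):
    """Entropy, in bits, of a Bernoulli(p) random variable."""
    if p in (0.0, 1.0):
        return 0.0  # no uncertainty at the extremes
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 1.0 -- a fair coin carries one full bit
print(binary_entropy(0.9))  # a biased coin is less uncertain, so lower entropy
```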

I have three variables and would like to choose one to use as a node in a decision tree; I should choose the one which reduces entropy the most.

Each of the variables $x_1, x_2, x_3$ is a binary variable, and the proportions of successes for each are

$$ \bar{x}_1 = 0.12, \quad \bar{x}_2 = 0.04, \quad \bar{x}_3 = 0.78 $$

With this in mind the entropy is initially

$$ -( (0.12) \log_2 ( 0.12 ) + (0.04) \log_2 ( 0.04 ) + (0.78) \log_2 ( 0.78 ) )\\= 0.37 + 0.19 + 0.28\\ = 0.83 $$
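The arithmetic above can be reproduced directly (a small Python sketch):

```python
import math

proportions = [0.12, 0.04, 0.78]
terms = [-p * math.log2(p) for p in proportions]  # one -p*log2(p) term per proportion

print([round(t, 2) for t in terms])  # [0.37, 0.19, 0.28]
print(round(sum(terms), 2))          # 0.83
```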

The largest term in the sum above is the one corresponding to $x_1$, which suggests I should choose this as the node.

What I'm unsure about is that I expected the choice to correspond to the proportion closest to $0.5$, but that doesn't seem to be the case.

Can someone explain why this is, or where I've misunderstood in the above?

There are 2 answers below.

Answer 1

Indeed, the graph of the binary entropy is symmetric around $0.5$, with its maximum at $0.5$: the entropy of a Bernoulli($p$) variable is $H = -p\log p - (1-p)\log(1-p)$. However, your computation applies a different function, $-p \log_2 p$, to each proportion separately; that function is not symmetric and has its maximum at $p = 1/e \approx 0.368$. Here is the graph.

[Graph: $-p \log_2 p$ compared with the binary entropy; the former peaks at $1/e$, the latter at $0.5$.]
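The two maxima can also be located numerically (a sketch using a simple grid search):

```python
import math

grid = [i / 1000 for i in range(1, 1000)]  # avoid p = 0 and p = 1

def term(p):
    """The asker's per-variable term: -p * log2(p)."""
    return -p * math.log2(p)

def binary_entropy(p):
    """True entropy of a Bernoulli(p) variable."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(max(grid, key=term))            # ~0.368, close to 1/e
print(max(grid, key=binary_entropy))  # 0.5
```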

Answer 2

> With this in mind the entropy is initially

It's not clear to me why you are computing the entropy that way. Entropy is always the entropy of a random variable with probability function $p(x)$. Which random variable is it here?

That would be correct if the three numbers $(0.12, 0.04, 0.78)$ corresponded to the probabilities of a single variable taking one of three values, say the values $1, 2, 3$. But that cannot be the interpretation of those numbers, because they don't sum to one. They are instead the parameters of three different Bernoulli (binary) random variables, so your entropy computation doesn't make sense.

What would make sense is to compute the entropy of each binary random variable $x_1, x_2, x_3$ separately:

$$ H(x_1) = - 0.12 \log_2(0.12) - 0.88 \log_2(0.88) \approx 0.529 $$
$$ H(x_2) = - 0.04 \log_2(0.04) - 0.96 \log_2(0.96) \approx 0.242 $$
$$ H(x_3) = - 0.78 \log_2(0.78) - 0.22 \log_2(0.22) \approx 0.760 $$

and then the third one would have the greatest entropy, because its parameter is the closest of the three to $1/2$.
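Computing the three values confirms the ordering (a small Python sketch):

```python
import math

def binary_entropy(p):
    """Entropy, in bits, of a Bernoulli(p) random variable."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for name, p in [("x1", 0.12), ("x2", 0.04), ("x3", 0.78)]:
    print(name, round(binary_entropy(p), 3))
# x1 0.529
# x2 0.242
# x3 0.76
```

So $x_3$, whose proportion $0.78$ is closest to $0.5$, has the highest entropy, as the answer states.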