Does the Information Gain algorithm favor a high-entropy attribute or a low-entropy one?


This might not be a purely mathematical question, but it does relate to information theory.

My question is:

Does the Information Gain criterion, in decision-tree machine learning, favor a high-entropy attribute or a low-entropy one?

The source of my confusion is the definition of Shannon's entropy function:

                           H = -SUM(p_i * log2(p_i))
                                ^--- this MINUS right here!

If this is the case, then SURELY:

gain = H_before - H_after

actually means:

gain = H_before + H_after

??... Or have people just forgotten about the MINUS sign??
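As a minimal sketch of what the gain formula computes (the class probabilities and the 50/50 branch weights below are made-up illustrative numbers, not from any real dataset), note that both entropies are already positive, so the gain really is a plain subtraction:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)). The leading minus
    is part of the definition of H itself, not of the gain formula."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical split: before splitting, the classes are 50/50.
# After splitting on some attribute, each of two equal-sized
# branches is purer (90/10).
h_before = entropy([0.5, 0.5])       # 1.0 bit, a positive number
h_after = 0.5 * entropy([0.9, 0.1]) + 0.5 * entropy([0.9, 0.1])

gain = h_before - h_after            # ordinary subtraction of positives
```

Because `entropy` already returns a positive quantity, the purer split (lower `h_after`) yields the larger gain.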

BEST ANSWER

The minus sign is NOT a subtraction; it is a negation. It is there because we are taking logarithms of probabilities, which are at most 1, so their logarithms are at most 0.

Try it on your calculator: what is the base-2 logarithm of 0.5? That's right, it is -1. For a random variable that takes the value 0 with 50% chance and 1 with 50% chance, every term p_i * log2(p_i) is negative, so we negate the sum to make the entropy a positive quantity (here, 1 bit).
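The calculator check above can be reproduced directly; this short snippet just evaluates the two numbers mentioned in the answer:

```python
import math

# The log of a probability is negative or zero:
log_half = math.log2(0.5)    # -1.0

# Fair coin: 50% chance of 0, 50% chance of 1.
p = [0.5, 0.5]

# The leading minus in H = -sum(p * log2(p)) negates the
# negative sum, giving a positive entropy of 1 bit.
h = -sum(pi * math.log2(pi) for pi in p)
```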