Definition of entropy for a homogeneous sample


In the context of decision trees, we learned that the entropy of a sample is defined as

$H(\langle P_1, \dots, P_n \rangle) = \sum_{i=1}^{n} P_i \log_2\left(\frac{1}{P_i}\right)$

where $P_i$ is the proportion of the total sample that belongs to class $i$.

Now to me it's clear that the entropy converges to zero as the sample becomes homogeneous. For a perfectly homogeneous sample, however, the formula above isn't defined: every class that is absent has $P_i = 0$, so the term $\frac{1}{P_i}$ is a division by zero.

Is there a better definition that resolves this issue, or is the entropy of this kind of sample simply undefined or "zero by definition"?


There are 1 best solutions below


From what I could find, the answer is that the "exact" definition replaces each term with its right-hand limit (the limit from above).

So instead of using

$P_i \log_2\left(\frac{1}{P_i}\right)$

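we use the right-hand limit

$\lim_{x \to P_i^+} x \log_2\left(\frac{1}{x}\right)$

which agrees with the original term whenever $P_i > 0$ and can be evaluated at $P_i = 0$ with L'Hôpital's rule: $\lim_{x \to 0^+} x \log_2\left(\frac{1}{x}\right) = \lim_{x \to 0^+} \frac{-\log_2 x}{1/x} = \lim_{x \to 0^+} \frac{-1/(x \ln 2)}{-1/x^2} = \lim_{x \to 0^+} \frac{x}{\ln 2} = 0$. Absent classes therefore contribute nothing, and a homogeneous sample has entropy $1 \cdot \log_2(1) = 0$.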
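As a quick numerical sanity check of this convention, here is a minimal Python sketch. The function name `entropy` and the example proportions are made up for illustration, and it assumes the proportions are already normalized. Skipping the zero terms is just one way to encode the limit value of 0 for $P_i = 0$.

```python
import math


def entropy(proportions):
    """Entropy in bits, treating a term with P_i = 0 as contributing 0."""
    # Zero proportions are skipped: their term is taken to be the limit of
    # x * log2(1/x) as x -> 0+, which is 0.
    return sum(p * math.log2(1.0 / p) for p in proportions if p > 0)


# A perfectly homogeneous sample: one class has everything.
homogeneous = entropy([1.0, 0.0, 0.0])

# A perfectly balanced two-class sample, for comparison.
balanced = entropy([0.5, 0.5])

# A single term evaluated at a tiny positive proportion, to see that it
# is already very close to the limit value 0.
tiny = 1e-12
near_zero_term = tiny * math.log2(1.0 / tiny)

print(homogeneous, balanced, near_zero_term)
```

With this convention the homogeneous sample gets entropy 0 and the balanced two-class sample gets entropy 1 bit, while the term at a tiny positive proportion is already negligible.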