Consider the entropy function
$H(S) = -\sum_i P(\text{class}=i \mid S)\,\log_2 P(\text{class}=i \mid S)$
I am trying to understand the range of its output for any input. I know that for a set in which one unique item has a frequency of 100% and every other unique item has a frequency of 0%, $H(S)=0$.
But what if every unique item in the set is equally frequent? How can you know the value of $H(S)$ right away, without computing it manually? Is there another formula or relation that gives it?
I wrote a quick Python script to test fully uniform distributions of different sizes and see what the output is (see below), but I couldn't find any relationship between the output and the input variables.
Does anyone know about this?
Thanks
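    import math

    # c = number of unique items, each equally frequent
    for c in range(1, 101):
        a = 10.0       # count of each unique item
        b = a * c      # total number of items in the set
        s = 0
        for i in range(c):
            # a/b = 1/c is the probability of each item
            s += (a/b) * math.log(a/b, 2)
        s = -s
        print(s)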
Output:
-0.0
1.0
1.58496250072
2.0
2.32192809489
2.58496250072
2.80735492206
3.0
3.16992500144
3.32192809489
3.45943161864
3.58496250072
3.70043971814
3.80735492206
3.90689059561
4.0
4.08746284125
4.16992500144
4.24792751344
4.32192809489
4.39231742278
4.45943161864
4.52356195606
4.58496250072
4.64385618977
4.70043971814
4.75488750216
4.80735492206
4.85798099513
4.90689059561
4.95419631039
5.0
5.04439411936
5.08746284125
5.12928301694
5.16992500144
5.20945336563
5.24792751344
5.28540221886
5.32192809489
5.35755200462
5.39231742278
5.4262647547
5.45943161864
5.49185309633
5.52356195606
5.55458885168
5.58496250072
5.61470984412
5.64385618977
5.67242534197
5.70043971814
5.72792045456
5.75488750216
5.78135971352
5.80735492206
5.83289001416
5.85798099513
5.88264304936
5.90689059561
5.93073733756
5.95419631039
5.9772799235
6.0
6.02236781303
6.04439411936
6.06608919046
6.08746284125
6.10852445678
6.12928301694
6.1497471195
6.16992500144
6.18982455888
6.20945336563
6.2288186905
6.24792751344
6.26678654069
6.28540221886
6.30378074818
6.32192809489
6.33985000288
6.35755200462
6.37503943135
6.39231742278
6.40939093614
6.4262647547
6.44294349585
6.45943161864
6.47573343097
6.49185309633
6.5077946402
6.52356195606
6.53915881111
6.55458885168
6.56985560833
6.58496250072
6.59991284219
6.61470984412
6.62935662008
6.64385618977
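If there are $n$ unique items in the set $S$, each with equal (conditional) probability of occurrence $\tfrac 1 n$, then all $n$ terms of the sum are identical: $$\begin{align} \mathsf H(S) & = - \sum_{i=1}^n \tfrac 1 n \log_2(\tfrac 1 n) \\[2ex] & = -n \cdot \tfrac 1 n \log_2(\tfrac 1 n) \\[2ex] & = -\log_2(\tfrac 1 n) \\[2ex] & = \log_2 n \end{align}$$ This matches the script's output: for example, $c = 8$ gives $\log_2 8 = 3$.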
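
As a quick numerical check of this identity, here is a minimal sketch in Python 3. The helper names `entropy` and `uniform` are illustrative and not part of the original script. It also reflects the general fact that, for $n$ classes, $0 \le H(S) \le \log_2 n$, with the upper bound reached exactly in the uniform case.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution given as probabilities."""
    # Terms with p == 0 are skipped: p * log2(p) tends to 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def uniform(n):
    """Uniform distribution over n items (hypothetical helper for this check)."""
    return [1.0 / n] * n

for n in (1, 2, 4, 8, 100):
    h = entropy(uniform(n))
    # The uniform case should agree with log2(n) up to floating-point error.
    assert math.isclose(h, math.log2(n), abs_tol=1e-9)
    print(n, h, math.log2(n))
```

Any non-uniform distribution over the same $n$ items gives a smaller value, so $\log_2 n$ can be read off directly as the maximum without running the loop.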