The information content of an outcome $x_i$ of a random variable is $h(x_i)=-\log_2 (p(x_i))$. I am interested in using this concept to provide more insight into the differences between the distributions of two independent variables.
I am aware of entropy and relative entropy, which compare the expected information content of these variables, and I'm using these as needed. However, I am looking for a metric that captures the shape of the distributions, as would be graphically captured by a histogram. Is there any reason I can't take the sum of the information content values over the outcomes of a variable? Effectively, calculating: $$ h(X)=\sum_i -\log_2 p(x_i)$$
These summed information content values give a measure of the shape of the distribution.
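For concreteness, here is a minimal sketch of the proposed quantity (the function name `summed_information` and the two sample distributions are my own, purely for illustration):

```python
import numpy as np

def summed_information(p):
    """Sum of -log2 p(x_i) over the support (note: no p(x_i) weighting, so
    this is not the entropy)."""
    p = np.asarray(p, dtype=float)
    return np.sum(-np.log2(p))

uniform = [0.25, 0.25, 0.25, 0.25]          # flat histogram
skewed = [0.5, 0.25, 0.125, 0.125]          # peaked histogram
print(summed_information(uniform))  # 8.0
print(summed_information(skewed))   # 9.0
```

The two distributions share the same support size but differ in shape, and the summed values differ accordingly.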
I've tried looking for examples of people using this metric, but have come up empty. I'm not sure what it would be called -- presumably just the information content of the variable. I can't see any mathematical reason why this can't be done, but I thought someone here might be able to fill me in if there's something I'm missing.
You'd need to justify why you believe that value is useful, and what insight it gives beyond the entropy.
Anyway, if the alphabet size is $n$, we have
$$\begin{align} \sum_x -\log p(x) &= \sum_x \log \frac{1}{p(x)}\\ &= \sum_x \left(\log \frac{1/n}{p(x)} + \log n\right) \\ &= n \log n + n\sum_x \frac{1}{n}\log \frac{1/n}{p(x)} \\ &= n \left(\log n + KL(u \,\|\, p)\right) \end{align} $$
hence the value depends directly on the KL divergence (relative entropy) between the uniform distribution and the given one.
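The identity above can be checked numerically. A quick sketch (the example distribution `p` is arbitrary; natural logarithms are used, matching the unspecified base in the derivation):

```python
import numpy as np

# An arbitrary example distribution on an alphabet of size n.
p = np.array([0.5, 0.25, 0.125, 0.125])
n = len(p)

# Left-hand side: the summed information content, sum_x -log p(x).
lhs = np.sum(-np.log(p))

# Right-hand side: n * (log n + KL(u || p)), with u the uniform distribution.
u = np.full(n, 1.0 / n)
kl_u_p = np.sum(u * np.log(u / p))
rhs = n * (np.log(n) + kl_u_p)

print(lhs, rhs)  # the two sides agree
```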
Furthermore, for a well-behaved finite-entropy distribution with countably infinite support, like the geometric ($p(x_i) = 2^{-i}$), the value is infinite - which is not very nice.
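To see the divergence concretely, here is a small sketch comparing truncated partial sums for the geometric example: the entropy converges to 2 bits while the plain sum of $-\log_2 p(x_i)$ grows without bound.

```python
import numpy as np

# Geometric distribution p(x_i) = 2^{-i}, i = 1, 2, ..., truncated at n terms.
for n in (10, 100, 1000):
    i = np.arange(1, n + 1)
    p = 2.0 ** -i
    entropy = np.sum(p * i)             # sum p(x_i) * (-log2 p(x_i)) -> 2 bits
    info_sum = np.sum(i.astype(float))  # sum -log2 p(x_i) = n(n+1)/2, diverges
    print(n, entropy, info_sum)
```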