I'm looking at someone else's project (in C#) and there's an entropy calculation I don't understand. I haven't done any maths in ages, and I've had to look up entropy and Log in pursuit of this, so I'm happy to have gotten this far, but now I'm stuck.
The main body of the entropy calculation is as follows:
private float calculateEntropy() {
    float total = 0;
    float entropySum = 0;
    foreach (var module in this) {
        total += module.Prototype.Probability;
        entropySum += module.PLogP;
    }
    return -1f / total * entropySum + Mathf.Log(total);
}
Further Information
The project contains a list of "modules", and each module has a different "Probability" attached to it. The "calculateEntropy" function calculates the entropy of these probabilities. There can be between 1 and 500 modules in the list, and each module's "Probability" can range from 0.05 to 1. By the time we get to the equation at the bottom, "total" is the sum of these probabilities, and "entropySum" is the sum of PLogP for all modules. For the avoidance of doubt, PLogP is defined elsewhere in the project as "Probability*(Ln(Probability))", or to give the actual code:
this.PLogP = this.Prototype.Probability * Mathf.Log(this.Prototype.Probability);
My question is about the last line of the calculateEntropy function.
As far as I can tell, this can be re-arranged slightly and expressed in the following way:
-Σ(P*Ln(P)) / Σ(P) + Ln(Σ(P))
The first part of this seems to be Shannon's entropy formula, as I've seen it elsewhere. The next bit (Σ(P)) is presumably needed because our probabilities don't sum to 1, so we have to divide by whatever they actually sum to. However, I don't know why we need the last expression on the end (Ln(Σ(P))). I think my problem is simply that I don't know how to rearrange the formula; can anyone help?
I think I've included all the relevant info, but let me know if something's missing.
tl;dr: Your original code's author does not normalize their probability distributions to sum to $1$.
Given a sequence of nonnegative numbers $\{x_j\}_j$, we can define a probability distribution via $$\mathbb{P}[j]=\frac{x_j}{\sum_k{x_k}}$$ This has the nice property that it preserves relative ratios: $$\frac{x_j}{x_k}=\frac{\mathbb{P}[j]}{\mathbb{P}[k]}$$
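To make this concrete, here is a small sketch in Python (rather than the project's C#), using arbitrary made-up weights: normalizing by the total turns the weights into a true probability distribution while keeping every pairwise ratio the same.

```python
# Hypothetical weights (not from the original project) that do not sum to 1.
weights = [0.05, 0.2, 0.75, 0.5]
total = sum(weights)

# Normalize into a true probability distribution P[j] = x_j / sum_k(x_k).
probs = [w / total for w in weights]

# The normalized values sum to 1...
assert abs(sum(probs) - 1.0) < 1e-12

# ...and relative ratios are preserved: x_j / x_k == P[j] / P[k].
assert abs(weights[0] / weights[2] - probs[0] / probs[2]) < 1e-12
```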
People commonly work with such unnormalized sequences, rather than actual probability distributions, because it is often more convenient not to renormalize at every step.
Suppose your module.PLogP and module.Probability describe these $\{x_j\}_j$ instead of true probabilities. Then the formula you give is \begin{align*} -\frac{\sum_j{x_j\log{(x_j)}}}{\sum_j{x_j}}+\log{\left(\sum_k{x_k}\right)}&=-\frac{\sum_j{x_j\log{(x_j)}}-\sum_j{x_j\log{\left(\sum_k{x_k}\right)}}}{\sum_j{x_j}} \\ &=-\frac{\sum_j{x_j\left(\log{(x_j)}-\log{\left(\sum_k{x_k}\right)}\right)}}{\sum_k{x_k}} \\ &=-\sum_j{\frac{x_j}{\sum_k{x_k}}\left(\log{(x_j)}-\log{\left(\sum_k{x_k}\right)}\right)} \\ &=-\sum_j{\frac{x_j}{\sum_k{x_k}}\log{\left(\frac{x_j}{\sum_k{x_k}}\right)}} \\ &=-\sum_j{\mathbb{P}[j]\log{(\mathbb{P}[j])}} \end{align*}
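You can also check this identity numerically. Here is a short Python sketch (the weights are arbitrary illustrations, not values from the project) comparing your code's formula against Shannon entropy of the normalized distribution:

```python
import math

# Arbitrary unnormalized "probabilities", like the module weights in the question.
x = [0.05, 0.3, 0.65, 0.9]

total = sum(x)
entropy_sum = sum(p * math.log(p) for p in x)  # the sum of PLogP terms

# The last line of calculateEntropy(): -entropySum / total + Ln(total).
code_formula = -entropy_sum / total + math.log(total)

# Shannon entropy of the normalized distribution P[j] = x_j / total.
probs = [p / total for p in x]
shannon = -sum(p * math.log(p) for p in probs)

# The two agree up to floating-point error.
assert abs(code_formula - shannon) < 1e-12
```

So the trailing Ln(total) term is exactly what is needed to compensate for the weights not summing to 1.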