Shannon Entropy Question in C# (but mainly an algebra question)


I'm looking at someone else's project (in C#) and there's an entropy calculation I don't understand. I haven't done any maths in ages, and I've had to look up entropy and Log in pursuit of this, so I'm happy to have gotten this far, but now I'm stuck.

The main body of the entropy calculation is as follows:

private float calculateEntropy() {
    float total = 0;
    float entropySum = 0;
    foreach (var module in this) {
        total += module.Prototype.Probability;
        entropySum += module.PLogP; 
    }
    return -1f / total * entropySum + Mathf.Log(total);
}

Further Information

The project contains a list of "modules", and each module has a different "probability" attached to it. The calculateEntropy function calculates the entropy of these probabilities. There can be between 1 and 500 modules in the list, and each module's "Probability" ranges from 0.05 to 1. By the time we get to the equation at the bottom, "total" is the sum of these probabilities, and "entropySum" is the sum of PLogP for all modules. For the avoidance of doubt, PLogP is defined elsewhere in the project as "Probability * Ln(Probability)", or to give the actual code:

this.PLogP = this.Prototype.Probability * Mathf.Log(this.Prototype.Probability);

My question is about the last line of the calculateEntropy function.

As far as I can tell, this can be re-arranged slightly and expressed in the following way:

-Σ(P*Ln(P)) / Σ(P) + Ln(Σ(P))

The first part of this looks like Shannon's entropy formula, as I've seen it elsewhere. The divisor (Σ(P)) is presumably needed because our probabilities don't sum to 1, so we have to divide by whatever they actually sum to. However, I don't know why we need the last term, Ln(Σ(P)). I think my problem is that I just don't know how to rearrange the formula, but can anyone help?

I think I've included all the relevant info, but let me know if something's missing.


Best Answer

tl;dr: The author of the original code does not normalize their probability distributions to sum to $1$.

Given a sequence of nonnegative numbers $\{x_j\}_j$, we can define a probability distribution via $$\mathbb{P}[j]=\frac{x_j}{\sum_k{x_k}}$$ This has the nice property that it preserves relative ratios: $$\frac{x_j}{x_k}=\frac{\mathbb{P}[j]}{\mathbb{P}[k]}$$

People commonly work with such sequences, and not actual probability distributions, for two reasons:

  1. Often it's easier to compute relative ratios for a real-world system than actual probabilities, so people choose one reference event $E$, set $x_E=1$, and then weight every other event $j$ relative to it: $$x_j=\frac{x_j}{x_E}$$
  2. If you compute a probability distribution using floating-point arithmetic, it is possible (probable, even) that your distribution will not sum to exactly $1$. Renormalizing fixes this.
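Both properties — that renormalizing produces an actual probability distribution, and that it preserves relative ratios — are easy to check in a few lines. A minimal sketch in Python (rather than the project's C#; the weights here are made up):

```python
# Hypothetical unnormalized weights x_j; they need not sum to 1.
weights = [0.05, 0.3, 0.7, 1.0]

total = sum(weights)
probs = [x / total for x in weights]  # P[j] = x_j / sum_k x_k

# Renormalizing yields an actual probability distribution...
assert abs(sum(probs) - 1.0) < 1e-12

# ...and preserves relative ratios: x_j / x_k == P[j] / P[k].
assert abs(weights[0] / weights[2] - probs[0] / probs[2]) < 1e-12
```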

Suppose your module.PLogP and module.Probability describe these $\{x_j\}_j$ instead of true probabilities. Then the formula you give is \begin{align*} -\frac{\sum_j{x_j\log{(x_j)}}}{\sum_j{x_j}}+\log{\left(\sum_k{x_k}\right)}&=-\frac{\sum_j{x_j\log{(x_j)}}-\sum_j{x_j\log{\left(\sum_k{x_k}\right)}}}{\sum_j{x_j}} \\ &=-\frac{\sum_j{x_j\left(\log{(x_j)}-\log{\left(\sum_k{x_k}\right)}\right)}}{\sum_k{x_k}} \\ &=-\sum_j{\frac{x_j}{\sum_k{x_k}}\left(\log{(x_j)}-\log{\left(\sum_k{x_k}\right)}\right)} \\ &=-\sum_j{\frac{x_j}{\sum_k{x_k}}\log{\left(\frac{x_j}{\sum_k{x_k}}\right)}} \\ &=-\sum_j{\mathbb{P}[j]\log{(\mathbb{P}[j])}} \end{align*}
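The chain of equalities can also be verified numerically. A minimal sketch in Python (rather than the project's C#), with made-up weights in the OP's 0.05–1 range:

```python
import math

# Hypothetical unnormalized weights x_j.
weights = [0.05, 0.3, 0.7, 1.0]

total = sum(weights)                                 # sum_k x_k
entropy_sum = sum(x * math.log(x) for x in weights)  # sum_j x_j log(x_j)

# The formula from calculateEntropy, applied to the unnormalized weights.
raw_formula = -entropy_sum / total + math.log(total)

# Shannon entropy of the renormalized distribution P[j] = x_j / total.
probs = [x / total for x in weights]
shannon = -sum(p * math.log(p) for p in probs)

# The two agree up to floating-point error.
assert abs(raw_formula - shannon) < 1e-12
```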

Second Answer

This is merely a different rendering of the other answer. To connect it even more closely to the OP's notation, we have $$ \begin{align} -\sum p\log p &= -\sum\frac{\texttt{P}}{\texttt{total}}\log\left(\frac{\texttt{P}}{\texttt{total}}\right)\\ &= -\sum\frac{\texttt{P}}{\texttt{total}}(\log\texttt{P} - \log\texttt{total})\\ &= -\frac{\sum\texttt{PLogP}}{\texttt{total}}+\log\texttt{total} = -\frac{\texttt{entropySum}}{\texttt{total}}+\log\texttt{total} \end{align} $$ where the simplification of the last term comes from the fact that $\texttt{total}$ is the normalizing factor: $$ \sum\frac{\texttt{P}}{\texttt{total}}=1\quad\text{since}\quad\texttt{total}=\sum\texttt{P} $$ and the fact that all the sums involved iterate over all values of $p$ or $\texttt{P}$, whereas $\texttt{total}$ is constant with respect to the sums.