In information theory the information entropy of a vector of probabilities $\bf p$ can be defined as
$$I_b({\bf p}) = -\sum_{i} p_i \log_b(p_i) = -{\bf p}^T \log_b({\bf p})$$
where ${\bf 1}^T{\bf p} = 1$ is the normalization condition (the probabilities sum to 100%), which must hold together with nonnegativity $p_i \geq 0 \ \forall i$.
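For concreteness, here is a minimal NumPy sketch of the definition above (the language is my choice, not the post's), using the standard convention $0 \log 0 = 0$:

```python
import numpy as np

def entropy(p, base=2):
    """Shannon entropy -p^T log_b(p), with the convention 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    # normalization and nonnegativity conditions from the question
    assert np.all(p >= 0) and np.isclose(p.sum(), 1.0), "p must be a probability vector"
    nz = p > 0  # drop zero entries: lim_{p -> 0} p*log(p) = 0
    return -np.dot(p[nz], np.log(p[nz]) / np.log(base))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 outcomes -> 2.0 bits
```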
Now to the question: how could this quantity be made linear, in the sense that we could express it with linear algebra and solve for it using ordinary least squares, as we would for a linear equation system or a linear least-squares minimization?
$p \log p$ isn't nice around $p=0$ because of the behavior of $\log p$. For some issues that can arise, e.g., when estimating entropy from data, see arXiv:1407.0381 [cs.IT]; Section 4 of that paper outlines polynomial approximation techniques for $-p \log p$. If you knew all the elements of ${\bf p}$ were bounded away from $0$, you could use something like a Taylor series about a probability vector with full support (i.e., no zero elements).
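As a rough illustration of that last idea (a sketch, not the paper's method): expanding $f(p) = -p\ln p$ to second order about some $p_0 > 0$ gives $f(p) \approx f(p_0) - (\ln p_0 + 1)(p - p_0) - (p - p_0)^2 / (2p_0)$, which replaces the transcendental term with a quadratic in $p$ — the kind of object least-squares machinery can handle:

```python
import numpy as np

def neg_plogp(p):
    return -p * np.log(p)

def taylor2(p, p0):
    """Second-order Taylor expansion of -p*log(p) about p0 (requires p0 > 0)."""
    d = p - p0
    return neg_plogp(p0) - (np.log(p0) + 1.0) * d - d**2 / (2.0 * p0)

p0 = 0.25  # expansion point, chosen away from 0
for p in [0.20, 0.25, 0.30]:
    print(p, neg_plogp(p), taylor2(p, p0))
```

The approximation is only good near $p_0$, and degrades badly as $p \to 0$ (the second derivative $-1/p$ blows up), which is exactly the difficulty flagged above.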
That being said, entropy is concave, so maximizing entropy over a convex subset of probability distributions isn't hard. This functionality is built into packages like CVX.
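To show that concavity does the heavy lifting even without a modeling package like CVX, here is a self-contained NumPy sketch of a classic constrained max-entropy problem (maximize entropy on a fixed support subject to a prescribed mean). The maximizer has the Gibbs form $p_i \propto e^{\lambda x_i}$, and the multiplier $\lambda$ can be found by bisection; the function name and the loaded-die example are my own choices:

```python
import numpy as np

def maxent_with_mean(x, m, lo=-50.0, hi=50.0, iters=200):
    """Maximize entropy over distributions on support x subject to E[X] = m.
    The optimum has the Gibbs form p_i ∝ exp(lam * x_i); bisect on lam,
    using the fact that the resulting mean is increasing in lam."""
    x = np.asarray(x, dtype=float)

    def mean_for(lam):
        w = np.exp(lam * (x - x.max()))  # shift exponent for numerical stability
        p = w / w.sum()
        return p @ x, p

    p = None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mu, p = mean_for(mid)
        if mu < m:
            lo = mid
        else:
            hi = mid
    return p

x = np.arange(1, 7)           # faces of a die
p = maxent_with_mean(x, 4.5)  # Jaynes' loaded-die example: mean 4.5 instead of 3.5
print(p, p @ x)
```

With a mean above 3.5, the weights tilt monotonically toward the high faces, as expected.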
Minimizing a concave function isn't straightforward in general, but entropy has structural properties you can exploit (e.g., concentrating mass toward a point distribution reduces entropy). In practice, though, you'll be maximizing entropy in far more cases than minimizing it.
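That "point mass reduces entropy" intuition is easy to check numerically; here is a small sketch interpolating from uniform to a point mass (since entropy is concave and the uniform distribution is its maximizer, entropy is nonincreasing along this segment):

```python
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0  # convention 0*log(0) = 0
    return -np.dot(p[nz], np.log2(p[nz]))

uniform = np.full(4, 0.25)
point = np.array([1.0, 0.0, 0.0, 0.0])
# mixing toward the point mass drives entropy from 2 bits down to 0
for t in [0.0, 0.5, 0.9, 1.0]:
    p = (1 - t) * uniform + t * point
    print(t, entropy_bits(p))
```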