Dimensional analysis and differential entropy


Differential entropy is a form of entropy that applies to continuous distributions. Although differential entropy does not share all the properties of the (discrete) Shannon entropy, it is widely used in current scientific publications. It is defined as follows: \begin{equation} \label{eq:diffentropy} h(x)=E\{\ell(x)\}=-\int_{S} f(x) \log_{b} f(x)\,dx \end{equation} provided the integral exists, where $S$ is the support set of the random variable characterized by the probability density function $f(x)$. The base $b$ of the logarithm is normally equal to two, but when entropy is measured in nats, its value is $e$.

Thinking about the dimensions of entropy, I noticed an apparent inconsistency. According to dimensional analysis and the principle of homogeneity, the argument of a logarithm must be dimensionless. But if the argument $f(x)$ of the logarithm is dimensionless, then the product $f(x)\,dx$ appearing when the expectation is calculated as an integral is not dimensionless; more precisely, it has the dimension of the variable $x$. On the other hand, if we assume that $f(x)\,dx$ is dimensionless, the logarithm has an argument that is not dimensionless. Something is wrong in both cases.

One possible solution is to imagine that the differential entropy is obtained from the relative entropy (the Kullback-Leibler divergence) \begin{equation} D_{KL}[f||g]=\int_{S} f(x) \log_{b} \frac{f(x)}{g(x)}\,dx \end{equation} using $g(x)=1$. However, this approach is completely wrong, because if $g(x)=1$, then $g(x)$ cannot be a probability density function: a probability density function must integrate to one. The only other alternative, which strikes me as rather odd, is that the principle of homogeneity does not apply to differential entropy, but I honestly do not see why. If anyone has any ideas on how to resolve this apparent contradiction, I'd be grateful.
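To make the unit dependence concrete, here is a quick numerical sketch (Python with NumPy assumed; the function name and grid parameters are illustrative). It approximates $h = -\int f \log f \, dx$ for a Gaussian by a Riemann sum, once with $\sigma$ expressed in metres and once with the same $\sigma$ expressed in centimetres; the two values differ by $\log 100$, which is exactly the dimensional trouble described above:

```python
import numpy as np

def gaussian_diff_entropy(sigma, n=2_000_001, half_width=12.0):
    """Riemann-sum approximation of h = -∫ f log f dx for N(0, sigma^2), in nats."""
    x = np.linspace(-half_width * sigma, half_width * sigma, n)
    dx = x[1] - x[0]
    f = np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
    return -np.sum(f * np.log(f)) * dx

h_m  = gaussian_diff_entropy(1.0)    # sigma = 1, expressed in metres
h_cm = gaussian_diff_entropy(100.0)  # the same width, expressed in centimetres
# h_cm - h_m ≈ log(100): the numerical value depends on the unit chosen for x
```

The closed-form value for $\sigma = 1$ is $\tfrac{1}{2}\log(2\pi e) \approx 1.4189$ nats, and rescaling the axis shifts it by the log of the conversion factor.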

3 Answers

Accepted answer

If $h[X]$ denotes the entropy of the random variable $X$ and $Y = cX$, then $$ h[Y] = h[X] + \log c. $$ Also, if $Z$ is a Gaussian with variance $\sigma^2$, $\sigma$ has the same units as $Z$, but $$ h[Z] = \log(\sigma\sqrt{2\pi e}). $$ Clearly, something weird is going on. When I first looked at this, I was able to figure out what happens by looking at Rényi entropy.

What's going on here is that the Shannon entropy power $$ H[X] = e^{h[X]} $$ satisfies $$ H[cX] = cH[X] $$ and therefore $H[X]$ has the same units as $X$. So $h[X]$ is not unitless but behaves like the log of something with units.
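This scaling behaviour can be checked directly from the closed-form Gaussian entropy (a small Python sketch; the helper names are mine, not standard API): $h$ shifts additively under $X \mapsto cX$, while $H = e^{h}$ scales multiplicatively, so $H$ carries the units of $X$.

```python
import math

def h_gauss(sigma):
    """Closed-form differential entropy of N(0, sigma^2), in nats."""
    return math.log(sigma * math.sqrt(2 * math.pi * math.e))

def entropy_power(sigma):
    """The quantity H[X] = e^{h[X]} discussed above."""
    return math.exp(h_gauss(sigma))

c, sigma = 3.0, 2.5
shift = h_gauss(c * sigma) - h_gauss(sigma)          # ≈ log(c): additive shift
ratio = entropy_power(c * sigma) / entropy_power(sigma)  # ≈ c: multiplicative scaling
```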

Another aspect of dimensional analysis is asking how the Shannon entropy scales with respect to the dimension of the random variable. This is an ill-defined question, but we can ask how the Shannon entropy of an $n$-dimensional Gaussian depends on $n$. The answer is that it is independent of $n$. In that sense, Shannon entropy is dimensionless.

Answer

Firstly, let's recall that $f(x)$ is a probability density function, so it carries the inverse units of $x$; the quantity $f(x)\,\mathrm{d}x$ is therefore always dimensionless and can be interpreted as a probability after integration.

Secondly, note that the variable $x$ itself may be dimensionless; if it is indeed the case, everything is dimensionless and perfectly fine.

Now, let's assume that $x$ is not dimensionless. Then you are right in saying that the argument inside the logarithm is not dimensionless. What can we do in that case? You can adopt two points of view, which are actually equivalent in the end.

We can introduce a normalization constant $c$ in order to "kill" the units of $f$ inside the logarithm, i.e. $\log f \rightarrow \log \frac{f}{c}$, hence $\tilde{h}(x) = h(x) + \log c$, which shifts the "zero level" of the entropy. In the same spirit, you can renormalize the function $f$ itself, i.e. $\tilde{f} = \frac{f}{c}$, hence $\tilde{h}(x) = \frac{h(x) + \log c}{c}$.
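The first renormalization can be verified numerically (a Python sketch with NumPy; the value of $c$ is an arbitrary choice standing in for a reference density with units $1/x$): replacing $\log f$ by $\log(f/c)$ shifts the entropy by exactly $\log c$, since $\int f\,\mathrm{d}x = 1$.

```python
import numpy as np

sigma, c = 2.0, 100.0          # sigma in units of x; c a reference density (units 1/x)
x = np.linspace(-12 * sigma, 12 * sigma, 1_000_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

h       = -np.sum(f * np.log(f)) * dx       # original differential entropy
h_tilde = -np.sum(f * np.log(f / c)) * dx   # dimensionless argument f/c
# h_tilde - h ≈ log(c): introducing c only shifts the zero level
```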

You may also choose to renormalize the variable $x$ through a change of variable, i.e. $x = \tilde{x}/a$ with $\tilde{x}$ dimensionless, such that $h(x) = h(\tilde{x}) - \log a$, or renormalize the entropy functional itself, so that $\tilde{h}(x) \propto h(x)$ no longer has units.

And if you are not comfortable with a non-dimensionless differential entropy after these renormalizations, you are free to renormalize the entropy itself, i.e. $h \rightarrow \frac{h}{\alpha}$, to make it dimensionless.

All the considered transformations are affine (with respect to $h$) and do not affect the description of the system represented by the distribution $f$: they do not modify the extremal points of the functional, and, indirectly, the "zero level" of entropy can be chosen freely. This independence with respect to affine transformations comes from the fact that the entropy is meant to be differentiated, since all physical quantities are recovered through its derivatives.


On a side note: the notion of differential entropy comes from statistical physics $-$ it is basically a microscopic analogue of the more common thermodynamic entropy, hence its name $-$ and is given by $$ S[p] = -k_B \int p(x) \ln p(x) \,\mathrm{d}x, $$ where $k_B$ is the Boltzmann constant, which carries the units of entropy. In the same spirit, you can always re-introduce such a constant by renormalization, as said above. It is also worth noting that the aforementioned properties of entropy under affine transformations are related to its maximization, which is itself associated with the minimization of energy.

Even if this link to physics may seem coincidental, note that the Shannon and differential entropies are closely related to the same notion in thermodynamics (they govern, for instance, heat loss in circuits), which is why these fields share the same vocabulary beyond the mathematical analogy.

Answer

The discrete entropy is $ - \sum P ( x ) \log P ( x ) $. Replacing $ P ( x ) $ with $ f ( x ) \, \mathrm d x $, the continuous analogue is $ - \int f ( x ) \log \bigl ( f ( x ) \, \mathrm d x \bigr ) \, \mathrm d x $. Although this is a strange expression, it can be defined as a limit of Riemann sums of the form $ - \sum f ( x ) \log \bigl ( f ( x ) \, \Delta x \bigr ) \, \Delta x $. Unfortunately, as long as $ f $ is an actual function (and not something like a delta distribution), this limit is infinite! (In other words, the entropy of any continuous probability distribution is infinite.) So it's not a useful quantity.
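The divergence of this limit is easy to see numerically (a Python sketch with NumPy; function name and grid choices are mine). As the grid is refined by a factor of $10$, the Riemann sum $-\sum f(x)\log\bigl(f(x)\,\Delta x\bigr)\,\Delta x$ grows by roughly $\log 10$, i.e. like $h + \log(1/\Delta x) \to \infty$:

```python
import numpy as np

def discretized_entropy(sigma, n_bins):
    """-sum f(x) log(f(x) dx) dx for a Gaussian on a grid of width dx.
    This is the 'true' continuous-limit entropy, which blows up like -log(dx)."""
    x = np.linspace(-10 * sigma, 10 * sigma, n_bins)
    dx = x[1] - x[0]
    f = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return -np.sum(f * np.log(f * dx)) * dx

e3 = discretized_entropy(1.0, 10**3)
e4 = discretized_entropy(1.0, 10**4)
e5 = discretized_entropy(1.0, 10**5)
# e3 < e4 < e5, each step adding ≈ log(10): the limit is infinite
```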

So let's play around with it a bit. We can break up $ \log \bigl ( f ( x ) \, \mathrm d x \bigr ) $ as $ \log f ( x ) + \log \mathrm d x $, so the entire integral breaks up into two terms: $$ - \int f ( x ) \log f ( x ) \, \mathrm d x - \int f ( x ) \log \mathrm d x \, \mathrm d x \text . $$ Here, the second term is infinite, but the first term is (at least for some distributions, including many common ones) finite. So as long as we remember that the first term is not the entropy (giving it a different name, such as differential entropy), then we might be able to do something interesting with it.

And indeed, differential entropy is useful for some things (such as bounding the error of an estimator), but it doesn't have all of the properties of entropy. In particular, it's not invariant under a change of variables, so it's not really a property of the system under consideration, but of how that system is being measured. It can be zero (even when the system is not deterministic), or even negative! So just remember that it's not the entropy, but a replacement with fewer useful properties.
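Both claims are easy to check with the uniform distribution (a minimal Python sketch; the helper name is mine). For $X \sim \mathrm{Uniform}(0, a)$ the differential entropy is $-\int_0^a \frac{1}{a}\log\frac{1}{a}\,\mathrm dx = \log a$, which is zero at $a = 1$ and negative for $a < 1$, even though the outcome is genuinely random:

```python
import math

def h_uniform(a):
    """Differential entropy of Uniform(0, a): equals log(a), in nats."""
    return math.log(a)

h_one  = h_uniform(1.0)   # zero entropy, yet the system is not deterministic
h_half = h_uniform(0.5)   # negative: -log(2)
```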

As far as dimensional analysis goes, since $ f ( x ) \, \mathrm d x $ must be dimensionless (its integral is $ 1 $), this means that $ f ( x ) $ has the dimension of $ 1 / \mathrm d x $, and so the entire expression for differential entropy has the dimension of $ - \log ( 1 / \mathrm d x ) = \log \mathrm d x $. If this is a logarithm base $ b $, then $ b ^ { \text {differential entropy} } $ has the same dimension as $ \mathrm d x $. (Most of the time, you can treat this as the same dimension as $ x $.) If $ x $ itself is dimensionless, then so is the differential entropy, but not otherwise.
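The claim that $b^{h}$ carries the dimension of $x$ can be illustrated with a unit conversion (a Python sketch using base-2 entropy; the function name is mine): expressing a Gaussian's $\sigma$ in millimetres instead of metres multiplies $2^{h}$ by exactly the conversion factor $1000$.

```python
import math

def h_gauss_bits(sigma):
    """Differential entropy of N(0, sigma^2) in bits (base-2 logarithm)."""
    return math.log2(sigma * math.sqrt(2 * math.pi * math.e))

h_m  = h_gauss_bits(1.0)       # sigma = 1 metre
h_mm = h_gauss_bits(1000.0)    # the same sigma, expressed in millimetres
scale = 2**h_mm / 2**h_m       # ≈ 1000: b**h scales like x under a unit change
```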