Statistical Inference, Differential Geometry and Entropy


Context: Statistical Inference and Differential Geometry

Let's consider a generic distribution $ p(x;\theta) $ with parameter vector $ \theta $; it is obvious that $$ \int p(x; \theta) \, dx = 1 $$

and by definition it should be exactly $1$ for any value of $ \theta $, hence $$ \frac{\partial}{\partial \theta_{i}} \int p(x; \theta) \, dx = 0 $$

Now let's consider the negative entropy of the distribution (the expected value of the log-density) $$ S = E_{\theta}\left [ \ln(p(x; \theta)) \right ] = \int \ln(p(x;\theta))\, p(x; \theta)\, dx $$

Applying the derivative inside the expectation we get $$ E_{\theta}\left [ \frac{\partial}{\partial \theta_{i}} \ln(p(x; \theta)) \right ] = \int \frac{\partial}{\partial \theta_{i}} \ln(p(x; \theta))\, p(x; \theta)\, dx = \int \frac{\partial}{\partial \theta_{i}} p(x; \theta)\, dx = 0 \quad \forall i $$
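As a sanity check of this zero-mean property of the score, here is a minimal numerical sketch (my own example, not part of the post; the Gaussian and the sample size are arbitrary choices):

```python
import numpy as np

# Minimal numerical sketch (my own example): for a Gaussian N(mu, sigma^2),
# the score d/dmu ln p(x; mu, sigma) = (x - mu) / sigma^2 should average to
# zero under p, matching E_theta[d/dtheta_i ln p(x; theta)] = 0.
rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

score_mu = (x - mu) / sigma**2
print(np.mean(score_mu))  # close to 0
```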

Now, if it were possible to move the derivative freely in and out of the integral, it would follow that $$ E_{\theta}\left [ \frac{\partial}{\partial \theta_{i}} \ln(p(x; \theta)) \right ] = \frac{\partial}{\partial \theta_{i}} E_{\theta} \left [ \ln(p(x; \theta)) \right ] $$

hence the (negative) entropy would appear to be constant with respect to the parameters, but this does not seem correct to me: consider for example the Gaussian distribution, whose entropy clearly depends on the variance.
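To make the Gaussian counterexample concrete, here is a hedged numerical sketch (my own choices of a zero-mean Gaussian and two variances): estimate $ E_{\theta}[\ln p] $ by Monte Carlo and compare it with the closed form $ -\tfrac{1}{2}\ln(2\pi e \sigma^2) $, which visibly depends on $\sigma$:

```python
import numpy as np

# Monte Carlo sketch (my own example, zero-mean Gaussian): estimate the
# negative entropy E_theta[ln p] for two variances and compare with the
# closed form -0.5 * ln(2*pi*e*sigma^2); it clearly depends on sigma.
rng = np.random.default_rng(1)
results = {}
for sigma in (1.0, 3.0):
    x = rng.normal(0.0, sigma, size=1_000_000)
    log_p = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
    mc = log_p.mean()                                   # Monte Carlo estimate
    closed = -0.5 * np.log(2 * np.pi * np.e * sigma**2)  # closed form
    results[sigma] = (mc, closed)
    print(sigma, mc, closed)
```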

Is the problem that the parametrization above is somehow special, or does it depend on whether the derivative can be moved freely in and out of the integral of the expected value?


There are 3 answers below.

On BEST ANSWER

The error is here:

$$ E_{\theta}\left [ \frac{\partial}{\partial \theta_{i}} \ln(p(x; \theta)) \right ] = \frac{\partial}{\partial \theta_{i}} E_{\theta} \left [ \ln(p(x; \theta)) \right ] $$

This is incorrect because the density itself depends on $\theta$: differentiating $ E_{\theta}[\ln p(x;\theta)] = \int \ln(p(x;\theta))\, p(x;\theta)\, dx $ with the product rule produces one additional term $$ \int \ln(p(x;\theta)) \, \frac{\partial p(x;\theta)}{\partial\theta_{i}} \, dx = E_\theta\!\left[\ln(p(x;\theta))\, \frac{\partial \ln p(x;\theta)}{\partial\theta_{i}}\right] $$

On

As far as I can see, the problem is that $ \partial_{\theta_{i}} E_{\theta}[\ln(p)] \neq E_{\theta}[\partial_{\theta_{i}} \ln(p)] $.

You wrote $ E_{\theta}[\partial_{\theta_{i}} \ln(p)] $ correctly while the other term is $$ \partial_{\theta_{i}} E_{\theta}[\ln(p)] = \int \left ( \partial_{\theta_{i}} \ln(p(x;\theta)) \right ) p(x;\theta) dx + \int \ln(p(x;\theta)) \left ( \partial_{\theta_{i}} p(x; \theta) \right ) dx $$

hence $$ \partial_{\theta_{i}} E_{\theta}[\ln(p)] = \int \partial_{\theta_{i}}p(x;\theta)dx + \int \ln(p(x;\theta)) \left ( \partial_{\theta_{i}} p(x; \theta) \right ) dx $$

Regarding the first term, the integration is over $ x $ while the partial derivative is with respect to $ \theta_{i} $, hence (assuming enough regularity to exchange the two) it should be possible to write $$ \int \partial_{\theta_{i}}p(x;\theta)\,dx = \partial_{\theta_{i}} \int p(x; \theta)\, dx = 0 $$

as you have already demonstrated above so it follows that $$ \partial_{\theta_{i}} E_{\theta}[\ln(p)] = \int \ln(p(x;\theta)) \left ( \partial_{\theta_{i}} p(x; \theta) \right ) dx $$
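This surviving term can be checked numerically (my own sketch, using $ p = N(0, \sigma^2) $ with $ \theta_{i} = \sigma $; for that case $ E[\ln p] = -\tfrac{1}{2}\ln(2\pi e\sigma^2) $, so both sides should equal $ -1/\sigma $):

```python
import numpy as np

# Numerical sketch (my own example: p = N(0, sigma^2), theta_i = sigma).
# Check that d/dsigma E[ln p] equals int ln p(x; sigma) * d/dsigma p(x; sigma) dx.
# For this Gaussian, E[ln p] = -0.5 * ln(2*pi*e*sigma^2), so both sides are -1/sigma.
sigma = 2.0
x = np.linspace(-40.0, 40.0, 400_001)
dx = x[1] - x[0]
h = 1e-4  # finite-difference step in sigma

def log_p(s):
    return -0.5 * np.log(2 * np.pi * s**2) - x**2 / (2 * s**2)

def p(s):
    return np.exp(log_p(s))

# Left side: finite-difference derivative of E[ln p] = int p ln p dx.
lhs = (np.sum(p(sigma + h) * log_p(sigma + h))
       - np.sum(p(sigma - h) * log_p(sigma - h))) * dx / (2 * h)

# Right side: int ln p * d/dsigma p dx, with d/dsigma p by central difference.
dp = (p(sigma + h) - p(sigma - h)) / (2 * h)
rhs = np.sum(log_p(sigma) * dp) * dx

print(lhs, rhs, -1.0 / sigma)  # all close to -0.5
```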

Writing $ p(x; \theta) = \exp\left( f(x; \theta) \right) $ (as for a distribution in the exponential family), we have $ \ln p = f $ and $ \partial_{\theta_{i}} p = (\partial_{\theta_{i}} f)\, p $, so the above term becomes $$ \partial_{\theta_{i}} E_{\theta}[\ln(p)] = \int f(x; \theta)\, \partial_{\theta_{i}}f(x; \theta)\, p(x; \theta)\, dx $$

then $$ \partial_{\theta_{i}} E_{\theta}[\ln(p)] = E_{\theta}[ f(x; \theta) ( \partial_{\theta_{i}}f(x; \theta)) ] $$

hence the variation of the entropy of the distribution is not independent of the variation of the parameters.
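The identity above can also be checked by Monte Carlo (my own sketch: $ p = N(0, \sigma^2) $, $ \theta_{i} = \sigma $, $ f = \ln p $; the exact value of $ E_{\theta}[ f\, \partial_{\sigma} f ] $ is then $ -1/\sigma $):

```python
import numpy as np

# Monte Carlo sketch (my own example: p = N(0, sigma^2), theta_i = sigma,
# f(x; sigma) = ln p(x; sigma)). The identity above gives
# d/dsigma E[ln p] = E[f * d/dsigma f]; the exact value is -1/sigma.
rng = np.random.default_rng(2)
sigma = 2.0
x = rng.normal(0.0, sigma, size=2_000_000)

f = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
df = -1.0 / sigma + x**2 / sigma**3  # d/dsigma f(x; sigma)

estimate = np.mean(f * df)
print(estimate, -1.0 / sigma)  # both close to -0.5
```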

On

I believe it is clearer to restate Nicola Bernini's conclusion by writing:

Hence, the variation of ensemble-average entropy of the distribution is dependent on the parameters of the distribution.

I believe this to be so for two reasons: first, the statement above distinguishes between the entropy $\ln(p)$ and the ensemble-average entropy $E_\theta[\ln(p)]$; and second, it avoids an unnecessary double negative (i.e., 'not independent').