Priors in Shannon and Rényi entropies


[Note: Cross-posted at Cross Validated StackExchange] I am new to information theory and am currently working with Shannon and Rényi entropies. Given a random variable $x$ and its pdf $p_{\theta}(x)$, parameterized by $\theta$, we write the Shannon and Rényi entropies as $$H(p) = -\sum\limits_{x} p_{\theta}(x) \log p_{\theta}(x)$$ and $$H_{\alpha}(p) = \frac{1}{1-\alpha}\log\Bigg(\sum\limits_{x} p_{\theta}(x)^\alpha\Bigg),$$ respectively. By an application of L'Hôpital's rule, it can be shown that $H_{\alpha}(p)$ approaches $H(p)$ as $\alpha\rightarrow1$.
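This limit is easy to check numerically. Below is a small Python sketch (the pmf values are hypothetical) comparing the two entropies as $\alpha$ approaches 1:

```python
import numpy as np

# Hypothetical discrete pmf p_theta(x) over three values of x.
p = np.array([0.5, 0.3, 0.2])

def shannon(q):
    """Shannon entropy H(p) = -sum_x p(x) log p(x), natural log."""
    return -np.sum(q * np.log(q))

def renyi(q, alpha):
    """Rényi entropy H_alpha(p) = log(sum_x p(x)^alpha) / (1 - alpha)."""
    return np.log(np.sum(q ** alpha)) / (1.0 - alpha)

# As alpha -> 1, the Rényi entropy approaches the Shannon entropy.
for alpha in (0.9, 0.99, 0.999):
    print(alpha, renyi(p, alpha), shannon(p))
```

The gap shrinks roughly in proportion to $1-\alpha$.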

In case the pdf $\lambda(\theta)$ of $\theta$ is also known, the Shannon entropy is obtained by replacing $p_{\theta}(x)$ by $\lambda({\theta})p_{\theta}(x)$ in its expression, i.e., $$H(p;\lambda) = -\sum\limits_{x} \lambda({\theta})p_{\theta}(x) \log (\lambda({\theta})p_{\theta}(x)).$$

I am not sure how to obtain the Rényi entropy in this case. If I replace $p_{\theta}(x)$ by $\lambda({\theta})p_{\theta}(x)$ in the expression of the Rényi entropy, then I get $$H_{\alpha}(p;\lambda) = \frac{1}{1-\alpha}\log\Bigg(\sum\limits_{x} (\lambda(\theta)p_{\theta}(x))^\alpha\Bigg).$$ However, after application of L'Hôpital's rule, this expression does not reduce to $H(p;\lambda)$ as $\alpha\rightarrow1$.
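For concreteness, the mismatch can be seen numerically (the values of $p_\theta(x)$ and $\lambda(\theta)$ below are hypothetical): since $\sum_x \lambda(\theta)p_{\theta}(x) = \lambda(\theta) \neq 1$ in general, the logarithm does not vanish at $\alpha = 1$, and the substituted expression does not settle at $H(p;\lambda)$:

```python
import numpy as np

# Hypothetical values: p_theta(x) over three x's and a fixed weight
# lam = lambda(theta) for one particular theta.
p = np.array([0.5, 0.3, 0.2])
lam = 0.4

def naive_renyi(alpha):
    """The substituted expression: log(sum_x (lam*p(x))^alpha) / (1 - alpha)."""
    return np.log(np.sum((lam * p) ** alpha)) / (1.0 - alpha)

# H(p; lambda) as written in the question.
H_pl = -np.sum(lam * p * np.log(lam * p))

# Because sum_x lam*p(x) = lam != 1, the numerator tends to log(lam) != 0,
# so the quotient grows without bound instead of approaching H(p; lambda).
for alpha in (0.99, 0.999, 0.9999):
    print(alpha, naive_renyi(alpha), H_pl)
```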

Could someone point out what I might be missing here? Any help will be greatly appreciated.

-RD

There are 2 solutions below.

Answer 1:

Note that the summation is over $x$ and not $\theta$.

If we proceed with L'Hopital's rule, the derivative of the numerator is $$\frac{d}{d\alpha} \left[ \log \sum_x (\lambda(\theta)p_\theta(x))^\alpha \right] = \frac{\frac{d}{d\alpha} \left[ \sum_x (\lambda(\theta) p_\theta(x))^\alpha \right]}{\sum_x (\lambda(\theta) p_\theta(x))^\alpha} = \frac{\sum_x \log (\lambda(\theta) p_\theta(x)) (\lambda(\theta) p_\theta(x))^\alpha}{\lambda(\theta)^\alpha \sum_x p_\theta(x)^\alpha} .$$

As $\alpha \to 1$, the numerator tends to $-H(p;\lambda)$ and the denominator tends to $\lambda(\theta)$. This suggests that the extra factor can be compensated for by scaling the expression by $\lambda(\theta)$: $$H_\alpha(p;\lambda) = \frac{\lambda(\theta)}{1-\alpha} \log \sum_x (\lambda(\theta)p_\theta(x))^\alpha.$$
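The derivative computed above can be sanity-checked with a finite difference; in the sketch below, the values of $p_\theta(x)$ and $\lambda(\theta)$ are hypothetical, and the claim tested is that $\frac{d}{d\alpha}\log\sum_x(\lambda(\theta)p_\theta(x))^\alpha$ at $\alpha = 1$ equals $-H(p;\lambda)/\lambda(\theta)$:

```python
import numpy as np

# Hypothetical values: p_theta(x) over three x's and a fixed weight
# lam = lambda(theta) for one particular theta.
p = np.array([0.5, 0.3, 0.2])
lam = 0.4

def log_sum(alpha):
    """log sum_x (lam * p(x))^alpha, the quantity being differentiated."""
    return np.log(np.sum((lam * p) ** alpha))

# H(p; lambda) as defined in the question.
H_pl = -np.sum(lam * p * np.log(lam * p))

# Central finite difference for d/dalpha log_sum at alpha = 1.
eps = 1e-6
deriv = (log_sum(1 + eps) - log_sum(1 - eps)) / (2 * eps)

# The answer's computation predicts -H(p;lambda)/lambda(theta).
print(deriv, -H_pl / lam)
```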

Answer 2:

The formula you write for the Shannon entropy in the case when $\theta$ is a random variable is not correct. Note that, when $\theta$ is a random variable, you are essentially dealing with an "augmented" random variable of the form $(x, \theta)$, i.e., a tuple containing the original random variable and the (random) parameter. The distribution of this new random variable is, of course,

$$ p_{(x,\theta)}(x,\theta)=p_\theta(\theta) p_{x|\theta}(x|\theta), $$ for all valid $(x,\theta)$. In turn, the entropy becomes (assuming a discrete-valued $\theta$)

$$ \begin{align} H(p_{(x,\theta)})&=-\sum_x \sum_\theta p_{(x,\theta)}(x,\theta)\log p_{(x,\theta)}(x,\theta)\\ &=-\sum_x \sum_\theta p_\theta(\theta) p_{x|\theta}(x|\theta)\log \left(p_\theta(\theta) p_{x|\theta}(x|\theta)\right) \end{align} $$ (Note that the only difference from your formula is that you omit the summation over the values of $\theta$.)

The Rényi entropy for $(x,\theta)$ is defined similarly, and its convergence to the Shannon entropy as $\alpha\rightarrow 1$ still holds.
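This can be illustrated numerically: build the joint pmf $p_{(x,\theta)}(x,\theta) = p_\theta(\theta)\,p_{x|\theta}(x|\theta)$ from a prior and a conditional (the values below are hypothetical), then compare its Rényi and Shannon entropies near $\alpha = 1$:

```python
import numpy as np

# Hypothetical example: two values of theta with prior lam, and a
# conditional pmf p_{x|theta} over three values of x for each theta.
lam = np.array([0.6, 0.4])              # p_theta(theta)
p_cond = np.array([[0.5, 0.3, 0.2],     # p_{x|theta}(x | theta_1)
                   [0.1, 0.6, 0.3]])    # p_{x|theta}(x | theta_2)

# Joint pmf p_{(x,theta)}(x,theta) = p_theta(theta) * p_{x|theta}(x|theta);
# unlike the single-theta term in the question, this sums to 1.
joint = lam[:, None] * p_cond           # shape (2, 3)

def shannon(q):
    """Shannon entropy of a pmf given as an array."""
    return -np.sum(q * np.log(q))

def renyi(q, alpha):
    """Rényi entropy of a pmf given as an array."""
    return np.log(np.sum(q ** alpha)) / (1.0 - alpha)

# The Rényi entropy of the joint pmf converges to its Shannon entropy.
print(shannon(joint), renyi(joint, 0.9999))
```

Because the joint pmf is properly normalized, the $\alpha\rightarrow 1$ limit behaves exactly as in the single-variable case.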