Proof of Shannon's entropy formula: Suppose that $p_1,...,p_n$, and $q_1,...,q_n$ are positive numbers...


Suppose that $p_1, \ldots, p_n$ and $q_1, \ldots, q_n$ are positive numbers with $\sum_i p_i = 1 = \sum_i q_i$.

Show that $\sum_{i=1}^n p_i \log(1/q_i) \geq \sum_{i=1}^n p_i \log(1/p_i)$, with equality if and only if $p_i = q_i$ for $i = 1, \ldots, n$.


My Attempt (I think I made an error early on?):

\begin{align} \sum_{i=1}^n p_i \log(1/q_i) - \sum_{i=1}^n p_i \log(1/p_i) &= \sum_{i=1}^n p_i [ \log(1/q_i) - \log (1/p_i)] \\ &=\sum_{i=1}^n p_i \log(p_i/q_i) \\ &= \sum_{i=1}^n p_i \ln(p_i/q_i) \\ &\geq \sum_{i=1}^n (p_i/q_i -1) \\ &= \sum_{i=1}^n 1/q_i -\sum_{i=1}^n 1/p_i \\ &= 0 \end{align}


There are 2 answers below.

BEST ANSWER
  1. It looks like your big error is in the 2nd line. It should instead be the following: For each $i=1,2,\ldots, n$:

Since $\ln x \leq x - 1$ for all $x > 0$ and $p_i > 0$, for each $i$ we have $p_i \ln (p_i/q_i) = -p_i \ln(q_i/p_i) \geq -p_i(q_i/p_i - 1) = p_i - q_i$. For your proof you also need to observe that the equality $-p_i \ln(q_i/p_i) = -p_i(q_i/p_i - 1)$ holds if and only if $q_i/p_i = 1$, or equivalently, if and only if $p_i = q_i$ [in which case, of course, $p_i \ln(p_i/q_i) = 0$].
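As a quick numerical sanity check (a Python sketch; the variable names are mine, not part of the proof), the corrected termwise bound $p\ln(p/q) \geq p - q$ can be tested on random positive pairs, along with the equality case $p = q$:

```python
import math
import random

random.seed(0)

# Check the corrected termwise bound p*ln(p/q) >= p - q, which follows
# from ln(x) <= x - 1, on random positive pairs.
violations = 0
for _ in range(1000):
    p = random.uniform(0.01, 1.0)
    q = random.uniform(0.01, 1.0)
    if p * math.log(p / q) < (p - q) - 1e-12:
        violations += 1

# Equality case: when p == q, both sides are exactly zero.
p = 0.3
gap_at_equality = p * math.log(p / p) - (p - p)

print(violations, gap_at_equality)
```

Summing the termwise bound over $i$ then gives $\sum_i p_i \ln(p_i/q_i) \geq \sum_i (p_i - q_i) = 1 - 1 = 0$.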

  2. The step $\sum_{i=1}^n p_i \log(p_i/q_i)$ $=$ $\sum_{i=1}^n p_i \ln(p_i/q_i)$ [at the end of the first line of your proof] is technically incorrect, as $\log A \neq \ln A$ for any positive $A \neq 1$. For example, does $\log 10 = \ln 10$?

However, what is correct is that $\sum_{i=1}^n p_i \log(p_i/q_i)$ is nonnegative if and only if $\sum_{i=1}^n p_i \ln(p_i/q_i)$ is nonnegative, which is what it looked like you were trying to use. To this end, you could instead say that $\sum_{i=1}^n p_i \log(p_i/q_i)$ is positive/zero if and only if $\sum_{i=1}^n p_i \ln(p_i/q_i)$ is positive/zero, which is what you will next show.
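The reason the signs agree (assuming $\log$ means the base-10 logarithm) is the change-of-base identity $\log_{10} x = \ln x / \ln 10$: the two sums differ only by the constant positive factor $1/\ln 10$. A small Python illustration of this identity (a sketch, not part of the proof):

```python
import math

# Change of base: log10(x) = ln(x) / ln(10), so any sum of terms
# p_i * log10(p_i / q_i) equals the corresponding natural-log sum
# scaled by the positive constant 1 / ln(10) -- the sign never changes.
scale = 1 / math.log(10)

x_values = [0.1, 0.5, 2.0, 10.0]  # arbitrary positive values (avoiding 1)
ratios = [math.log10(x) / math.log(x) for x in x_values]
```

Each ratio comes out equal to the same positive constant `scale`, which is why multiplying a sum by it cannot change its sign.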

See if this is enough for you to correct.


The quantity $\sum_{i=1}^n p_i \log(1/q_i) - \sum_{i=1}^n p_i \log(1/p_i)$ is called the relative entropy (or Kullback–Leibler divergence) of the probability distributions $p$ and $q$. It is usually denoted $D(p||q)$. What you are looking to prove here is that $D(p||q) \geq 0$, with equality if and only if $p = q$.

$D(p||q) = \sum _i p_i \log \frac{p_i}{q_i} = \sum _i p_i \left(- \log \frac{q_i}{p_i} \right) = \mathbb{E}_{p_i} \left[ -\log \left( \frac{q_i}{p_i} \right) \right] $

The common way of proving this relies on Jensen's inequality, which states that if a function $f(x)$ is convex then

$\mathbb{E}_{p_i} \left[ f(x) \right] \geq f\left( \mathbb{E}_{p_i}[x] \right)$

with equality if and only if $x$ is constant (when $f$ is strictly convex). Since the function $f(x) = -\log(x)$ is strictly convex, it follows that

\begin{align} D(p||q) &= \mathbb{E}_{p_i} \left[ -\log \left( \frac{q_i}{p_i} \right) \right] \\ &\geq -\log \left( \mathbb{E}_{p_i}\left[\frac{q_i}{p_i}\right] \right) \\ &= -\log \left( \sum _i p_i \frac{q_i}{p_i} \right) \\ &= -\log \left( \sum _i q_i \right) \\ &= -\log 1 \\ &= 0 \\ \implies D(p||q) \geq 0 \end{align}
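To see this inequality numerically, here is a short Python sketch (the helper names `kl` and `random_dist` are mine) that evaluates $D(p||q)$ for many random pairs of distributions:

```python
import math
import random

random.seed(1)

def kl(p, q):
    # Relative entropy D(p||q) = sum_i p_i * ln(p_i / q_i) (natural log;
    # the sign is the same for any logarithm base).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def random_dist(n):
    # A random probability distribution with strictly positive entries.
    w = [random.uniform(0.01, 1.0) for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

# The smallest divergence over many random pairs stays nonnegative.
min_kl = min(kl(random_dist(5), random_dist(5)) for _ in range(1000))
```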

By Jensen's inequality, equality holds above if and only if $q_i/p_i$ is constant for all $i$. This implies that $p_i = k q_i$ for some constant $k > 0$. What follows from this is

\begin{align} \sum_i p_i &= \sum_i k q_i \\ &= k \sum_i q_i \\ &= k \times 1 \\ \implies k = 1 \end{align}

It thus follows that equality holds if and only if $p_i = q_i$ for all $i$.
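The equality condition can also be checked directly: $D(p||p) = 0$ term by term, while any $q \neq p$ gives a strictly positive divergence. A small sketch (the two distributions below are arbitrary examples of mine):

```python
import math

def kl(p, q):
    # D(p||q) with the natural log; each term is p_i * ln(p_i / q_i).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.3, 0.5]
q = [0.25, 0.25, 0.5]  # differs from p in the first two entries

d_pp = kl(p, p)  # equality case p = q: every term is p_i * ln(1) = 0
d_pq = kl(p, q)  # p != q: strictly positive by the equality condition
```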